Unified multi-cluster construction and management with Apache DolphinScheduler in the communications industry

Background introduction

Why build a unified scheduling platform? Our company's big data center runs seven big data clusters hosted in data centers in different regions, such as Inner Mongolia, Nanjing, Suzhou and Guangzhou, and the networks of these data centers are not interconnected. If every cluster ran its own scheduling system, there would be multiple scheduling management entry points, which is inconvenient for both operations staff and developers, in maintenance as well as in daily use. We therefore decided to build a unified scheduling platform that centrally manages the scheduling tasks of all clusters and lays the groundwork for deeper platform integration later.

Build experience

Network communication: Our DolphinScheduler deployment previously relied on intranet communication within a single data center. Because our clusters are spread across several provinces, we had to modify it to support communication between data centers over the public network. To keep network latency low, services in the same data center still communicate over the intranet. To protect data in transit, we also configured TLS encryption for public network communication.

Permission management: Managing multiple clusters means handling permissions across multiple clusters. We extended DolphinScheduler's worker group feature to manage the different cluster environments and to isolate permissions by cluster environment and tenant.

Task resource sharing: DolphinScheduler itself supports object storage. We decided to upload the task resources of all clusters to the same object storage bucket to achieve unified management and scheduling of resources.

Service architecture

Our new architecture is based on version 3.1.4 of DolphinScheduler. In order to achieve mixed deployment of public network and intranet communication, we have made the following adjustments:

  • Service nodes in the same data center communicate over the intranet.
  • Nodes in different data centers communicate over the public network.
  • The master nodes and ZooKeeper are deployed in the same data center and communicate with the nodes in the other data centers.

To implement this design, we modified the DolphinScheduler source code so that service nodes are identified by hostname rather than only by IP address. We then map hostnames to internal and public IPs through the hosts file, so that a node switches between the internal and the public address depending on where the connection comes from.
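The hosts-file mapping can be illustrated with a minimal sketch. It assumes each machine gets a hosts file generated for its own data center; the hostnames, data center labels and IP addresses below are made up for illustration and are not taken from our deployment.

```python
# Hypothetical node inventory: hostname -> (data center, intranet IP, public IP).
# Every name and address here is an illustrative assumption.
NODES = {
    "ds-master-1": ("nanjing", "10.0.1.10", "203.0.113.10"),
    "ds-worker-nj-1": ("nanjing", "10.0.1.21", "203.0.113.21"),
    "ds-worker-gz-1": ("guangzhou", "10.8.1.21", "198.51.100.21"),
}


def render_hosts(local_dc, nodes=NODES):
    """Build hosts-file lines for a machine located in `local_dc`.

    Nodes in the same data center resolve to their intranet IP;
    nodes in any other data center resolve to their public IP.
    """
    lines = []
    for hostname, (dc, intranet_ip, public_ip) in sorted(nodes.items()):
        ip = intranet_ip if dc == local_dc else public_ip
        lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hosts entries as seen from a machine in the Guangzhou data center.
    print(render_hosts("guangzhou"))
```

Because services address each other by hostname only, the same configuration can be shipped everywhere and the per-machine hosts file decides which address is actually used.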

Challenges and implementation of multi-cluster permission management

Many of our projects were built on a single-cluster (single data center) architecture. To support them, we introduced unique cluster identifiers, implemented by adding fields to the database tables. Each cluster has a character identifier that makes cluster ownership explicit within a project.

The introduction of cluster identification is based on the following points:

  • Deployment decision: We deploy nodes running under different accounts to different data centers.
  • Environment acquisition: When configuring users, we need to determine how they obtain environment information. To do this, we centralize the environment configurations of the different clusters and assign them to different groups.
  • Authorization strategy: When authorizing, we only need to grant the corresponding cluster environment to the user. When configuring tasks, users simply select a cluster environment we have authorized for them.

When nodes running under different accounts are deployed across several data centers, one question deserves discussion: how do we determine the right runtime environment when configuring a user?

We register the environment information of each cluster on the platform and assign it to different groups. Authorization then becomes simple: we grant a user only the cluster environments they are allowed to use, and the cluster environment drives both cluster isolation and task distribution.
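The authorization flow can be sketched as follows. This is an illustrative model only, not DolphinScheduler's actual data model: the class names, the `cluster_id` field and the sample environments are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterEnvironment:
    name: str          # environment name shown to users (hypothetical)
    cluster_id: str    # character identifier of the owning cluster (hypothetical)
    worker_group: str  # group of nodes that runs tasks for this environment


@dataclass
class User:
    name: str
    granted_envs: set = field(default_factory=set)  # names of authorized environments


def selectable_environments(user, environments):
    """Environments the user may pick when configuring a task."""
    return [env for env in environments if env.name in user.granted_envs]


def route_task(user, env_name, environments):
    """Return (cluster_id, worker_group) for a task, or refuse if not authorized."""
    for env in selectable_environments(user, environments):
        if env.name == env_name:
            return env.cluster_id, env.worker_group
    raise PermissionError(f"{user.name} is not authorized for environment {env_name!r}")


# Sample data, made up for illustration.
ENVIRONMENTS = [
    ClusterEnvironment("nanjing-prod", "NJ", "nj-workers"),
    ClusterEnvironment("guangzhou-prod", "GZ", "gz-workers"),
]
alice = User("alice", {"nanjing-prod"})
```

In this model a single grant covers both concerns: the user only sees the environments granted to them, and choosing one of them also determines which cluster's nodes receive the task.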

Upgrade and transformation of scheduling logic

In production we went through several version upgrades, moving from DolphinScheduler 3.0 to 3.1.4. After dealing with the challenges of building across data centers and with temporary service outages, we developed automatic continuation of scheduling, which removes the need to backfill data manually after an unplanned outage.

Optimization

Logic unification: We unified the scheduling logic of DolphinScheduler (DS). Previously, a record was inserted into a table when a task was scheduled and deleted once scheduling completed. The optimized approach instead pre-generates the next 50 records that are due to be scheduled.
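A minimal sketch of the pre-generation idea is shown below. It assumes a fixed interval between runs to stay self-contained, whereas real schedules are usually cron expressions; the function name and the in-memory list standing in for the database table are hypothetical.

```python
from datetime import datetime, timedelta

TARGET_PENDING = 50  # number of future records kept ready, as described above


def top_up_schedule(pending, interval, now, target=TARGET_PENDING):
    """Return `pending` extended with future fire times until it holds `target` entries.

    `pending` stands in for the rows of planned fire times that have not run yet.
    New entries continue from the last planned time, or from `now` if none exist.
    """
    planned = sorted(pending)
    next_fire = (planned[-1] if planned else now) + interval
    while len(planned) < target:
        planned.append(next_fire)
        next_fire += interval
    return planned


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, 0)
    plan = top_up_schedule([], timedelta(seconds=20), start)
    print(len(plan), "planned runs, first at", plan[0], "last at", plan[-1])
```

Since planned runs exist before they are due, a scheduler that is briefly down does not lose the information about which runs it owes.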

Caching of resource files

We also faced a challenge when scheduling tasks across data centers: resource files have to be downloaded from S3, and limited data center bandwidth made this extremely slow. We therefore implemented a resource file cache. Files downloaded from S3 are cached locally, a timestamp comparison avoids unnecessary repeated downloads, and a soft link makes the cached file available in the execution directory quickly.

The necessity of cache optimization stems from the following points:

  • Cross-network scheduling: When we schedule tasks across networks, we need to download resources from S3.
  • Bandwidth limitations: Slow download speeds due to data center bandwidth limitations (Gigabit bandwidth, as opposed to the industry norm of 10 Gigabit).

Implementation details

  • Caching logic: In brief, every resource downloaded from S3 is cached locally. We check the resource's timestamp to determine whether it has been updated; resources that have not changed are linked directly to the local cached file.
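The caching logic above can be sketched as follows. This is a simplified illustration, not the code we run: the `download` callable stands in for the actual S3 client call, and the function name, parameters and directory layout are assumptions.

```python
import os
from pathlib import Path


def fetch_with_cache(key, remote_mtime, download, cache_dir, exec_dir):
    """Link resource `key` into `exec_dir`, downloading only if the cache is stale.

    `remote_mtime` is the object's last-modified time in epoch seconds.
    `download(key, dest_path)` is a caller-supplied (hypothetical) function
    that writes the remote object to `dest_path`.
    """
    cached = Path(cache_dir) / key
    # Download when there is no cached copy or the remote object is newer.
    if not cached.exists() or int(cached.stat().st_mtime) < int(remote_mtime):
        cached.parent.mkdir(parents=True, exist_ok=True)
        tmp = cached.with_name(cached.name + ".part")
        download(key, tmp)
        # Stamp the cached file with the remote timestamp for later comparisons.
        os.utime(tmp, (remote_mtime, remote_mtime))
        os.replace(tmp, cached)  # atomic swap, so readers never see a partial file
    # Expose the cached file in the task's execution directory via a soft link.
    link = Path(exec_dir) / Path(key).name
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(cached.resolve())
    return link
```

Linking instead of copying keeps task start-up fast even for large resource files, and a second task that needs the same unchanged file triggers no network transfer at all.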

Visual display of scheduling continuity

Using charts, we illustrate the effect of automatic continuation of scheduling and the related recovery and fault-tolerance mechanism. In one example, a workflow ran every 20 seconds and DS was unavailable for nearly three minutes before we restored service. After recovery, the scheduler went on to run the instances that were missed during the outage, so the data development team no longer has to backfill data manually after a release or a service restart.
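The catch-up behavior can be approximated with a short sketch. It assumes that planned fire times are persisted before they are due, so that overdue ones are still there after a restart; the function name and the concrete timestamps are illustrative only.

```python
from datetime import datetime, timedelta


def split_due(planned, now):
    """Split planned fire times into (due or overdue, still in the future)."""
    due = sorted(t for t in planned if t <= now)
    future = sorted(t for t in planned if t > now)
    return due, future


# A workflow planned every 20 seconds; the service goes down right after
# `outage_start` and comes back three minutes later (values made up).
outage_start = datetime(2024, 1, 1, 12, 0, 0)
interval = timedelta(seconds=20)
planned = [outage_start + k * interval for k in range(1, 51)]
recovered_at = outage_start + timedelta(minutes=3)

due, future = split_due(planned, recovered_at)
print(len(due), "instances to catch up;", len(future), "still in the future")
```

With these numbers the nine runs that fell inside the outage are picked up on recovery, each keeping its original scheduled time, which is what makes manual backfilling unnecessary.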

Future plan: Develop job scheduling analysis page

We noticed that DolphinScheduler does not yet have a page for centralized analysis of jobs across multiple projects. We plan to develop a job scheduling analysis page to simplify failure log analysis and follow-up handling of scheduled jobs for multi-cluster projects.

This page will display related jobs by task, allow users to view logs and rerun jobs, and provide some filtering functions. This will help development and operations teams locate and analyze problems more quickly and handle operations such as job reruns efficiently.

Through the above-mentioned series of strategies and improvements, we have achieved technical optimization and improvement in multiple aspects such as multi-cluster management, scheduling logic and resource caching. We will continue to conduct in-depth research and development, hoping to provide more convenience and support to the community.

This concludes my talk. Thank you very much for listening!

This article is published by Beluga Open Source Technology!

Origin my.oschina.net/dailidong/blog/10143747