The Coordinator is Druid's central coordination module. It decouples direct dependencies between the other modules, manages and distributes Segments, controls the loading and dropping of Segments on Historical nodes, and keeps the Segment load balanced across the Historical nodes.
The Coordinator follows a periodically-scheduled-task design and is composed of several distinct tasks. It does not call the Historical nodes directly; instead it uses ZooKeeper as a bridge, writing instructions into ZooKeeper, from which the Historical nodes pick up the commands to load or drop Segments.
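As a rough illustration of this bridge, the Coordinator side of a load instruction might look like the following Curator sketch. This is a minimal sketch only: in Druid the node is written by the Coordinator's LoadQueuePeon, and the concrete path layout and JSON payload used here are simplified assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Minimal sketch of the ZooKeeper "bridge": the Coordinator writes an instruction node
// under a per-Historical load queue path; the Historical watches that path and acts on it.
// The path and payload below are illustrative assumptions, not Druid's exact format.
public class LoadQueueSketch
{
  public static void main(String[] args) throws Exception
  {
    CuratorFramework curator = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3)
    );
    curator.start();

    // Hypothetical load-queue node for one Historical node and one Segment.
    String loadQueuePath = "/druid/loadQueue/historical-host:8083/example_segment_id";
    byte[] instruction = "{\"type\":\"load\",\"segmentId\":\"example_segment_id\"}"
        .getBytes(StandardCharsets.UTF_8);

    // Coordinator side: publish the instruction.
    curator.create().creatingParentsIfNeeded().forPath(loadQueuePath, instruction);

    curator.close();
  }
}
```

On the other side, the Historical node watches its own load queue path, loads the Segment described by the node, and removes the node to acknowledge completion.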
The Coordinator decides which Segments to load and drop according to a set of rules, which can be configured through Druid's management tools or configuration parameters. The load rules are:
- Load forever (LoadForever)
- Load by interval (LoadByInterval)
- Load by recent period (LoadByPeriod)
The drop rules are:
- Drop forever (DropForever)
- Drop by interval (DropByInterval)
- Drop by recent period (DropByPeriod)
These rule names map onto concrete Rule implementations through Jackson subtype annotations:

```java
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
@JsonSubTypes(value = {
    @JsonSubTypes.Type(name = "loadByPeriod", value = PeriodLoadRule.class),
    @JsonSubTypes.Type(name = "loadByInterval", value = IntervalLoadRule.class),
    @JsonSubTypes.Type(name = "loadForever", value = ForeverLoadRule.class),
    @JsonSubTypes.Type(name = "dropByPeriod", value = PeriodDropRule.class),
    @JsonSubTypes.Type(name = "dropByInterval", value = IntervalDropRule.class),
    @JsonSubTypes.Type(name = "dropForever", value = ForeverDropRule.class)
})
```
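For example, a rule set for a datasource could be expressed as JSON like the following; the tier name, period, and replica counts here are illustrative assumptions, with field names following the rule classes above:

```json
[
  { "type": "loadByPeriod", "period": "P1M", "tieredReplicants": { "_default_tier": 2 } },
  { "type": "dropForever" }
]
```

Rules are evaluated in order and the first match wins, so this set keeps two replicas of the most recent month of Segments loaded and drops everything older.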
The entry point of the Coordinator is the class io.druid.server.coordinator.DruidCoordinator.
DruidCoordinator pulls in several manager classes that give it information about Segments and the cluster, along with the ability to manage them. For example, the MetadataSegmentManager and MetadataRuleManager interfaces both obtain rule and Segment information from the MySQL metadata store through SQL queries.
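Conceptually, that polling boils down to reading the segment and rule tables in the metadata store. The sketch below is illustrative only: the table and column names follow Druid's default metadata schema, the JDBC URL and credentials are placeholders, and the real managers go through Druid's own DB layer rather than raw JDBC.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative only: poll the metadata store the way MetadataSegmentManager conceptually
// does. Table/column names follow Druid's default metadata schema (druid_segments);
// the URL and credentials are placeholders.
public class MetadataPollSketch
{
  public static void main(String[] args) throws Exception
  {
    try (Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/druid", "user", "password")) {
      try (Statement stmt = conn.createStatement();
           ResultSet rs = stmt.executeQuery(
               "SELECT id, dataSource FROM druid_segments WHERE used = true")) {
        while (rs.next()) {
          // The real manager also reads the JSON payload column and deserializes it into
          // DataSegment objects that each coordination run then works with.
          System.out.println(rs.getString("dataSource") + " -> " + rs.getString("id"));
        }
      }
    }
  }
}
```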
The Coordinator starts from start(). It first elects a leader through ZooKeeper's LeaderLatch, and that leader then runs a number of tasks on a fixed schedule:
```java
@LifecycleStart
public void start()
{
  synchronized (lock) {
    if (started) {
      return;
    }
    started = true;

    createNewLeaderLatch();
    try {
      leaderLatch.get().start();
    }
    catch (Exception e) {
      throw Throwables.propagate(e);
    }
  }
}

private LeaderLatch createNewLeaderLatch()
{
  final LeaderLatch newLeaderLatch = new LeaderLatch(
      curator,
      ZKPaths.makePath(zkPaths.getCoordinatorPath(), COORDINATOR_OWNER_NODE),
      self.getHostAndPort()
  );

  newLeaderLatch.addListener(
      new LeaderLatchListener()
      {
        @Override
        public void isLeader()
        {
          DruidCoordinator.this.becomeLeader();
        }

        @Override
        public void notLeader()
        {
          DruidCoordinator.this.stopBeingLeader();
        }
      },
      Execs.singleThreaded("CoordinatorLeader-%s")
  );

  return leaderLatch.getAndSet(newLeaderLatch);
}
```
Once the leader has been determined, it can start executing its tasks on the configured schedule:
```java
private void becomeLeader()
{
  synchronized (lock) {
    if (!started) {
      return;
    }

    log.info("I am the leader of the coordinators, all must bow!");
    log.info("Starting coordination in [%s]", config.getCoordinatorStartDelay());
    try {
      leaderCounter++;
      leader = true;
      metadataSegmentManager.start();
      metadataRuleManager.start();
      serverInventoryView.start();
      serviceAnnouncer.announce(self);
      final int startingLeaderCounter = leaderCounter;

      final List<Pair<? extends CoordinatorRunnable, Duration>> coordinatorRunnables = Lists.newArrayList();
      coordinatorRunnables.add(
          Pair.of(
              new CoordinatorHistoricalManagerRunnable(startingLeaderCounter),
              config.getCoordinatorPeriod()
          )
      );
      if (indexingServiceClient != null) {
        coordinatorRunnables.add(
            Pair.of(
                new CoordinatorIndexingServiceRunnable(
                    makeIndexingServiceHelpers(),
                    startingLeaderCounter
                ),
                config.getCoordinatorIndexingPeriod()
            )
        );
      }

      for (final Pair<? extends CoordinatorRunnable, Duration> coordinatorRunnable : coordinatorRunnables) {
        ScheduledExecutors.scheduleWithFixedDelay(
            exec,
            config.getCoordinatorStartDelay(),
            coordinatorRunnable.rhs,
            new Callable<ScheduledExecutors.Signal>()
            {
              private final CoordinatorRunnable theRunnable = coordinatorRunnable.lhs;

              @Override
              public ScheduledExecutors.Signal call()
              {
                if (leader && startingLeaderCounter == leaderCounter) {
                  theRunnable.run();
                }
                if (leader && startingLeaderCounter == leaderCounter) { // (We might no longer be leader)
                  return ScheduledExecutors.Signal.REPEAT;
                } else {
                  return ScheduledExecutors.Signal.STOP;
                }
              }
            }
        );
      }
    }
    catch (Exception e) {
      log.makeAlert(e, "Unable to become leader")
         .emit();
      final LeaderLatch oldLatch = createNewLeaderLatch();
      CloseQuietly.close(oldLatch);
      try {
        leaderLatch.get().start();
      }
      catch (Exception e1) {
        // If an exception gets thrown out here, then the coordinator will zombie out 'cause it won't be looking for
        // the latch anymore. I don't believe it's actually possible for an Exception to throw out here, but
        // Curator likes to have "throws Exception" on methods so it might happen...
        log.makeAlert(e1, "I am a zombie")
           .emit();
      }
    }
  }
}
```
Among these, CoordinatorHistoricalManagerRunnable and CoordinatorIndexingServiceRunnable are the most important.
CoordinatorHistoricalManagerRunnable bundles together a set of concrete tasks:
```java
private class CoordinatorHistoricalManagerRunnable extends CoordinatorRunnable
{
  public CoordinatorHistoricalManagerRunnable(final int startingLeaderCounter)
  {
    super(
        ImmutableList.of(
            new DruidCoordinatorSegmentInfoLoader(DruidCoordinator.this),
            new DruidCoordinatorHelper()
            {
              @Override
              public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
              {
                // Display info about all historical servers
                Iterable<ImmutableDruidServer> servers = FunctionalIterable
                    .create(serverInventoryView.getInventory())
                    .filter(
                        new Predicate<DruidServer>()
                        {
                          @Override
                          public boolean apply(DruidServer input)
                          {
                            return input.isAssignable();
                          }
                        }
                    ).transform(
                        new Function<DruidServer, ImmutableDruidServer>()
                        {
                          @Override
                          public ImmutableDruidServer apply(DruidServer input)
                          {
                            return input.toImmutableDruidServer();
                          }
                        }
                    );

                if (log.isDebugEnabled()) {
                  log.debug("Servers");
                  for (ImmutableDruidServer druidServer : servers) {
                    log.debug("  %s", druidServer);
                    log.debug("    -- DataSources");
                    for (ImmutableDruidDataSource druidDataSource : druidServer.getDataSources()) {
                      log.debug("    %s", druidDataSource);
                    }
                  }
                }

                // Find all historical servers, group them by subType and sort by ascending usage
                final DruidCluster cluster = new DruidCluster();
                for (ImmutableDruidServer server : servers) {
                  if (!loadManagementPeons.containsKey(server.getName())) {
                    String basePath = ZKPaths.makePath(zkPaths.getLoadQueuePath(), server.getName());
                    LoadQueuePeon loadQueuePeon = taskMaster.giveMePeon(basePath);
                    log.info("Creating LoadQueuePeon for server[%s] at path[%s]", server.getName(), basePath);

                    loadManagementPeons.put(server.getName(), loadQueuePeon);
                  }

                  cluster.add(new ServerHolder(server, loadManagementPeons.get(server.getName())));
                }

                segmentReplicantLookup = SegmentReplicantLookup.make(cluster);

                // Stop peons for servers that aren't there anymore.
                final Set<String> disappeared = Sets.newHashSet(loadManagementPeons.keySet());
                for (ImmutableDruidServer server : servers) {
                  disappeared.remove(server.getName());
                }
                for (String name : disappeared) {
                  log.info("Removing listener for server[%s] which is no longer there.", name);
                  LoadQueuePeon peon = loadManagementPeons.remove(name);
                  peon.stop();
                }

                return params.buildFromExisting()
                             .withDruidCluster(cluster)
                             .withDatabaseRuleManager(metadataRuleManager)
                             .withLoadManagementPeons(loadManagementPeons)
                             .withSegmentReplicantLookup(segmentReplicantLookup)
                             .withBalancerReferenceTimestamp(DateTime.now())
                             .build();
              }
            },
            new DruidCoordinatorRuleRunner(DruidCoordinator.this),
            new DruidCoordinatorCleanupUnneeded(DruidCoordinator.this),
            new DruidCoordinatorCleanupOvershadowed(DruidCoordinator.this),
            new DruidCoordinatorBalancer(DruidCoordinator.this),
            new DruidCoordinatorLogger()
        ),
        startingLeaderCounter
    );
  }
}
```
Among these (a minimal sketch of the shared helper pattern follows this list):
- DruidCoordinatorSegmentInfoLoader: loads Segment information and removes invalid Segment entries.
- DruidCoordinatorRuleRunner: loads the rules and applies them to every Segment.
- DruidCoordinatorCleanupUnneeded: removes Segments that are no longer needed, i.e., Segments that are not present in the metadata manager.
- DruidCoordinatorBalancer: periodically rebalances the Segment distribution, moving some Segments to even out the load.
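All of these tasks implement the same DruidCoordinatorHelper interface and run in order, each receiving the runtime params and handing back params (possibly rebuilt) for the next helper in the chain. The class below is a hypothetical example of that pattern, not part of Druid; the package paths and the getAvailableSegments() accessor follow the io.druid 0.x codebase and may differ in other versions.

```java
import com.metamx.common.logger.Logger;

import io.druid.server.coordinator.DruidCoordinatorRuntimeParams;
import io.druid.server.coordinator.helper.DruidCoordinatorHelper;

// Hypothetical helper: counts the segments visible to this coordination run.
public class DruidCoordinatorSegmentCounter implements DruidCoordinatorHelper
{
  private static final Logger log = new Logger(DruidCoordinatorSegmentCounter.class);

  @Override
  public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
  {
    // Each helper receives the runtime params, does its own piece of work, and returns
    // params (possibly rebuilt via buildFromExisting()) for the next helper in the chain.
    log.info("Coordinator sees [%,d] available segments this run.", params.getAvailableSegments().size());
    return params;
  }
}
```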
The idea behind the Balancer is to spread Segments that are likely to be covered by the same query across different Historical nodes, so that the whole cluster's capacity is used and heavy query load does not concentrate on a few machines.
Druid's load-balancing algorithm is implemented in the class CostBalancerStrategy.
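The gist of a cost-based strategy can be sketched as follows. This is a simplified illustration rather than CostBalancerStrategy's actual formula and constants: the only idea it captures is that two Segments close together in time are likely to be hit by the same query, so placing them on the same Historical node is made expensive, which pushes them apart.

```java
import org.joda.time.Interval;

// Simplified sketch of a cost-based placement idea (NOT the exact formula used by
// CostBalancerStrategy): the closer two segments are in time, the more likely a single
// query touches both, so co-locating them on one Historical node is made "expensive".
public class CostSketch
{
  // Hypothetical decay constant: cost halves roughly every day of gap between segments.
  private static final double LAMBDA = Math.log(2) / (24 * 3600 * 1000.0);

  static double jointCost(Interval a, Interval b)
  {
    long gapMillis = a.overlaps(b)
                     ? 0
                     : Math.min(
                         Math.abs(a.getStartMillis() - b.getEndMillis()),
                         Math.abs(b.getStartMillis() - a.getEndMillis())
                     );
    // Overlapping or adjacent segments cost the most; cost decays exponentially with the gap.
    return Math.exp(-LAMBDA * gapMillis);
  }

  // Cost of placing a candidate segment on a server = sum of joint costs against the
  // segments already there; the balancer would pick the server with the lowest total cost.
  static double placementCost(Interval candidate, Iterable<Interval> segmentsOnServer)
  {
    double total = 0;
    for (Interval existing : segmentsOnServer) {
      total += jointCost(candidate, existing);
    }
    return total;
  }
}
```

With a cost like this, the balancer can compare the total placement cost of a candidate Segment across servers and move Segments away from heavily loaded servers toward the cheapest placement.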