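Distributed Systems: Chronos Scheduling

Chronos is a fault-tolerant job scheduler that handles dependencies and ISO8601-based schedules. It is an open-source product released by Airbnb as a replacement for cron. You can use it to orchestrate jobs; it supports Mesos as the job executor and can interact with Hadoop. You can define triggers that fire when a job completes, and dependency chains of arbitrary length are supported.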



 

Chronos is our replacement for cron. It is a distributed and fault-tolerant scheduler which runs on top of Mesos. It is a framework and supports custom Mesos executors as well as the default command executor. Thus, by default, Chronos executes sh (on most systems, Bash) scripts.

Chronos can be used to interact with systems such as Hadoop (incl. EMR), even if the Mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts allow transferring files to a remote machine, executing them there in the background, and using asynchronous callbacks to notify Chronos of job completion or failure.

Chronos has a number of advantages over regular cron. It allows you to schedule your jobs using ISO8601 repeating interval notation, which gives more flexibility in job scheduling. Chronos also supports jobs that are triggered by the completion of other jobs, with arbitrarily long dependency chains.
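To make the ISO8601 repeating interval notation more concrete, here is a minimal Python sketch that splits a string of the form R[n]/start/duration into its three parts and computes the next run time. This is not Chronos code (Chronos itself is not written in Python); the function names `parse_repeating_interval` and `next_run` are hypothetical, only the D/H/M/S duration designators are handled, and treating `Rn` as "run n times in total" is an assumption made for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Handles only the day/hour/minute/second designators of an ISO 8601 duration.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def parse_repeating_interval(text):
    """Split 'R[n]/<start>/<duration>' into (repetitions, start, period)."""
    repeat, start, duration = text.split("/")
    if not repeat.startswith("R"):
        raise ValueError("expected a leading 'R'")
    # A bare 'R' means "repeat forever", represented here as None.
    repetitions = int(repeat[1:]) if len(repeat) > 1 else None
    start_time = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    match = _DURATION.match(duration)
    if match is None:
        raise ValueError("unsupported duration: " + duration)
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    period = timedelta(**parts)
    if period <= timedelta(0):
        raise ValueError("duration must be positive: " + duration)
    return repetitions, start_time, period

def next_run(text, now):
    """Return the first scheduled time at or after `now`, or None if exhausted."""
    repetitions, start, period = parse_repeating_interval(text)
    if now <= start:
        return start
    # Ceiling division: number of whole periods needed to reach `now`.
    steps = -(-(now - start) // period)
    # Assumption for this sketch: Rn means n runs in total (steps 0..n-1).
    if repetitions is not None and steps >= repetitions:
        return None
    return start + steps * period

if __name__ == "__main__":
    now = datetime(2014, 3, 8, 21, 0, tzinfo=timezone.utc)
    print(next_run("R/2014-03-08T20:00:00Z/PT2H", now))
```

The schedule string in the usage line reads as: repeat forever, starting at 20:00 UTC on 2014-03-08, once every two hours.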

Chronos: How does it work?

Chronos is a Mesos scheduler for running schedule-based and dependency-based jobs. Scheduled jobs are configured with ISO8601-based schedules with repeating intervals. Typically, a job is scheduled to run indefinitely, such as once per day or per hour. Dependent jobs may have multiple parents, and will be triggered once all parents have been successfully invoked at least once since the last invocation of the dependent job.

Internally, the Chronos scheduler main loop is quite simple. The pattern is as follows:

1) Chronos reads all job state from the state store (ZooKeeper).

2) Jobs are registered within the scheduler and loaded into the job graph for tracking dependencies.

3) Jobs are separated into a list of those which should be run at the current time (based on the clock of the host machine), and those which should not.

4) Jobs in the list of jobs to run are queued, and will be launched as soon as a sufficient offer becomes available.

5) Chronos will sleep until the next job is scheduled to run, and begin again from step 1.
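Steps 3 and 5 of the loop can be illustrated with a small Python sketch. It is only an illustration of the pattern described above, not the actual mainLoop() code: the helpers `partition_jobs` and `sleep_seconds`, the job names, and the plain dict of next run times are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

def partition_jobs(next_run_times, now):
    """Step 3: split job names into (due, waiting) using the host clock `now`."""
    due, waiting = [], []
    for name in sorted(next_run_times):
        if next_run_times[name] <= now:
            due.append(name)       # would be queued until a Mesos offer fits
        else:
            waiting.append(name)   # not yet scheduled to run
    return due, waiting

def sleep_seconds(next_run_times, now):
    """Step 5: seconds until the earliest future job, or None if none waits."""
    future = [t for t in next_run_times.values() if t > now]
    if not future:
        return None
    return (min(future) - now).total_seconds()

if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    jobs = {
        "hourly-report": now - timedelta(minutes=1),   # hypothetical job
        "nightly-backup": now + timedelta(hours=8),    # hypothetical job
    }
    due, waiting = partition_jobs(jobs, now)
    print("due:", due)
    print("waiting:", waiting)
    print("sleep for:", sleep_seconds(jobs, now), "seconds")
```

Because the split depends on the clock of the host machine, a skewed clock shifts which jobs count as due, which is one reason precise scheduling cannot be guaranteed.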

Furthermore, a dependent job will be queued for execution once all parents have successfully completed at least once since the last time it ran. After the dependent job runs, the cycle resets.
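The trigger rule for dependent jobs can be written down as a short check. The following Python sketch assumes a simplified model in which each job records only the timestamp of its last success or last run; the function name `is_dependent_ready` and the job names are hypothetical and do not come from the Chronos source.

```python
def is_dependent_ready(parent_last_success, child_last_run):
    """Return True once every parent has succeeded since the child last ran.

    parent_last_success: dict mapping parent name -> timestamp of its most
        recent successful run, or None if it has never succeeded.
    child_last_run: timestamp of the dependent job's last run, or None if
        it has never run.
    """
    if not parent_last_success:
        return False  # a dependent job without parents is never triggered here
    for finished_at in parent_last_success.values():
        if finished_at is None:
            return False  # this parent has not succeeded yet
        if child_last_run is not None and finished_at <= child_last_run:
            return False  # this success was already consumed by the last run
    return True

if __name__ == "__main__":
    parents = {"extract": 100, "transform": 130}  # hypothetical parent jobs
    print(is_dependent_ready(parents, child_last_run=90))   # both ran after 90
    print(is_dependent_ready(parents, child_last_run=120))  # "extract" is stale
```

Recording a new `child_last_run` after the dependent job runs is what resets the cycle: every parent then has to succeed again before the next trigger.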

This code lives within the mainLoop() method, and can be found here.

Additionally, Chronos has a number of advanced features to help you build whatever you are trying to build. It can:

1) Write job metrics to Cassandra for further analysis, validation, and party favours

2) Send notifications to various endpoints such as email, Slack, and others

3) Export metrics to Graphite and elsewhere

Chronos cannot:

1) Magically solve all distributed computing problems for you

2) Guarantee precise scheduling

3) Guarantee clock synchronization

4) Guarantee that jobs actually run

Reposted from gaojingsong.iteye.com/blog/2352712