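Preface
The only goal this time is to trace DS (DolphinScheduler) end to end, from the master side to the worker side: how a workflow is parsed into a set of tasks, how those tasks are then sent to the worker, and how the worker submits them to the cluster.
Details are skipped; only the main flow is covered.
I. Main class diagram of the master
Ha, I simply drew a class diagram of the classes I considered the main ones while reading the code. The classes marked in bright colors are the most important among them.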
1.MasterSchedulerService
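MasterSchedulerService extends the Thread class, and its core is the run method. It periodically tries to grab the ZK distributed lock. Once it holds the lock, it pulls one record from the command table and builds a process instance (workflow instance) from that record. It then injects the process instance into MasterExecThread, a class that implements the Runnable interface, and hands it to a ThreadPoolExecutor so that a pool thread executes it. That is the main job of MasterSchedulerService.
Part of the run method: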
@Override
public void run() {
    logger.info("master scheduler started");
    while (Stopper.isRunning()){
        InterProcessMutex mutex = null;
        try {
            boolean runCheckFlag = OSUtils.checkResource(masterConfig.getMasterMaxCpuloadAvg(), masterConfig.getMasterReservedMemory());
            if(!runCheckFlag) {
                Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                continue;
            }
            if (zkMasterClient.getZkClient().getState() == CuratorFrameworkState.STARTED) {
                mutex = zkMasterClient.blockAcquireMutex();
                int activeCount = masterExecService.getActiveCount();
                // make sure to scan and delete command table in one transaction
                // Command command = processService.findOneCommand();
                Command command = processService.findOneCanExecuteCommand();
                if (command != null) {
                    logger.info("find one command: id: {}, type: {}", command.getId(),command.getCommandType());
                    ...... (omitted) ......
                    if (processInstance != null) {
                        logger.info("start master exec thread , split DAG ...");
                        masterExecService.execute(new MasterExecThread(processInstance, processService, nettyRemotingClient));
                    }
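To make the lock-then-dispatch pattern described above more concrete, here is a minimal in-process sketch. It is not DolphinScheduler code: a ReentrantLock stands in for the ZK distributed lock, an in-memory queue stands in for the command table, a plain string stands in for the process instance, and the names SchedulerLoopSketch, addCommand, pollOnce and drain are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class SchedulerLoopSketch {
    // Assumption: stands in for the ZK distributed lock (in-process only).
    private final ReentrantLock lock = new ReentrantLock();
    // Assumption: stands in for the command table.
    private final Queue<String> commandTable = new ConcurrentLinkedQueue<>();
    // Plays the role of the thread pool that runs one worker per process instance.
    private final ExecutorService execService = Executors.newFixedThreadPool(2);
    // Records which "process instances" were actually executed.
    private final Queue<String> handled = new ConcurrentLinkedQueue<>();

    void addCommand(String command) {
        commandTable.add(command);
    }

    /** One loop iteration: lock, pull one command, dispatch it. Returns true if something was dispatched. */
    boolean pollOnce() {
        lock.lock();
        try {
            String command = commandTable.poll();
            if (command == null) {
                return false;
            }
            // Build the "process instance" from the command, then hand it to the pool.
            String processInstance = "instance-of-" + command;
            execService.execute(() -> handled.add(processInstance));
            return true;
        } finally {
            // Always release the lock, even if dispatching fails.
            lock.unlock();
        }
    }

    /** Stops the pool, waits for dispatched work, and returns what was executed. */
    List<String> drain() {
        execService.shutdown();
        try {
            execService.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(handled);
    }

    public static void main(String[] args) {
        SchedulerLoopSketch scheduler = new SchedulerLoopSketch();
        scheduler.addCommand("c1");
        scheduler.addCommand("c2");
        while (scheduler.pollOnce()) {
            // keep polling until the command table is empty
        }
        System.out.println(scheduler.drain());
    }
}
```

The point of holding the lock across "pull one command" and "create the instance" is that only one master at a time consumes a given command; the actual execution happens later on a pool thread, outside the lock.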
2.MasterExecThread
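Everything this class does is likewise written in its run method.
The core point is to process the process instance currently assigned to it.
The code is as follows: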
@Override
public void run() {
    // process instance is null
    if (processInstance == null) {
        logger.info("process instance is not exists");
        return;
    }
    // check to see if it's done
    if (processInstance.getState().typeIsFinished()) {
        logger.info("process instance is done : {}", processInstance.getId());
        return;
    }
    try {
        if (processInstance.isComplementData() && Flag.NO == processInstance.getIsSubProcess()) {
            // sub process complement data
            executeComplementProcess();
        } else {
            // execute flow
            executeProcess();
        }
    } catch (Exception e) {
        logger.error("master exec thread exception", e);
        logger.error("process execute failed, process id:{}", processInstance.getId());
        processInstance.setState(ExecutionStatus.FAILURE);
        processInstance.setEndTime(new Date());
        processService.updateProcessInstance(processInstance);
    } finally {
        taskExecService.shutdown();
        // post handle
        postHandle();
        sleepTaskContextCacheManager.removeProcessExecThreads(processInstance.getId());
    }
}
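For now there are just two methods to focus on:
(1) executeComplementProcess();
(2) executeProcess();
The code of executeProcess is shown below. It simply prepares (builds) the workflow, runs the workflow, and ends the workflow: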
/**
 * execute process
 *
 * @throws Exception exception
 */
private void executeProcess() throws Exception {
    prepareProcess();
    runProcess();
    endProcess();
}
/**
 * prepare process parameter
 * @throws Exception exception
 */
private void prepareProcess() throws Exception {
    // gen process dag
    buildFlowDag();
    // init task queue
    initTaskQueue();
    logger.info("prepare process :{} end", processInstance.getId());
}
/**
 * submit and watch the tasks, until the work flow stop
 */
private void runProcess(){
    // submit start node
    submitPostNode(null);
    boolean sendTimeWarning = false;
    while(!processInstance.isProcessInstanceStop() && Stopper.isRunning()){
        // send warning email if process time out.
        if(!sendTimeWarning && checkProcessTimeOut(processInstance) ){
            alertManager.sendProcessTimeoutAlert(processInstance,
                    processService.findProcessDefineById(processInstance.getProcessDefinitionId()));
            sendTimeWarning = true;
        }
        for(Map.Entry<MasterBaseTaskExecThread,Future<Boolean>> entry: activeTaskNode.entrySet()) {
            Future<Boolean> future = entry.getValue();
            TaskInstance task = entry.getKey().getTaskInstance();
            if(!future.isDone()){
                continue;
            }
            // node monitor thread complete
            task = this.processService.findTaskInstanceById(task.getId());
            if(task == null){
                this.taskFailedSubmit = true;
                activeTaskNode.remove(entry.getKey());
                continue;
            }
            // node monitor thread complete
            if(task.getState().typeIsFinished()){
                activeTaskNode.remove(entry.getKey());
            }
            logger.info("task :{}, id:{} complete, state is {} ",
                    task.getName(), task.getId(), task.getState());
            // node success , post node submit
            if(task.getState() == ExecutionStatus.SUCCESS){
                processInstance.setVarPool(task.getVarPool());
                processService.updateProcessInstance(processInstance);
                completeTaskList.put(task.getName(), task);
                submitPostNode(task.getName());
                continue;
            }
            // node fails, retry first, and then execute the failure process
            if(task.getState().typeIsFailure()){
                if(task.getState() == ExecutionStatus.NEED_FAULT_TOLERANCE){
                    this.recoverToleranceFaultTaskList.add(task);
                }
                if(task.taskCanRetry()){
                    addTaskToStandByList(task);
                }else{
                    completeTaskList.put(task.getName(), task);
                    if( task.isConditionsTask()
                            || DagHelper.haveConditionsAfterNode(task.getName(), dag)) {
                        submitPostNode(task.getName());
                    }else{
                        errorTaskList.put(task.getName(), task);
                        if(processInstance.getFailureStrategy() == FailureStrategy.END){
                            killTheOtherTasks();
                        }
                    }
                }
                continue;
            }
            // other status stop/pause
            completeTaskList.put(task.getName(), task);
        }
        // send alert
        if(CollectionUtils.isNotEmpty(this.recoverToleranceFaultTaskList)){
            alertManager.sendAlertWorkerToleranceFault(processInstance, recoverToleranceFaultTaskList);
            this.recoverToleranceFaultTaskList.clear();
        }
        // updateProcessInstance completed task status
        // failure priority is higher than pause
        // if a task fails, other suspended tasks need to be reset kill
        if(errorTaskList.size() > 0){
            for(Map.Entry<String, TaskInstance> entry: completeTaskList.entrySet()) {
                TaskInstance completeTask = entry.getValue();
                if(completeTask.getState()== ExecutionStatus.PAUSE){
                    completeTask.setState(ExecutionStatus.KILL);
                    completeTaskList.put(entry.getKey(), completeTask);
                    processService.updateTaskInstance(completeTask);
                }
            }
        }
        if(canSubmitTaskToQueue()){
            submitStandByTask();
        }
        try {
            Thread.sleep(Constants.SLEEP_TIME_MILLIS);
        } catch (InterruptedException e) {
            logger.error(e.getMessage(),e);
            Thread.currentThread().interrupt();
        }
        updateProcessInstanceState();
    }
}
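Enough padding. The method names in this code are self-explanatory anyway: the code above builds the DAG from the process instance and then looks for the tasks in that DAG that can currently be submitted. Within the whole runProcess method, submitStandByTask() is the gateway to the next stage.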
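The "submit the start nodes, watch the futures, submit the post nodes on success" loop of runProcess can be sketched in a much reduced form as follows. This is only an illustration under stated assumptions, not the DolphinScheduler implementation: the DAG is a plain map from node name to successor names, every task trivially succeeds, and retry, failure strategy, alerts and the standby list are left out. The class DagRunSketch and its members are hypothetical, although some names mirror those in the excerpt.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DagRunSketch {
    // Assumption: the DAG as node name -> names of its direct successors.
    private final Map<String, List<String>> successors;
    private final Map<String, Set<String>> predecessors = new HashMap<>();
    private final ExecutorService taskExecService = Executors.newFixedThreadPool(4);
    private final Map<String, Future<Boolean>> activeTaskNode = new HashMap<>();
    private final Set<String> submitted = new HashSet<>();
    private final List<String> completeTaskList = new ArrayList<>();

    DagRunSketch(Map<String, List<String>> successors) {
        this.successors = successors;
        // Derive the predecessor sets once, so readiness checks are cheap.
        for (String node : successors.keySet()) {
            predecessors.putIfAbsent(node, new HashSet<>());
        }
        for (Map.Entry<String, List<String>> e : successors.entrySet()) {
            for (String to : e.getValue()) {
                predecessors.computeIfAbsent(to, k -> new HashSet<>()).add(e.getKey());
            }
        }
    }

    // parent == null means "submit the start nodes" (nodes without predecessors).
    private void submitPostNode(String parent) {
        Collection<String> candidates = parent == null
                ? predecessors.keySet()
                : successors.getOrDefault(parent, Collections.emptyList());
        for (String node : candidates) {
            boolean ready = completeTaskList.containsAll(predecessors.get(node));
            if (ready && submitted.add(node)) {
                Callable<Boolean> work = () -> true; // every task succeeds in this sketch
                activeTaskNode.put(node, taskExecService.submit(work));
            }
        }
    }

    /** Runs the whole DAG and returns the node names in completion order. */
    List<String> run() {
        submitPostNode(null);
        while (!activeTaskNode.isEmpty()) {
            // Collect the tasks whose futures are done and drop them from the active map.
            List<String> finished = new ArrayList<>();
            Iterator<Map.Entry<String, Future<Boolean>>> it = activeTaskNode.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Future<Boolean>> entry = it.next();
                if (entry.getValue().isDone()) {
                    finished.add(entry.getKey());
                    it.remove();
                }
            }
            // On completion, record the task and try to submit its post nodes.
            for (String node : finished) {
                completeTaskList.add(node);
                submitPostNode(node);
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        taskExecService.shutdown();
        return completeTaskList;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new HashMap<>();
        dag.put("a", List.of("b", "c"));
        dag.put("b", List.of("d"));
        dag.put("c", List.of("d"));
        System.out.println(new DagRunSketch(dag).run());
    }
}
```

In this diamond-shaped example, d is only submitted once both b and c are in the completed list, which is the same readiness idea the walkthrough attributes to the DAG-based submission; the relative order of b and c is not fixed.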