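Preface

The goal this time is simply to trace DolphinScheduler (DS) end to end, from the master to the worker: how a workflow is parsed into a set of tasks, how those tasks are sent to the worker, and how the worker submits them to the cluster.
No details here, only the main flow.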

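I. Main class diagram of the master

I drew a diagram of the classes that struck me as the main ones while reading the code. The classes marked in bright colors are the most important of the bunch.
[Figure: class diagram of the main master classes]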

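1. MasterSchedulerService

MasterSchedulerService extends Thread, and its core method is run. At intervals it tries to acquire the ZooKeeper distributed lock. Once it holds the lock, it pulls one record from the command table and builds a workflow (process) instance from that record. It then passes the instance to MasterExecThread, which implements Runnable, and hands it to a ThreadPoolExecutor so that a thread is assigned to run it. That is the main job of MasterSchedulerService.

Part of the run method: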

    @Override
    public void run() {
        logger.info("master scheduler started");
        while (Stopper.isRunning()){
            InterProcessMutex mutex = null;
            try {
                boolean runCheckFlag = OSUtils.checkResource(masterConfig.getMasterMaxCpuloadAvg(), masterConfig.getMasterReservedMemory());
                if(!runCheckFlag) {
                    Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                    continue;
                }
                if (zkMasterClient.getZkClient().getState() == CuratorFrameworkState.STARTED) {

                    mutex = zkMasterClient.blockAcquireMutex();

                    int activeCount = masterExecService.getActiveCount();
                    // make sure to scan and delete command  table in one transaction
                    // Command command = processService.findOneCommand();
                    Command command = processService.findOneCanExecuteCommand();
                    if (command != null) {
                        logger.info("find one command: id: {}, type: {}", command.getId(),command.getCommandType());
                        // ...... omitted .......
                            if (processInstance != null) {
                                logger.info("start master exec thread , split DAG ...");
                                masterExecService.execute(new MasterExecThread(processInstance, processService, nettyRemotingClient));
                            }
                        // ...... (excerpt ends here) ......
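
To make the shape of this loop easier to see, here is a minimal, self-contained sketch of the same idea: take a lock, pull one command, wrap it into an instance, and hand it to a thread pool. Everything in it is a hypothetical stand-in, not DS code: a local ReentrantLock replaces the ZooKeeper mutex (so it only illustrates the pattern inside one JVM), an in-memory queue replaces the command table, and the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified stand-in for the master scheduling loop.
class SchedulerLoopSketch {
    // Assumption: a local lock stands in for the ZooKeeper distributed mutex.
    private final ReentrantLock lock = new ReentrantLock();
    // Assumption: an in-memory queue stands in for the command table.
    private final Queue<String> commandTable = new ConcurrentLinkedQueue<>();
    // Stands in for the thread pool that runs one worker thread per workflow instance.
    private final ExecutorService execService = Executors.newFixedThreadPool(2);
    // Records which instances were actually executed, so the result can be inspected.
    final List<String> executed = new CopyOnWriteArrayList<>();

    void addCommand(String command) {
        commandTable.add(command);
    }

    /** One loop iteration: lock, take one command, hand the instance to the pool. */
    boolean scheduleOnce() {
        lock.lock();
        try {
            String command = commandTable.poll();
            if (command == null) {
                return false; // nothing to schedule this round
            }
            String processInstance = "instance-of-" + command;
            execService.execute(() -> executed.add(processInstance));
            return true;
        } finally {
            lock.unlock(); // always release the lock, even if building the instance fails
        }
    }

    /** Stops accepting work and waits briefly for submitted instances to finish. */
    void shutdownAndWait() {
        execService.shutdown();
        try {
            execService.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SchedulerLoopSketch scheduler = new SchedulerLoopSketch();
        scheduler.addCommand("cmd-1");
        scheduler.addCommand("cmd-2");
        while (scheduler.scheduleOnce()) {
            // keep pulling commands until the table is empty
        }
        scheduler.shutdownAndWait();
        System.out.println(scheduler.executed);
    }
}
```

The point of holding the lock around the "take one command" step is that only one master at a time may consume a given command; the actual execution happens outside the lock, on the pool thread.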

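2. MasterExecThread

All the work this class does is also written in its run method.
The core job is to process the workflow instance assigned to it.
The code is as follows: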

    @Override
    public void run() {

        // process instance is null
        if (processInstance == null) {
            logger.info("process instance is not exists");
            return;
        }
        // check to see if it's done
        if (processInstance.getState().typeIsFinished()) {
            logger.info("process instance is done : {}", processInstance.getId());
            return;
        }
        try {
            if (processInstance.isComplementData() && Flag.NO == processInstance.getIsSubProcess()) {
                // sub process complement data
                executeComplementProcess();
            } else {
                // execute flow
                executeProcess();
            }
        } catch (Exception e) {
            logger.error("master exec thread exception", e);
            logger.error("process execute failed, process id:{}", processInstance.getId());
            processInstance.setState(ExecutionStatus.FAILURE);
            processInstance.setEndTime(new Date());
            processService.updateProcessInstance(processInstance);
        } finally {
            taskExecService.shutdown();
            // post handle
            postHandle();
            sleepTaskContextCacheManager.removeProcessExecThreads(processInstance.getId());
        }
    }

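For now there are only two methods worth looking at:

(1) executeComplementProcess();
(2) executeProcess();
The code of executeProcess is shown below. It builds the workflow, runs it, and ends it.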

    /**
     * execute process
     *
     * @throws Exception exception
     */
    private void executeProcess() throws Exception {
        prepareProcess();
        runProcess();
        endProcess();
    }

    /**
     * prepare process parameter
     * @throws Exception exception
     */
    private void prepareProcess() throws Exception {

        // gen process dag
        buildFlowDag();

        // init task queue
        initTaskQueue();
        logger.info("prepare process :{} end", processInstance.getId());
    }

    /**
     * submit and watch the tasks, until the work flow stop
     */
    private void runProcess(){
        // submit start node
        submitPostNode(null);
        boolean sendTimeWarning = false;
        while(!processInstance.isProcessInstanceStop() && Stopper.isRunning()){

            // send warning email if process time out.
            if(!sendTimeWarning && checkProcessTimeOut(processInstance) ){
                alertManager.sendProcessTimeoutAlert(processInstance,
                        processService.findProcessDefineById(processInstance.getProcessDefinitionId()));
                sendTimeWarning = true;
            }
            for(Map.Entry<MasterBaseTaskExecThread,Future<Boolean>> entry: activeTaskNode.entrySet()) {
                Future<Boolean> future = entry.getValue();
                TaskInstance task  = entry.getKey().getTaskInstance();

                if(!future.isDone()){
                    continue;
                }

                // node monitor thread complete
                task = this.processService.findTaskInstanceById(task.getId());

                if(task == null){
                    this.taskFailedSubmit = true;
                    activeTaskNode.remove(entry.getKey());
                    continue;
                }

                // node monitor thread complete
                if(task.getState().typeIsFinished()){
                    activeTaskNode.remove(entry.getKey());
                }

                logger.info("task :{}, id:{} complete, state is {} ",
                        task.getName(), task.getId(), task.getState());
                // node success , post node submit
                if(task.getState() == ExecutionStatus.SUCCESS){
                    processInstance.setVarPool(task.getVarPool());
                    processService.updateProcessInstance(processInstance);
                    completeTaskList.put(task.getName(), task);
                    submitPostNode(task.getName());
                    continue;
                }
                // node fails, retry first, and then execute the failure process
                if(task.getState().typeIsFailure()){
                    if(task.getState() == ExecutionStatus.NEED_FAULT_TOLERANCE){
                        this.recoverToleranceFaultTaskList.add(task);
                    }
                    if(task.taskCanRetry()){
                        addTaskToStandByList(task);
                    }else{
                        completeTaskList.put(task.getName(), task);
                        if( task.isConditionsTask()
                            || DagHelper.haveConditionsAfterNode(task.getName(), dag)) {
                            submitPostNode(task.getName());
                        }else{
                            errorTaskList.put(task.getName(), task);
                            if(processInstance.getFailureStrategy() == FailureStrategy.END){
                                killTheOtherTasks();
                            }
                        }
                    }
                    continue;
                }
                // other status stop/pause
                completeTaskList.put(task.getName(), task);
            }
            // send alert
            if(CollectionUtils.isNotEmpty(this.recoverToleranceFaultTaskList)){
                alertManager.sendAlertWorkerToleranceFault(processInstance, recoverToleranceFaultTaskList);
                this.recoverToleranceFaultTaskList.clear();
            }
            // updateProcessInstance completed task status
            // failure priority is higher than pause
            // if a task fails, other suspended tasks need to be reset kill
            if(errorTaskList.size() > 0){
                for(Map.Entry<String, TaskInstance> entry: completeTaskList.entrySet()) {
                    TaskInstance completeTask = entry.getValue();
                    if(completeTask.getState()== ExecutionStatus.PAUSE){
                        completeTask.setState(ExecutionStatus.KILL);
                        completeTaskList.put(entry.getKey(), completeTask);
                        processService.updateTaskInstance(completeTask);
                    }
                }
            }
            if(canSubmitTaskToQueue()){
                submitStandByTask();
            }
            try {
                Thread.sleep(Constants.SLEEP_TIME_MILLIS);
            } catch (InterruptedException e) {
                logger.error(e.getMessage(),e);
                Thread.currentThread().interrupt();
            }
            updateProcessInstanceState();
        }
    }
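
The heart of runProcess is a polling loop over a map of running tasks and their Futures: skip the ones that are not done, remove the ones that are, sleep, repeat. Here is a small sketch of just that pattern. It is not DS code; the class name, the string task names, and the 10 ms sleep are all made up for illustration, and it assumes the active-task map is a ConcurrentHashMap, which tolerates removing entries while iterating.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the "poll the futures of active tasks" pattern.
class FuturePollSketch {

    /** Polls the map until it is empty and returns the names of the finished tasks. */
    static List<String> drain(Map<String, Future<Boolean>> active) {
        List<String> finished = new ArrayList<>();
        while (!active.isEmpty()) {
            for (Map.Entry<String, Future<Boolean>> entry : active.entrySet()) {
                if (!entry.getValue().isDone()) {
                    continue; // still running, check again next round
                }
                // Assumption: the map is a ConcurrentHashMap, so removal during iteration is safe.
                active.remove(entry.getKey());
                finished.add(entry.getKey());
            }
            try {
                Thread.sleep(10); // illustrative interval between polling rounds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return finished; // stop polling if the thread is interrupted
            }
        }
        return finished;
    }

    /** Submits two trivial tasks and drains them. */
    static List<String> runDemo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Map<String, Future<Boolean>> active = new ConcurrentHashMap<>();
        active.put("task-a", pool.submit(() -> true));
        active.put("task-b", pool.submit(() -> true));
        List<String> finished = drain(active);
        pool.shutdown();
        return finished;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In the real method, the "finished" branch is where the interesting decisions happen: on success the successor nodes are submitted, on failure the task is either queued for retry or recorded as an error.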

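Enough padding. The method names in this code are all self-explanatory anyway. The code above builds a DAG from the workflow instance and then looks for the tasks in that DAG that can currently be submitted. Within the whole runProcess method, submitStandByTask() is the gateway to the next stage.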
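
The idea of "find the tasks in the DAG that can be submitted right now" can be sketched in a few lines: a node is ready when it has not been submitted yet and all of its predecessors are complete. The sketch below is a simplified illustration under that assumption only. The class, its fields, and the node names are hypothetical, and it ignores retries, conditions, and failure strategies that the real code handles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of submitting DAG nodes once their predecessors are complete.
class DagSubmitSketch {
    // node name -> names of its direct predecessors (insertion order is kept)
    private final Map<String, List<String>> predecessors = new LinkedHashMap<>();
    private final Set<String> completed = new HashSet<>();
    private final Set<String> submitted = new LinkedHashSet<>();

    void addNode(String name, String... deps) {
        predecessors.put(name, Arrays.asList(deps));
    }

    void markComplete(String name) {
        completed.add(name);
    }

    /** Submits and returns every node that is not yet submitted and has all predecessors complete. */
    List<String> submitReady() {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : predecessors.entrySet()) {
            boolean notSubmitted = !submitted.contains(entry.getKey());
            boolean depsDone = completed.containsAll(entry.getValue());
            if (notSubmitted && depsDone) {
                ready.add(entry.getKey());
            }
        }
        submitted.addAll(ready);
        return ready;
    }

    /** Runs the whole DAG, treating every submitted node as immediately successful. */
    static List<String> runAll(DagSubmitSketch dag) {
        List<String> order = new ArrayList<>();
        List<String> ready = dag.submitReady();
        while (!ready.isEmpty()) {
            for (String node : ready) {
                order.add(node);
                dag.markComplete(node);
            }
            ready = dag.submitReady();
        }
        return order;
    }

    public static void main(String[] args) {
        DagSubmitSketch dag = new DagSubmitSketch();
        dag.addNode("A");
        dag.addNode("B", "A");
        dag.addNode("C", "A");
        dag.addNode("D", "B", "C");
        System.out.println(runAll(dag)); // prints [A, B, C, D]
    }
}
```

Node D is only submitted after both B and C are complete, which is the same dependency check the master has to make before it moves a task into the queue.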

Summary

Hmm. When creating workflows, try to create as few sub-workflows inside them as possible. Onward to the next post.
