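This post continues from Part 3 of the series.
The earlier posts were fairly easy to follow. They described how the YARN client submits a new application to the ResourceManager (RM) and how the RM returns a unique application ID.
This post looks at how the RM side goes about starting an application's ApplicationMaster (AM).
I will try to keep it accessible, but the material is not that easy going.
Tracing the code, we find that the second submission step (submitting the application itself, after its ID has been obtained) is implemented by YARNRunner: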
```java
@Override
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
    throws IOException, InterruptedException {
  addHistoryToken(ts);

  // Construct necessary information to start the MR AM
  ApplicationSubmissionContext appContext =
      createApplicationSubmissionContext(conf, jobSubmitDir, ts);

  // Submit to ResourceManager
  try {
    ApplicationId applicationId =
        resMgrDelegate.submitApplication(appContext);

    ApplicationReport appMaster = resMgrDelegate
        .getApplicationReport(applicationId);
    String diagnostics =
        (appMaster == null ?
            "application report is null" : appMaster.getDiagnostics());
    if (appMaster == null
        || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
        || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
      throw new IOException("Failed to run job : " + diagnostics);
    }
    return clientCache.getClient(jobId).getJobStatus(jobId);
  } catch (YarnException e) {
    throw new IOException(e);
  }
}
```
Next, look at the submitApplication method of resMgrDelegate. The actual work is done by the method in YarnClientImpl:
```java
@Override
public ApplicationId submitApplication(ApplicationSubmissionContext appContext)
    throws YarnException, IOException {
  ApplicationId applicationId = appContext.getApplicationId();
  if (applicationId == null) {
    throw new ApplicationIdNotProvidedException(
        "ApplicationId is not provided in ApplicationSubmissionContext");
  }
  SubmitApplicationRequest request =
      Records.newRecord(SubmitApplicationRequest.class);
  request.setApplicationSubmissionContext(appContext);

  // Automatically add the timeline DT into the CLC
  // Only when the security and the timeline service are both enabled
  if (isSecurityEnabled() && timelineServiceEnabled) {
    addTimelineDelegationToken(appContext.getAMContainerSpec());
  }

  // TODO: YARN-1763:Handle RM failovers during the submitApplication call.
  rmClient.submitApplication(request);

  int pollCount = 0;
  long startTime = System.currentTimeMillis();

  while (true) {
    try {
      YarnApplicationState state =
          getApplicationReport(applicationId).getYarnApplicationState();
      if (!state.equals(YarnApplicationState.NEW) &&
          !state.equals(YarnApplicationState.NEW_SAVING)) {
        LOG.info("Submitted application " + applicationId);
        break;
      }

      long elapsedMillis = System.currentTimeMillis() - startTime;
      if (enforceAsyncAPITimeout() &&
          elapsedMillis >= asyncApiPollTimeoutMillis) {
        throw new YarnException("Timed out while waiting for application " +
            applicationId + " to be submitted successfully");
      }

      // Notify the client through the log every 10 poll, in case the client
      // is blocked here too long.
      if (++pollCount % 10 == 0) {
        LOG.info("Application submission is not finished, " +
            "submitted application " + applicationId +
            " is still in " + state);
      }
      try {
        Thread.sleep(submitPollIntervalMillis);
      } catch (InterruptedException ie) {
        LOG.error("Interrupted while waiting for application "
            + applicationId
            + " to be successfully submitted.");
      }
    } catch (ApplicationNotFoundException ex) {
      // FailOver or RM restart happens before RMStateStore saves
      // ApplicationState
      LOG.info("Re-submit application " + applicationId + "with the " +
          "same ApplicationSubmissionContext");
      rmClient.submitApplication(request);
    }
  }

  return applicationId;
}
```
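The client-side wait loop above boils down to this: poll the application state until it leaves NEW/NEW_SAVING, or give up after a timeout. Below is a minimal sketch under stated assumptions. SubmitPoller and waitUntilSubmitted are hypothetical names, and a Supplier<String> replaces the real getApplicationReport(...) RPC. The re-submission on ApplicationNotFoundException is left out.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the wait loop in YarnClientImpl.submitApplication.
 * The Supplier stands in for getApplicationReport(...).getYarnApplicationState().
 */
final class SubmitPoller {
    /**
     * Polls until the state is neither NEW nor NEW_SAVING and returns how many
     * polls that took. Throws if timeoutMillis elapses first.
     */
    static int waitUntilSubmitted(Supplier<String> stateSource,
                                  long pollIntervalMillis,
                                  long timeoutMillis) {
        long start = System.currentTimeMillis();
        int pollCount = 0;
        while (true) {
            String state = stateSource.get();
            pollCount++;
            if (!"NEW".equals(state) && !"NEW_SAVING".equals(state)) {
                return pollCount; // the RM has accepted and saved the application
            }
            if (System.currentTimeMillis() - start >= timeoutMillis) {
                throw new IllegalStateException(
                        "Timed out while waiting for the application to be submitted");
            }
            if (pollCount % 10 == 0) {
                // like YarnClientImpl, report every 10 polls so a blocked client is visible
                System.out.println("Still " + state + " after " + pollCount + " polls");
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // unlike YarnClientImpl (which logs and keeps polling), give up here
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

For example, a supplier that returns NEW, then NEW_SAVING, then SUBMITTED makes the call return 3.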
Note this line in particular:
```java
rmClient.submitApplication(request);
```
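request is the object that carries the ApplicationSubmissionContext: our service's overall parameters plus the command used to launch it. This is what we submit to the RM.
Now let's look at the RM-side implementation in ClientRMService: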
```java
// call RMAppManager to submit application directly
rmAppManager.submitApplication(submissionContext,
    System.currentTimeMillis(), user);

LOG.info("Application with id " + applicationId.getId() +
    " submitted by user " + user);
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
    "ClientRMService", applicationId);
```
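This is the key part: rmAppManager submits the application. Note that this is RM-side logic. Eventually it leads the RM to ask a NodeManager, over RPC, to launch the AM's container. At this step, however, RMAppManager only creates the in-memory application object and fires events. Let's step into it: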
```java
@SuppressWarnings("unchecked")
protected void submitApplication(
    ApplicationSubmissionContext submissionContext, long submitTime,
    String user) throws YarnException {
  ApplicationId applicationId = submissionContext.getApplicationId();

  RMAppImpl application =
      createAndPopulateNewRMApp(submissionContext, submitTime, user, false);
  ApplicationId appId = submissionContext.getApplicationId();

  if (UserGroupInformation.isSecurityEnabled()) {
    try {
      this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
          parseCredentials(submissionContext),
          submissionContext.getCancelTokensWhenComplete(),
          application.getUser());
    } catch (Exception e) {
      LOG.warn("Unable to parse credentials.", e);
      // Sending APP_REJECTED is fine, since we assume that the
      // RMApp is in NEW state and thus we haven't yet informed the
      // scheduler about the existence of the application
      assert application.getState() == RMAppState.NEW;
      this.rmContext.getDispatcher().getEventHandler()
          .handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
      throw RPCUtil.getRemoteException(e);
    }
  } else {
    // Dispatcher is not yet started at this time, so these START events
    // enqueued should be guaranteed to be first processed when dispatcher
    // gets started.
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(applicationId, RMAppEventType.START));
  }
}
```
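The two branches at the end of submitApplication differ only in when the START event is posted. The sketch below is hypothetical: SubmitBranches, the string events, and the single-thread executor that stands in for DelegationTokenRenewer's worker are all assumptions. In the secure branch the event is posted only after the token work finishes on another thread. In the non-secure branch it is posted at once.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the two branches at the end of RMAppManager.submitApplication. */
final class SubmitBranches {
    /** Stand-in for the central dispatcher's queue: records events in arrival order. */
    static final List<String> dispatched = new CopyOnWriteArrayList<>();

    /** Stand-in for DelegationTokenRenewer's worker thread (daemon, so the JVM can exit). */
    private static final ExecutorService renewer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "token-renewer-sketch");
        t.setDaemon(true);
        return t;
    });

    static void submit(String appId, boolean securityEnabled) {
        if (securityEnabled) {
            // secure branch: tokens are set up asynchronously and START is posted afterwards
            renewer.execute(() -> {
                dispatched.add(appId + ":TOKENS_READY");
                dispatched.add(appId + ":START");
            });
        } else {
            // non-secure branch: START is posted right away
            dispatched.add(appId + ":START");
        }
    }

    /** Waits until every task queued so far on the single renewer thread has run. */
    static void awaitRenewer() {
        try {
            renewer.submit(() -> { }).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The ordering guarantee is the point: in the secure case, nothing downstream sees START before the application's tokens have been handled.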
Let's go through this code carefully, starting with the createAndPopulateNewRMApp method. Rather than walk through all of it, here is the key excerpt:
```java
RMAppImpl application =
    new RMAppImpl(applicationId, rmContext, this.conf,
        submissionContext.getApplicationName(), user,
        submissionContext.getQueue(),
        submissionContext, this.scheduler, this.masterService,
        submitTime, submissionContext.getApplicationType(),
        submissionContext.getApplicationTags(), amReq);
```
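This creates a new RMAppImpl. Let's look at this class closely. It wraps a state machine, and that is a major design improvement in YARN: each component acts as its state changes in response to events. The principle is the same as the State pattern from design patterns.
First, here is what the state machine itself looks like: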
```java
/**
 * State machine topology.
 * This object is semantically immutable.  If you have a
 * StateMachineFactory there's no operation in the API that changes
 * its semantic properties.
 *
 * @param <OPERAND> The object type on which this state machine operates.
 * @param <STATE> The state of the entity.
 * @param <EVENTTYPE> The external eventType to be handled.
 * @param <EVENT> The event object.
 *
 */
@Public
@Evolving
final public class StateMachineFactory
             <OPERAND, STATE extends Enum<STATE>,
              EVENTTYPE extends Enum<EVENTTYPE>, EVENT> {
  // ...
}
```
The Javadoc is detailed enough that I won't repeat it. Here is the addTransition method we use:
```java
/**
 * @return a NEW StateMachineFactory just like {@code this} with the current
 *          transition added as a new legal transition.  This overload
 *          has no hook object.
 *
 *         Note that the returned StateMachineFactory is a distinct
 *         object.
 *
 *         This method is part of the API.
 *
 * @param preState pre-transition state
 * @param postState post-transition state
 * @param eventType stimulus for the transition
 */
public StateMachineFactory
           <OPERAND, STATE, EVENTTYPE, EVENT>
        addTransition(STATE preState, STATE postState, EVENTTYPE eventType) {
  return addTransition(preState, postState, eventType, null);
}
```
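Each call specifies the current state, the triggering event, and the state after the event fires. The actual processing logic is implemented by other classes (the transition hooks).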
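The Javadoc above says that addTransition returns a new, distinct factory instead of modifying the current one. A small sketch makes the point concrete. TinyFactory and the two Demo enums are hypothetical and much simpler than Hadoop's StateMachineFactory: there are no hooks, and Hadoop's real factory is implemented differently internally. The sketch only records pre-state, event type and post-state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Simplified subsets of RMAppState / RMAppEventType, for illustration only. */
enum DemoAppState { NEW, NEW_SAVING, SUBMITTED }
enum DemoAppEvent { START, APP_NEW_SAVED }

/**
 * Hypothetical, stripped-down analogue of StateMachineFactory.addTransition:
 * the factory is immutable, and every addTransition call returns a new factory.
 */
final class TinyFactory<S extends Enum<S>, E extends Enum<E>> {
    // preState -> (eventType -> postState); never mutated after construction
    private final Map<S, Map<E, S>> table;

    TinyFactory() {
        this(Collections.<S, Map<E, S>>emptyMap());
    }

    private TinyFactory(Map<S, Map<E, S>> table) {
        this.table = table;
    }

    TinyFactory<S, E> addTransition(S preState, S postState, E eventType) {
        // copy the existing table so that "this" stays unchanged
        Map<S, Map<E, S>> copy = new HashMap<>();
        for (Map.Entry<S, Map<E, S>> entry : table.entrySet()) {
            copy.put(entry.getKey(), new HashMap<>(entry.getValue()));
        }
        copy.computeIfAbsent(preState, k -> new HashMap<>()).put(eventType, postState);
        return new TinyFactory<>(copy);
    }

    /** Post-state for (preState, eventType), or null if that transition is not legal. */
    S next(S preState, E eventType) {
        Map<E, S> byEvent = table.get(preState);
        return byEvent == null ? null : byEvent.get(eventType);
    }
}
```

Because every call yields a new object, a factory built once through a chain of addTransition calls can be shared safely. RMAppImpl does something similar: it builds its factory once, in a static field.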
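Continuing with RMAppImpl's construction: it receives an rmContext. As I mentioned in my posts on RM initialization and service startup, rmContext is the RM's "steward" and holds references to many related services. Passing it in here is what ties the RMAppImpl to the RM.
A note on versions: this series is based on Hadoop 2.6.5, while my ResourceManager series was based on Hadoop 2.2.0. The code differs somewhat, but the basic principles are the same.
This brings up one of those differences: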
```java
createAndInitActiveServices();
```
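The 2.2.0 code has no such call at RM startup, and the initialization code there is more scattered. Here, almost all of the rmContext initialization is consolidated in this method: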
```java
/**
 * Helper method to create and init {@link #activeServices}. This creates an
 * instance of {@link RMActiveServices} and initializes it.
 * @throws Exception
 */
protected void createAndInitActiveServices() throws Exception {
  activeServices = new RMActiveServices(this);
  activeServices.init(conf);
}
```
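It creates an RMActiveServices instance and initializes it. As the name suggests, RMActiveServices groups the services that run only while this RM is active. In an HA setup, they are started when the RM becomes the Active RM and stopped when it drops to standby. I won't go through its initialization code in detail, but here it is: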
```java
activeServiceContext = new RMActiveServiceContext();
rmContext.setActiveServiceContext(activeServiceContext);

conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);

rmSecretManagerService = createRMSecretManagerService();
addService(rmSecretManagerService);

containerAllocationExpirer = new ContainerAllocationExpirer(rmDispatcher);
addService(containerAllocationExpirer);
rmContext.setContainerAllocationExpirer(containerAllocationExpirer);

AMLivelinessMonitor amLivelinessMonitor = createAMLivelinessMonitor();
addService(amLivelinessMonitor);
rmContext.setAMLivelinessMonitor(amLivelinessMonitor);

AMLivelinessMonitor amFinishingMonitor = createAMLivelinessMonitor();
addService(amFinishingMonitor);
rmContext.setAMFinishingMonitor(amFinishingMonitor);

RMNodeLabelsManager nlm = createNodeLabelManager();
nlm.setRMContext(rmContext);
addService(nlm);
rmContext.setNodeLabelManager(nlm);

boolean isRecoveryEnabled = conf.getBoolean(
    YarnConfiguration.RECOVERY_ENABLED,
    YarnConfiguration.DEFAULT_RM_RECOVERY_ENABLED);

RMStateStore rmStore = null;
if (isRecoveryEnabled) {
  recoveryEnabled = true;
  rmStore = RMStateStoreFactory.getStore(conf);
  boolean isWorkPreservingRecoveryEnabled =
      conf.getBoolean(
          YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED,
          YarnConfiguration.DEFAULT_RM_WORK_PRESERVING_RECOVERY_ENABLED);
  rmContext
      .setWorkPreservingRecoveryEnabled(isWorkPreservingRecoveryEnabled);
} else {
  recoveryEnabled = false;
  rmStore = new NullRMStateStore();
}

try {
  rmStore.init(conf);
  rmStore.setRMDispatcher(rmDispatcher);
  rmStore.setResourceManager(rm);
} catch (Exception e) {
  // the Exception from stateStore.init() needs to be handled for
  // HA and we need to give up master status if we got fenced
  LOG.error("Failed to init state store", e);
  throw e;
}
rmContext.setStateStore(rmStore);

if (UserGroupInformation.isSecurityEnabled()) {
  delegationTokenRenewer = createDelegationTokenRenewer();
  rmContext.setDelegationTokenRenewer(delegationTokenRenewer);
}

// Register event handler for NodesListManager
nodesListManager = new NodesListManager(rmContext);
rmDispatcher.register(NodesListManagerEventType.class, nodesListManager);
addService(nodesListManager);
rmContext.setNodesListManager(nodesListManager);

// Initialize the scheduler
scheduler = createScheduler();
scheduler.setRMContext(rmContext);
addIfService(scheduler);
rmContext.setScheduler(scheduler);

schedulerDispatcher = createSchedulerEventDispatcher();
addIfService(schedulerDispatcher);
rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher);

// Register event handler for RmAppEvents
rmDispatcher.register(RMAppEventType.class,
    new ApplicationEventDispatcher(rmContext));

// Register event handler for RmAppAttemptEvents
rmDispatcher.register(RMAppAttemptEventType.class,
    new ApplicationAttemptEventDispatcher(rmContext));

// Register event handler for RmNodes
rmDispatcher.register(
    RMNodeEventType.class, new NodeEventDispatcher(rmContext));

nmLivelinessMonitor = createNMLivelinessMonitor();
addService(nmLivelinessMonitor);

resourceTracker = createResourceTrackerService();
addService(resourceTracker);
rmContext.setResourceTrackerService(resourceTracker);

DefaultMetricsSystem.initialize("ResourceManager");
JvmMetrics.initSingleton("ResourceManager", null);

// Initialize the Reservation system
if (conf.getBoolean(YarnConfiguration.RM_RESERVATION_SYSTEM_ENABLE,
    YarnConfiguration.DEFAULT_RM_RESERVATION_SYSTEM_ENABLE)) {
  reservationSystem = createReservationSystem();
  if (reservationSystem != null) {
    reservationSystem.setRMContext(rmContext);
    addIfService(reservationSystem);
    rmContext.setReservationSystem(reservationSystem);
    LOG.info("Initialized Reservation system");
  }
}

// creating monitors that handle preemption
createPolicyMonitors();

masterService = createApplicationMasterService();
addService(masterService);
rmContext.setApplicationMasterService(masterService);

applicationACLsManager = new ApplicationACLsManager(conf);

queueACLsManager = createQueueACLsManager(scheduler, conf);

rmAppManager = createRMAppManager();
// Register event handler for RMAppManagerEvents
rmDispatcher.register(RMAppManagerEventType.class, rmAppManager);

clientRM = createClientRMService();
addService(clientRM);
rmContext.setClientRMService(clientRM);

applicationMasterLauncher = createAMLauncher();
rmDispatcher.register(AMLauncherEventType.class,
    applicationMasterLauncher);

addService(applicationMasterLauncher);
if (UserGroupInformation.isSecurityEnabled()) {
  addService(delegationTokenRenewer);
  delegationTokenRenewer.setRMContext(rmContext);
}

new RMNMInfo(rmContext, scheduler);

super.serviceInit(conf);
```
That is a lot of code, but the basic logic has not changed. The code is simply more centralized: a refactoring, with little change in functionality.
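To my understanding, RMActiveServices is a Hadoop CompositeService. Each addService call registers a child, and init/stop are propagated to the children. A hypothetical, minimal version of that pattern (SketchService and SketchCompositeService are made-up names) shows why centralizing the addService calls in one method also pins down the initialization order. The reverse-order stop mirrors what Hadoop's CompositeService does, as far as I know.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical child service; it only records its lifecycle calls. */
final class SketchService {
    final String name;
    private final List<String> log;

    SketchService(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    void init() { log.add("init " + name); }

    void stop() { log.add("stop " + name); }
}

/** Hypothetical stand-in for a composite such as RMActiveServices. */
final class SketchCompositeService {
    private final List<SketchService> children = new ArrayList<>();

    /** Registers a child; registration order becomes init order. */
    void addService(SketchService s) {
        children.add(s);
    }

    /** Initializes children in registration order. */
    void init() {
        for (SketchService s : children) {
            s.init();
        }
    }

    /** Stops children in reverse order, so services registered later (which may depend on earlier ones) stop first. */
    void stop() {
        for (int i = children.size() - 1; i >= 0; i--) {
            children.get(i).stop();
        }
    }
}
```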
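Now back to the asynchronous submission in the code above (the secure branch, addApplicationAsync). Tracing it leads to the run method of DelegationTokenRenewerRunnable inside DelegationTokenRenewer, which executes asynchronously: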
```java
if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
  DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
      (DelegationTokenRenewerAppSubmitEvent) evt;
  handleDTRenewerAppSubmitEvent(appSubmitEvt);
} else if (evt.getType().equals(
    DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
  DelegationTokenRenewer.this.handleAppFinishEvent(evt);
}
```
Clearly, what we submitted is a DelegationTokenRenewerAppSubmitEvent. Next comes the handleDTRenewerAppSubmitEvent method:
```java
// Setup tokens for renewal
DelegationTokenRenewer.this.handleAppSubmitEvent(event);
rmContext.getDispatcher().getEventHandler()
    .handle(new RMAppEvent(event.getApplicationId(),
        RMAppEventType.START));
```
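That is the main logic. The first call sets up the application's tokens for renewal, so I'll skip it. The second is straightforward: it dispatches an event, an RMAppEvent. That class has no Javadoc. It is simply an application-level event that carries an application ID and an event type, so we can look directly at RMAppEventType: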
```java
public enum RMAppEventType {
  // Source: ClientRMService
  START,
  RECOVER,
  KILL,
  MOVE, // Move app to a new queue

  // Source: Scheduler and RMAppManager
  APP_REJECTED,

  // Source: Scheduler
  APP_ACCEPTED,

  // Source: RMAppAttempt
  ATTEMPT_REGISTERED,
  ATTEMPT_UNREGISTERED,
  ATTEMPT_FINISHED, // Will send the final state
  ATTEMPT_FAILED,
  ATTEMPT_KILLED,
  NODE_UPDATE,

  // Source: Container and ResourceTracker
  APP_RUNNING_ON_NODE,

  // Source: RMStateStore
  APP_NEW_SAVED,
  APP_UPDATE_SAVED,
}
```
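This enum defines the event types (not states), and its comments note which component sends each one. So the code above hands an RMAppEventType.START event to the RM's central dispatcher, rmDispatcher. The dispatcher puts the event in its own queue, where it waits for a registered handler to process it.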
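Handing an event to rmDispatcher is a queue-then-route pattern. The sketch below is a single-threaded, hypothetical stand-in for it: MiniDispatcher, MiniEvent and the Demo*EventType enums are made up. Handlers are registered per event-type enum class, post only enqueues, and drain routes each event by the declaring class of its type. As far as I know, YARN's AsyncDispatcher also looks handlers up by the event type's declaring class. Unlike this sketch, it drains the queue on its own thread and supports several handlers per type.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

/** Made-up event-type enums standing in for RMAppEventType and SchedulerEventType. */
enum DemoAppEventType { START, APP_NEW_SAVED }
enum DemoSchedulerEventType { APP_ADDED }

/** Hypothetical event: an enum type plus the application id it concerns. */
final class MiniEvent {
    final Enum<?> type;
    final String appId;

    MiniEvent(Enum<?> type, String appId) {
        this.type = type;
        this.appId = appId;
    }
}

/** Hypothetical single-threaded stand-in for rmDispatcher. */
final class MiniDispatcher {
    private final Map<Class<?>, Consumer<MiniEvent>> handlers = new HashMap<>();
    private final Queue<MiniEvent> queue = new ArrayDeque<>();

    /** One handler per event-type enum class, like rmDispatcher.register(XxxEventType.class, h). */
    void register(Class<? extends Enum<?>> eventTypeClass, Consumer<MiniEvent> handler) {
        handlers.put(eventTypeClass, handler);
    }

    /** Like getEventHandler().handle(e): only enqueues, nothing is processed yet. */
    void post(MiniEvent event) {
        queue.add(event);
    }

    /** Processes queued events in FIFO order, including ones posted by handlers; returns the count. */
    int drain() {
        int processed = 0;
        MiniEvent event;
        while ((event = queue.poll()) != null) {
            Class<?> key = event.type.getDeclaringClass();
            Consumer<MiniEvent> handler = handlers.get(key);
            if (handler == null) {
                throw new IllegalStateException("No handler registered for " + key.getName());
            }
            handler.accept(event);
            processed++;
        }
        return processed;
    }
}
```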
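Which handler receives it? The registrations made when the RM sets up its active services (createAndInitActiveServices, i.e. RMActiveServices' initialization shown above) show that this event type is handled by ApplicationEventDispatcher: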
```java
rmDispatcher.register(RMAppEventType.class,
    new ApplicationEventDispatcher(rmContext));
```

```java
@Override
public void handle(RMAppEvent event) {
  ApplicationId appID = event.getApplicationId();
  RMApp rmApp = this.rmContext.getRMApps().get(appID);
  if (rmApp != null) {
    try {
      rmApp.handle(event);
    } catch (Throwable t) {
      LOG.error("Error in handling event type " + event.getType()
          + " for application " + appID, t);
    }
  }
}
```
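ApplicationEventDispatcher is essentially a lookup by application ID followed by a delegated call. Here is a hypothetical sketch. DemoApp, DemoAppRouter, the string event types and the sample IDs are assumptions, and a ConcurrentHashMap plays the role of rmContext.getRMApps().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for RMAppImpl: just records the events it receives. */
final class DemoApp {
    final List<String> received = new ArrayList<>();

    void handle(String eventType) {
        received.add(eventType);
    }
}

/** Hypothetical analogue of ApplicationEventDispatcher: route by application id. */
final class DemoAppRouter {
    // plays the role of rmContext.getRMApps()
    final ConcurrentMap<String, DemoApp> apps = new ConcurrentHashMap<>();

    /** Returns true if the event reached an application, false if the id was unknown. */
    boolean route(String appId, String eventType) {
        DemoApp app = apps.get(appId);
        if (app == null) {
            return false; // like the original, events for unknown ids are silently dropped
        }
        try {
            app.handle(eventType);
        } catch (RuntimeException ex) {
            // the original catches Throwable and logs it, so one bad app cannot kill the dispatcher
            System.err.println("Error handling " + eventType + " for " + appId + ": " + ex);
        }
        return true;
    }
}
```

The design choice worth noticing is that events carry only the application ID, never the RMAppImpl itself. Any component that knows the ID can therefore post an event for that application.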
Finally, the event is handed to the RMApp, whose implementation class is RMAppImpl. Here is its handle method:
```java
@Override
public void handle(RMAppEvent event) {
  this.writeLock.lock();
  try {
    ApplicationId appID = event.getApplicationId();
    LOG.debug("Processing event for " + appID + " of type "
        + event.getType());
    final RMAppState oldState = getState();
    try {
      /* keep the master in sync with the state machine */
      this.stateMachine.doTransition(event.getType(), event);
    } catch (InvalidStateTransitonException e) {
      LOG.error("Can't handle this event at current state", e);
      /* TODO fail the application on the failed transition */
    }

    if (oldState != getState()) {
      LOG.info(appID + " State change from " + oldState + " to "
          + getState());
    }
  } finally {
    this.writeLock.unlock();
  }
}
```
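As you can see, all it does is drive the state machine. A full explanation would start from the state machine's initialization, which I won't repeat here.
[Figure: RMAppImpl state-machine transition diagram]
That is roughly how the state machine works internally. The state machine inside RMAppImpl starts in RMAppState.NEW. The transition triggered here moves it to RMAppState.NEW_SAVING, and its hook is defined in the RMAppNewlySavingTransition class. This is a single-arc transition (it has exactly one possible post-state), registered as: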
```java
addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
    RMAppEventType.START, new RMAppNewlySavingTransition())
```
Here is that class's implementation:
```java
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
  // If recovery is enabled then store the application information in a
  // non-blocking call so make sure that RM has stored the information
  // needed to restart the AM after RM restart without further client
  // communication
  LOG.info("Storing application with id " + app.applicationId);
  app.rmContext.getStateStore().storeNewApplication(app);
}
```
To make the logic explicit: RMAppImpl's state machine has its doTransition method called, and that state machine instance is actually created like this:
```java
this.stateMachine = stateMachineFactory.make(this);
```

```java
public StateMachine<STATE, EVENTTYPE, EVENT> make(OPERAND operand) {
  return new InternalStateMachine(operand, defaultInitialState);
}
```
InternalStateMachine's doTransition then delegates to the factory:
```java
@Override
public synchronized STATE doTransition(EVENTTYPE eventType, EVENT event)
     throws InvalidStateTransitonException {
  currentState = StateMachineFactory.this.doTransition
      (operand, currentState, eventType, event);
  return currentState;
}
```

```java
private STATE doTransition
         (OPERAND operand, STATE oldState, EVENTTYPE eventType, EVENT event)
    throws InvalidStateTransitonException {
  // We can assume that stateMachineTable is non-null because we call
  //  maybeMakeStateMachineTable() when we build an InnerStateMachine ,
  //  and this code only gets called from inside a working InnerStateMachine .
  Map<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>> transitionMap
    = stateMachineTable.get(oldState);
  if (transitionMap != null) {
    Transition<OPERAND, STATE, EVENTTYPE, EVENT> transition
        = transitionMap.get(eventType);
    if (transition != null) {
      return transition.doTransition(operand, oldState, event, eventType);
    }
  }
  throw new InvalidStateTransitonException(oldState, eventType);
}
```

```java
private class SingleInternalArc
                  implements Transition<OPERAND, STATE, EVENTTYPE, EVENT> {

  private STATE postState;
  private SingleArcTransition<OPERAND, EVENT> hook; // transition hook

  SingleInternalArc(STATE postState,
      SingleArcTransition<OPERAND, EVENT> hook) {
    this.postState = postState;
    this.hook = hook;
  }

  @Override
  public STATE doTransition(OPERAND operand, STATE oldState,
                            EVENT event, EVENTTYPE eventType) {
    if (hook != null) {
      hook.transition(operand, event);
    }
    return postState;
  }
}
```
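To see how make(this), the transition table and SingleInternalArc fit together, here is a self-contained sketch. Everything in it is hypothetical and simplified, not Hadoop's API: SketchApp, SketchMachine, the two enums, and the "actions" string the hooks append to. In this sketch the hook receives the event type rather than an event object. As in StateMachineFactory, the table maps a pre-state and an event type to an arc. The arc runs its hook on the bound operand before the state changes, and an unknown (state, event) pair throws and leaves the state unchanged. RMAppImpl.handle catches that exception and only logs it. The sketch lets it propagate.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Simplified subsets of RMAppState / RMAppEventType, for illustration only. */
enum SketchAppState { NEW, NEW_SAVING, SUBMITTED }
enum SketchAppEventType { START, APP_NEW_SAVED }

/** Hypothetical operand: like RMAppImpl, it binds the machine to itself via make(this). */
final class SketchApp {
    final StringBuilder actions = new StringBuilder();
    final SketchMachine machine = SketchMachine.make(this);
}

/** Hypothetical, simplified StateMachineFactory + InternalStateMachine in one class. */
final class SketchMachine {
    /** One legal arc: the post-state plus an optional hook, like SingleInternalArc. */
    private static final class Arc {
        final SketchAppState postState;
        final BiConsumer<SketchApp, SketchAppEventType> hook;

        Arc(SketchAppState postState, BiConsumer<SketchApp, SketchAppEventType> hook) {
            this.postState = postState;
            this.hook = hook;
        }
    }

    // preState -> (eventType -> arc), built once and shared by every machine
    private static final Map<SketchAppState, Map<SketchAppEventType, Arc>> TABLE =
            new EnumMap<>(SketchAppState.class);

    static {
        // cf. NEW --START--> NEW_SAVING with RMAppNewlySavingTransition
        addArc(SketchAppState.NEW, SketchAppEventType.START, SketchAppState.NEW_SAVING,
                (app, type) -> app.actions.append("storeNewApplication;"));
        // cf. NEW_SAVING --APP_NEW_SAVED--> SUBMITTED with AddApplicationToSchedulerTransition
        addArc(SketchAppState.NEW_SAVING, SketchAppEventType.APP_NEW_SAVED, SketchAppState.SUBMITTED,
                (app, type) -> app.actions.append("addToScheduler;"));
    }

    private static void addArc(SketchAppState pre, SketchAppEventType type, SketchAppState post,
                               BiConsumer<SketchApp, SketchAppEventType> hook) {
        TABLE.computeIfAbsent(pre, k -> new EnumMap<>(SketchAppEventType.class))
             .put(type, new Arc(post, hook));
    }

    private final SketchApp operand;                     // the object every hook operates on
    private SketchAppState current = SketchAppState.NEW; // default initial state

    private SketchMachine(SketchApp operand) {
        this.operand = operand;
    }

    static SketchMachine make(SketchApp operand) {
        return new SketchMachine(operand);
    }

    synchronized SketchAppState getCurrentState() {
        return current;
    }

    /** Looks up the arc for (current, type), runs its hook, then moves to the post-state. */
    synchronized SketchAppState doTransition(SketchAppEventType type) {
        Map<SketchAppEventType, Arc> byEvent = TABLE.get(current);
        Arc arc = byEvent == null ? null : byEvent.get(type);
        if (arc == null) {
            throw new IllegalStateException("Invalid event " + type + " at state " + current);
        }
        if (arc.hook != null) {
            arc.hook.accept(operand, type);
        }
        current = arc.postState;
        return current;
    }
}
```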
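Following the chain down to here, we find that what gets executed is the transition method of RMAppNewlySavingTransition. That method takes two arguments (app and event). Which objects are actually passed in when it is called?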
```java
/*
 * @return a {@link StateMachine} that starts in the default initial
 *          state and whose {@link Transition} s are applied to
 *          {@code operand} .
 *
 *         This is part of the API.
 *
 * @param operand the object upon which the returned
 *                {@link StateMachine} will operate.
 *
 */
public StateMachine<STATE, EVENTTYPE, EVENT>
      make(OPERAND operand) {
  return new InternalStateMachine(operand, defaultInitialState);
}
```
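Look at this make method. RMAppImpl calls it in its constructor and passes itself (this) as the operand. So the operand is the RMAppImpl, and every transition hook operates on that RMAppImpl. The hook then stores the submitted application through the RMStateStore obtained from rmContext: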
```java
/**
 * Non-Blocking API
 * ResourceManager services use this to store the application's state
 * This does not block the dispatcher threads
 * RMAppStoredEvent will be sent on completion to notify the RMApp
 */
@SuppressWarnings("unchecked")
public synchronized void storeNewApplication(RMApp app) {
  ApplicationSubmissionContext context = app
                                          .getApplicationSubmissionContext();
  assert context instanceof ApplicationSubmissionContextPBImpl;
  ApplicationState appState =
      new ApplicationState(app.getSubmitTime(), app.getStartTime(), context,
        app.getUser());
  dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
}
```
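Note that the dispatcher used in this store method is RMStateStore's own internal dispatcher, not the RM's central dispatcher. The event goes into that dispatcher's own queue, and the handler for it is registered like this: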
```java
dispatcher.register(RMStateStoreEventType.class,
    new ForwardingEventHandler());
```
Once initialization and service startup are done, RMStateStore's dispatcher routes events of this type to ForwardingEventHandler:
```java
private final class ForwardingEventHandler
                              implements EventHandler<RMStateStoreEvent> {
  @Override
  public void handle(RMStateStoreEvent event) {
    handleStoreEvent(event);
  }
}
```

```java
// Dispatcher related code
protected void handleStoreEvent(RMStateStoreEvent event) {
  try {
    this.stateMachine.doTransition(event.getType(), event);
  } catch (InvalidStateTransitonException e) {
    LOG.error("Can't handle this event at current state", e);
  }
}
```
The event passed in here is created by the following constructor, so its type is STORE_APP:
```java
public RMStateStoreAppEvent(ApplicationState appState) {
  super(RMStateStoreEventType.STORE_APP);
  this.appState = appState;
}
```
Next, the transition registered for this event type:
```java
addTransition(RMStateStoreState.DEFAULT, RMStateStoreState.DEFAULT,
    RMStateStoreEventType.STORE_APP, new StoreAppTransition())
```

```java
private static class StoreAppTransition
    implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
  @Override
  public void transition(RMStateStore store, RMStateStoreEvent event) {
    if (!(event instanceof RMStateStoreAppEvent)) {
      // should never happen
      LOG.error("Illegal event type: " + event.getClass());
      return;
    }
    ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState();
    ApplicationId appId = appState.getAppId();
    ApplicationStateData appStateData = ApplicationStateData
        .newInstance(appState);
    LOG.info("Storing info for app: " + appId);
    try {
      store.storeApplicationStateInternal(appId, appStateData);
      store.notifyApplication(new RMAppEvent(appId,
          RMAppEventType.APP_NEW_SAVED));
    } catch (Exception e) {
      LOG.error("Error storing app: " + appId, e);
      store.notifyStoreOperationFailed(e);
    }
  };
}
```
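StoreAppTransition follows a store-then-notify pattern across two dispatchers. The store event is queued on RMStateStore's own dispatcher, and only after the write succeeds is APP_NEW_SAVED sent back to rmDispatcher. A failed write goes to the failure path instead. The sketch below is hypothetical: StoreRoundTrip, its two queues, the in-memory backend map and the failWrites switch are assumptions, and everything runs on one thread.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch of the RMStateStore store-then-notify round trip. */
final class StoreRoundTrip {
    final Queue<String> storeQueue = new ArrayDeque<>();   // RMStateStore's own dispatcher
    final Queue<String> globalQueue = new ArrayDeque<>();  // stands in for rmDispatcher
    final Map<String, String> backend = new HashMap<>();   // hypothetical persistent store
    boolean failWrites;                                    // simulates a broken backend
    String lastFailure;

    /** Like storeNewApplication: only enqueues on the store dispatcher, never blocks the caller. */
    void storeNewApplication(String appId) {
        storeQueue.add(appId);
    }

    /** Drains the store queue, doing for each STORE_APP event what StoreAppTransition does. */
    void drainStoreQueue() {
        String appId;
        while ((appId = storeQueue.poll()) != null) {
            try {
                if (failWrites) {
                    throw new IllegalStateException("write failed for " + appId);
                }
                backend.put(appId, "ApplicationStateData");
                globalQueue.add(appId + ":APP_NEW_SAVED"); // notifyApplication -> rmDispatcher
            } catch (RuntimeException e) {
                lastFailure = e.getMessage();             // notifyStoreOperationFailed in the original
            }
        }
    }
}
```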
This is still storage logic, which I won't analyze. Moving on, here is the notifyApplication method:
```java
@SuppressWarnings("unchecked")
/**
 * This method is called to notify the application that
 * new application is stored or updated in state store
 * @param event App event containing the app id and event type
 */
private void notifyApplication(RMAppEvent event) {
  rmDispatcher.getEventHandler().handle(event);
}
```
So once the store finishes its internal processing, it hands the event back to the central dispatcher, rmDispatcher. Here is how this event is handled:
```java
addTransition(RMAppState.NEW_SAVING, RMAppState.SUBMITTED,
    RMAppEventType.APP_NEW_SAVED, new AddApplicationToSchedulerTransition())
```
The relationships are a bit convoluted, but this event is in fact handled by RMAppImpl again. The corresponding registration is in the RM:
```java
// Register event handler for RmAppEvents
rmDispatcher.register(RMAppEventType.class,
    new ApplicationEventDispatcher(rmContext));
```

```java
@Override
public void handle(RMAppEvent event) {
  ApplicationId appID = event.getApplicationId();
  RMApp rmApp = this.rmContext.getRMApps().get(appID);
  if (rmApp != null) {
    try {
      rmApp.handle(event);
    } catch (Throwable t) {
      LOG.error("Error in handling event type " + event.getType()
          + " for application " + appID, t);
    }
  }
}
```
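One point deserves attention here: why the ApplicationId has to be obtained first. It is crucial. Throughout the application's lifetime there is one long-lived RMAppImpl per ApplicationId, and nearly every later operation revolves around that RMAppImpl. The RMAppImpl is found by using the ID as the key (rmContext.getRMApps().get(appID) in the handle method above). The transition that runs for APP_NEW_SAVED is: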
```java
private static final class AddApplicationToSchedulerTransition extends
    RMAppTransition {
  @Override
  public void transition(RMAppImpl app, RMAppEvent event) {
    app.handler.handle(new AppAddedSchedulerEvent(app.applicationId,
        app.submissionContext.getQueue(), app.user,
        app.submissionContext.getReservationID()));
  }
}
```
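This code uses the handle method of RMAppImpl's internal handler to submit yet another event, an AppAddedSchedulerEvent. Its purpose is to tell the scheduler about the new application so that scheduling, and with it the ApplicationMaster's initialization, can begin.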
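The whole chain covered in this post, from START to the scheduler event, condenses into one trace. The sketch below is hypothetical: SubmissionTrace and its trace strings are made up. Two plain queues stand in for rmDispatcher and RMStateStore's internal dispatcher, and both are drained on one thread, whereas the real dispatchers each run their own thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical end-to-end trace of the submission chain described in this post. */
final class SubmissionTrace {
    final List<String> trace = new ArrayList<>();
    private final Queue<Runnable> rmDispatcher = new ArrayDeque<>();
    private final Queue<Runnable> storeDispatcher = new ArrayDeque<>();
    private String appState = "NEW";

    /** RMAppManager (or DelegationTokenRenewer) posts START, then the dispatchers run. */
    void submit() {
        rmDispatcher.add(this::onStart);
        run();
    }

    /** RMAppImpl: NEW --START--> NEW_SAVING, hook calls storeNewApplication. */
    private void onStart() {
        appState = "NEW_SAVING";
        trace.add("RMApp NEW -> NEW_SAVING");
        storeDispatcher.add(this::onStoreApp);
    }

    /** RMStateStore: STORE_APP is persisted, then notifyApplication posts APP_NEW_SAVED. */
    private void onStoreApp() {
        trace.add("RMStateStore stored app");
        rmDispatcher.add(this::onAppNewSaved);
    }

    /** RMAppImpl: NEW_SAVING --APP_NEW_SAVED--> SUBMITTED, hook notifies the scheduler. */
    private void onAppNewSaved() {
        appState = "SUBMITTED";
        trace.add("RMApp NEW_SAVING -> SUBMITTED");
        trace.add("AppAddedSchedulerEvent sent to scheduler");
    }

    /** Drains both queues until no work is left. */
    private void run() {
        while (!rmDispatcher.isEmpty() || !storeDispatcher.isEmpty()) {
            Runnable next = rmDispatcher.poll();
            if (next == null) {
                next = storeDispatcher.poll();
            }
            next.run();
        }
    }

    String appState() {
        return appState;
    }
}
```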
How that initialization works, and how the ApplicationMaster is then launched, will be covered in the next post.