Spark Source Code Analysis: The SparkSubmit Submission Flow
Current Environment and Versions
| Environment | Version |
|---|---|
| JDK | java version "1.8.0_231" (HotSpot) |
| Scala | Scala-2.11.12 |
| Spark | spark-2.4.4 |
Preface
- When running a Spark application, we usually submit it with `./bin/spark-submit`, for example:

```bash
spark-submit \
  --master yarn --deploy-mode cluster \
  --num-executors 10 --executor-memory 8G --executor-cores 4 \
  --driver-memory 4G \
  --conf spark.network.timeout=300 \
  --class com.skey.spark.app.MyApp /home/jerry/spark-demo.jar
```
- After spark-submit is run, the program parses the arguments for us and, depending on the deploy mode, submits the Spark application to the cluster in different ways, for example:
  - Standalone
    - client -> the Main method of the user-written class runs locally, i.e. the Driver is started locally
    - cluster -> ClientApp is used to request a node from the cluster on which to start the Driver
  - ON YARN
    - client -> runs locally, same as Standalone
    - cluster -> YarnClusterApplication is used to request a node from the cluster on which to start the Driver
- The figure below shows the overall SparkSubmit submission flow.
- Next, let's walk through the source code of the SparkSubmit submission process.
The Shell Command Part
- First, we invoke the `./bin/spark-submit` shell script, passing the arguments and submitting the application. It in turn calls `./bin/spark-class`; the key line is:

```bash
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
```
- Note that `org.apache.spark.deploy.SparkSubmit` is passed to spark-class as the first argument here.
- `./bin/spark-class` first loads environment variables via `bin/load-spark-env.sh`:

```bash
. "${SPARK_HOME}"/bin/load-spark-env.sh
```
- Importantly, `bin/load-spark-env.sh` sources `./conf/spark-env.sh`, the file where we usually configure default environment settings such as SPARK_MASTER_HOST, SPARK_WORKER_MEMORY, HADOOP_CONF_DIR, and so on. This tells us that the configuration file is re-read every time an application is submitted.
- Next, `./bin/spark-class` locates the java command and the jar files and starts a Java process. The most important code is:

```bash
build_command() {
  # $RUNNER is the java command resolved earlier
  # $LAUNCH_CLASSPATH is normally SPARK_HOME/jars/*
  # org.apache.spark.launcher.Main prints the parsed arguments separated by the null character ('\0')
  # "$@" are the arguments passed in by spark-submit; note that the first one is org.apache.spark.deploy.SparkSubmit
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
  printf "%d\0" $?
}

set +o posix
CMD=()
# Run (build_command "$@") and redirect its output into the while loop.
# read splits the received stream on the null character
# and each token is appended to the CMD array.
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command "$@")

# ... some code omitted ...

# Execute the command
CMD=("${CMD[@]:0:$LAST}")
exec "${CMD[@]}"
```
Argument Parsing: Main
- org.apache.spark.launcher.Main
- This class mainly parses the arguments and prints the command to run, which differs by mode. There is not much code:

```java
class Main {

  public static void main(String[] argsArray) throws Exception {
    checkArgument(argsArray.length > 0, "Not enough arguments: missing class name.");

    List<String> args = new ArrayList<>(Arrays.asList(argsArray));
    // Take the first argument, i.e. the org.apache.spark.deploy.SparkSubmit passed in earlier
    String className = args.remove(0);

    boolean printLaunchCommand = !isEmpty(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"));
    Map<String, String> env = new HashMap<>();
    List<String> cmd;
    if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
      try {
        // Parse arguments such as --class, --conf, etc. and build the command
        AbstractCommandBuilder builder = new SparkSubmitCommandBuilder(args);
        // cmd essentially becomes: java -cp <classpath> org.apache.spark.deploy.SparkSubmit ...
        cmd = buildCommand(builder, env, printLaunchCommand);
      } catch (IllegalArgumentException e) {
        // some code omitted
      }
    } else {
      // This branch is taken when a class other than SparkSubmit is launched
      AbstractCommandBuilder builder = new SparkClassCommandBuilder(className, args);
      cmd = buildCommand(builder, env, printLaunchCommand);
    }

    // Print the output differently depending on the operating system
    if (isWindows()) {
      System.out.println(prepareWindowsCommand(cmd, env));
    } else {
      List<String> bashCmd = prepareBashCommand(cmd, env);
      for (String c : bashCmd) {
        System.out.print(c);
        System.out.print('\0'); // tokens are separated by the null character
      }
    }
  }

  // some code omitted
}
```
- Finally, the printed command (essentially `java -cp <classpath> org.apache.spark.deploy.SparkSubmit ...`) is read into the CMD array in `./bin/spark-class` and executed with exec.
SparkSubmit
- org.apache.spark.deploy.SparkSubmit
- This is the class that actually performs the Spark application submission. As described above, it parses the received arguments and submits the application according to the deploy mode.
- SparkSubmit consists of a class and a companion object. Let's first look at the companion object's main method, which is the entry point of the Java process:

```scala
override def main(args: Array[String]): Unit = {
  // Instantiate SparkSubmit and override some of its methods
  val submit = new SparkSubmit() {
    self => // alias for `this`, so it can be referenced inside SparkSubmitArguments

    override protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
      // Override the logging methods of SparkSubmitArguments
      // so that they delegate to SparkSubmit's logInfo / logWarning
      new SparkSubmitArguments(args) {
        override protected def logInfo(msg: => String): Unit = self.logInfo(msg)

        override protected def logWarning(msg: => String): Unit = self.logWarning(msg)
      }
    }

    override protected def logInfo(msg: => String): Unit = printMessage(msg)

    override protected def logWarning(msg: => String): Unit = printMessage(s"Warning: $msg")

    override def doSubmit(args: Array[String]): Unit = {
      try {
        // Still calls the parent's doSubmit, just adds exception handling around it
        super.doSubmit(args)
      } catch {
        case e: SparkUserAppException =>
          exitFn(e.exitCode)
      }
    }
  }

  // Call SparkSubmit's doSubmit to submit the application
  submit.doSubmit(args)
}
```
- This code eventually calls SparkSubmit's doSubmit:

```scala
def doSubmit(args: Array[String]): Unit = {
  // Initialize logging if it hasn't been done yet. Keep track of whether logging needs to
  // be reset before the application starts.
  val uninitLog = initializeLogIfNecessary(true, silent = true)

  // parseArguments instantiates SparkSubmitArguments.
  // Note that the companion object shown above has already overridden its logging methods.
  val appArgs = parseArguments(args)
  if (appArgs.verbose) {
    logInfo(appArgs.toString)
  }
  // When submitting an application, the action resolves to SparkSubmitAction.SUBMIT.
  // If you are interested, SUBMIT is derived in SparkSubmitArguments#loadEnvironmentArguments.
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    case SparkSubmitAction.PRINT_VERSION => printVersion()
  }
}
```
- Since we are submitting an application, this code then calls submit:

```scala
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {

  // Define doRunMain, which is invoked below and eventually calls runMain(...)
  def doRunMain(): Unit = {
    // proxyUser is the proxy user specified via --proxy-user.
    // It lets you impersonate another user, e.g. you are jerry but act as tom,
    // so that you can work on tom's files with tom's permissions.
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(args, uninitLog)
          }
        })
      } catch {
        // some code omitted
      }
    } else {
      runMain(args, uninitLog)
    }
  }

  // Check the launch mode; either way doRunMain() is eventually called
  if (args.isStandaloneCluster && args.useRest) {
    try {
      logInfo("Running Spark using the REST application submission protocol.")
      doRunMain()
    } catch {
      // some code omitted
    }
  } else {
    doRunMain()
  }
}
```
- As you can see, submit mainly deals with whether a proxy user is used and finally calls runMain(...), which is the core of SparkSubmit:

```scala
private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  // prepareSubmitEnvironment parses the arguments and, most importantly, decides how the application is launched
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

  // some code omitted

  // Choose the ClassLoader, controlled by spark.driver.userClassPathFirst (false by default)
  val loader =
    if (sparkConf.get(DRIVER_USER_CLASS_PATH_FIRST)) {
      // This ClassLoader gives priority to the jars provided by the user
      new ChildFirstURLClassLoader(new Array[URL](0),
        Thread.currentThread.getContextClassLoader)
    } else {
      // The default ClassLoader
      new MutableURLClassLoader(new Array[URL](0),
        Thread.currentThread.getContextClassLoader)
    }
  Thread.currentThread.setContextClassLoader(loader)

  for (jar <- childClasspath) {
    addJarToClasspath(jar, loader)
  }

  var mainClass: Class[_] = null

  try {
    // Load the class named by childMainClass
    mainClass = Utils.classForName(childMainClass)
  } catch {
    // some code omitted
  }

  val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    // If mainClass is a SparkApplication, simply instantiate it
    mainClass.newInstance().asInstanceOf[SparkApplication]
  } else {
    // Otherwise wrap it in a JavaMainApplication
    if (classOf[scala.App].isAssignableFrom(mainClass)) {
      logWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
    }
    new JavaMainApplication(mainClass)
  }

  // some code omitted

  try {
    // Call start, which reflectively invokes the class's main method
    app.start(childArgs.toArray, sparkConf)
  } catch {
    case t: Throwable =>
      throw findCause(t)
  }
}
```
- Clearly, the most important variable in runMain(...) is childMainClass, since it decides which class runs next. To find out what it is, we step into prepareSubmitEnvironment(...), which has a local variable of the same name, childMainClass, that is eventually returned. The analysis below revolves around how childMainClass is assigned.
- Client mode check:

```scala
if (deployMode == CLIENT) {
  // In client mode, args.mainClass is assigned to childMainClass directly.
  // args.mainClass is the class we specified with --class at submit time.
  childMainClass = args.mainClass
  if (localPrimaryResource != null && isUserJar(localPrimaryResource)) {
    childClasspath += localPrimaryResource
  }
  if (localJars != null) {
    childClasspath ++= localJars.split(",")
  }
}
```
- Standalone cluster mode check:

```scala
// First check whether this is a standalone cluster submission
if (args.isStandaloneCluster) {
  if (args.useRest) {
    // With REST, org.apache.spark.deploy.rest.RestSubmissionClientApp is used
    childMainClass = REST_CLUSTER_SUBMIT_CLASS
    // args.mainClass is passed along
    childArgs += (args.primaryResource, args.mainClass)
  } else {
    // Otherwise, org.apache.spark.deploy.ClientApp is used
    childMainClass = STANDALONE_CLUSTER_SUBMIT_CLASS
    if (args.supervise) { childArgs += "--supervise" }
    Option(args.driverMemory).foreach { m => childArgs += ("--memory", m) }
    Option(args.driverCores).foreach { c => childArgs += ("--cores", c) }
    childArgs += "launch"
    // args.mainClass is passed along
    childArgs += (args.master, args.primaryResource, args.mainClass)
  }
  if (args.childArgs != null) {
    childArgs ++= args.childArgs
  }
}
```
- YARN cluster mode check:

```scala
if (isYarnCluster) {
  // In yarn-cluster mode, org.apache.spark.deploy.yarn.YarnClusterApplication is used
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
  } else if (args.isR) {
    val mainFile = new Path(args.primaryResource).getName
    childArgs += ("--primary-r-file", mainFile)
    childArgs += ("--class", "org.apache.spark.deploy.RRunner")
  } else {
    if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
      childArgs += ("--jar", args.primaryResource)
    }
    // args.mainClass is passed along
    childArgs += ("--class", args.mainClass)
  }
  if (args.childArgs != null) {
    args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
  }
}
```
- The other modes (MesosCluster, KubernetesCluster) follow the same pattern and are left for you to explore.
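- For reference, a sketch of how the *_SUBMIT_CLASS constants used above resolve (values as in Spark 2.4.x; in the actual SparkSubmit object some of them are derived via classOf[...].getName rather than written as string literals):

```scala
// Sketch of the childMainClass constants referenced above and the classes they resolve to.
private[deploy] val YARN_CLUSTER_SUBMIT_CLASS =
  "org.apache.spark.deploy.yarn.YarnClusterApplication"
private[deploy] val REST_CLUSTER_SUBMIT_CLASS =
  "org.apache.spark.deploy.rest.RestSubmissionClientApp"
private[deploy] val STANDALONE_CLUSTER_SUBMIT_CLASS =
  "org.apache.spark.deploy.ClientApp"
private[deploy] val KUBERNETES_CLUSTER_SUBMIT_CLASS =
  "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"
```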
- As we can see, in client mode the main method of the user-written class is invoked directly, as the sketch below shows. In cluster mode, depending on the deployment, RestSubmissionClientApp, ClientApp, YarnClusterApplication, KubernetesClientApplication, etc. take over the next step.
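- In client mode the class named by --class usually does not implement SparkApplication, so runMain wraps it in a JavaMainApplication, whose start method reflectively invokes the user's static main method. A sketch of that wrapper (essentially as it appears in Spark 2.4.x):

```scala
private[deploy] class JavaMainApplication(klass: Class[_]) extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // Look up the user's main method and make sure it is static
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    if (!Modifier.isStatic(mainMethod.getModifiers)) {
      throw new IllegalStateException("The main method in the given main class must be static")
    }

    // Propagate the SparkConf entries as system properties before running the user code
    val sysProps = conf.getAll.toMap
    sysProps.foreach { case (k, v) =>
      sys.props(k) = v
    }

    mainMethod.invoke(null, args)
  }

}
```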
- In cluster mode these SparkApplications start locally and each requests a node from the cluster, on which the Driver is then started (i.e. the main method of the user-written class is called there). Below we look at the source code of these SparkApplications.
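- All of these entry points, as well as the JavaMainApplication wrapper above, implement the same minimal SparkApplication trait, which is roughly:

```scala
private[spark] trait SparkApplication {

  def start(args: Array[String], conf: SparkConf): Unit

}
```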
ClientApp in Standalone Mode
- org.apache.spark.deploy.ClientApp
- Its code is as follows:

```scala
private[spark] class ClientApp extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // ClientArguments calls parse(args.toList) internally to parse the arguments
    val driverArgs = new ClientArguments(args)

    if (!conf.contains("spark.rpc.askTimeout")) {
      conf.set("spark.rpc.askTimeout", "10s")
    }
    Logger.getRootLogger.setLevel(driverArgs.logLevel)

    // Create the NettyRpcEnv
    val rpcEnv =
      RpcEnv.create("driverClient", Utils.localHostName(), 0, conf, new SecurityManager(conf))

    // Use the Master URLs to obtain the Master's RpcEndpointRef
    val masterEndpoints = driverArgs.masters.map(RpcAddress.fromSparkURL).
      map(rpcEnv.setupEndpointRef(_, Master.ENDPOINT_NAME))
    // Instantiate and register the ClientEndpoint
    rpcEnv.setupEndpoint("client", new ClientEndpoint(rpcEnv, driverArgs, masterEndpoints, conf))

    rpcEnv.awaitTermination()
  }

}
```
- This code relies on the communication machinery covered earlier in "Spark Source Code Analysis: RpcEndpoint and RpcEnv"; if you are not familiar with it, please read that part first.
- After ClientEndpoint is instantiated, its onStart method is called:

```scala
override def onStart(): Unit = {
  driverArgs.cmd match {
    case "launch" =>
      // Remember this class, DriverWrapper: it is launched later and
      // calls the main method of the user-written class
      val mainClass = "org.apache.spark.deploy.worker.DriverWrapper"

      // some code omitted

      // Build the Command.
      // driverArgs.mainClass passed here is the user-written class
      val command = new Command(mainClass,
        Seq("{{WORKER_URL}}", "{{USER_JAR}}", driverArgs.mainClass) ++ driverArgs.driverOptions,
        sys.env, classPathEntries, libraryPathEntries, javaOpts)

      val driverDescription = new DriverDescription(
        driverArgs.jarUrl,
        driverArgs.memory,
        driverArgs.cores,
        driverArgs.supervise,
        command)
      // Send a RequestSubmitDriver message to the Master
      asyncSendToMasterAndForwardReply[SubmitDriverResponse](
        RequestSubmitDriver(driverDescription))

    case "kill" =>
      val driverId = driverArgs.driverId
      asyncSendToMasterAndForwardReply[KillDriverResponse](RequestKillDriver(driverId))
  }
}
```
- Next, let's see what the Master's receiveAndReply does when it receives the RequestSubmitDriver message:

```scala
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RequestSubmitDriver(description) =>
    // In high-availability mode there are several Masters, each with its own state
    if (state != RecoveryState.ALIVE) {
      // If this Master is not ALIVE, reply with a SubmitDriverResponse carrying a failure message
      val msg = s"${Utils.BACKUP_STANDALONE_MASTER_PREFIX}: $state. " +
        "Can only accept driver submissions in ALIVE state."
      context.reply(SubmitDriverResponse(self, false, None, msg))
    } else {
      logInfo("Driver submitted " + description.command.mainClass)
      // Create the driver from the received description; this mainly builds a DriverInfo
      val driver = createDriver(description)
      persistenceEngine.addDriver(driver)
      waitingDrivers += driver
      drivers.add(driver)
      // The code that actually launches the Driver
      schedule()

      // Reply with a SubmitDriverResponse carrying driver.id
      context.reply(SubmitDriverResponse(self, true, Some(driver.id),
        s"Driver successfully submitted as ${driver.id}"))
    }

  // some code omitted
}
```
- Upon receiving RequestSubmitDriver, receiveAndReply builds a DriverInfo and calls schedule() to create the Driver. Let's look at schedule(), the code that actually triggers the Driver launch:

```scala
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Take the alive workers and shuffle them randomly.
  // Shuffling prevents too many drivers from piling up on a few workers,
  // so drivers spread evenly across the cluster.
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  // Iterate over waitingDrivers, i.e. the DriverInfo entries built earlier
  for (driver <- waitingDrivers.toList) {
    var launched = false
    var numWorkersVisited = 0
    // Keep trying while there are unvisited workers and the driver has not been launched
    while (numWorkersVisited < numWorkersAlive && !launched) {
      // Take a worker from the shuffled sequence
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      // The worker's free memory must be at least the requested memory,
      // and its free cores must be at least the requested cores
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // If both conditions hold, launch the driver on that worker
        launchDriver(worker, driver)
        // Remove the launched driver from the queue to avoid launching it twice
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // This launches Executors on the workers; we ignore it for now
  startExecutorsOnWorkers()
}
```
- This code first collects the alive workers and shuffles them randomly. It then iterates over the DriverInfo entries built earlier, walking over the workers and checking whether one meets the requested resources (memory, cores). If such a worker is found, launchDriver(...) is called to start the Driver on it.

```scala
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // Send a LaunchDriver message to the worker node
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  driver.state = DriverState.RUNNING
}
```
- In launchDriver, the Master sends a LaunchDriver message to the worker. Let's see what the Worker does when it receives that message:

```scala
override def receive: PartialFunction[Any, Unit] = synchronized {
  // some code omitted
  case LaunchDriver(driverId, driverDesc) =>
    logInfo(s"Asked to launch driver $driverId")
    // Build a DriverRunner and call start to launch it
    val driver = new DriverRunner(
      conf,
      driverId,
      workDir,
      sparkHome,
      driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
      self,
      workerUri,
      securityMgr)
    drivers(driverId) = driver
    driver.start()

    // Update the resources already used on this worker
    coresUsed += driverDesc.cores
    memoryUsed += driverDesc.mem
  // some code omitted
}
```
- On the Worker node, receiving the LaunchDriver message builds a DriverRunner and calls its start method. start spawns a new Thread whose main job is to call prepareAndRunDriver(): a new process is launched through a ProcessBuilder, and the thread blocks there until that process exits.
- The command used to launch that process is the command inside the driverDesc passed to the DriverRunner. That Command was originally sent to the Master by the ClientEndpoint and forwarded by the Master to the Worker, and its main class is the org.apache.spark.deploy.worker.DriverWrapper we asked you to remember earlier (look back at ClientEndpoint's onStart method). The Command also carries the user's class. A condensed sketch of DriverRunner follows below.
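- A condensed sketch of DriverRunner.start and prepareAndRunDriver (simplified, not the literal source) shows how the Worker turns the received Command into a local java process:

```scala
// Simplified sketch of DriverRunner (based on Spark 2.4.x): a dedicated thread downloads
// the user jar, substitutes the placeholders in the Command, builds a ProcessBuilder
// and blocks until the driver process exits.
private[worker] def start(): Unit = {
  new Thread("DriverRunner for " + driverId) {
    override def run(): Unit = {
      val exitCode = prepareAndRunDriver() // blocks while the driver process runs
      // ... report the final DriverState back to the Worker (omitted) ...
    }
  }.start()
}

private[worker] def prepareAndRunDriver(): Int = {
  val driverDir = createWorkingDirectory()          // working directory for this driver
  val localJarFilename = downloadUserJar(driverDir) // fetch the user jar onto this worker

  // Substitute the {{WORKER_URL}} / {{USER_JAR}} placeholders that ClientEndpoint put into the Command
  def substituteVariables(argument: String): String = argument match {
    case "{{WORKER_URL}}" => workerUrl
    case "{{USER_JAR}}" => localJarFilename
    case other => other
  }

  // Build a ProcessBuilder whose command starts java with
  // org.apache.spark.deploy.worker.DriverWrapper as the main class
  val builder = CommandUtils.buildProcessBuilder(driverDesc.command, securityManager,
    driverDesc.mem, sparkHome.getAbsolutePath, substituteVariables)

  runDriver(builder, driverDir, driverDesc.supervise)
}
```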
- Finally, we arrive at DriverWrapper's main method:

```scala
def main(args: Array[String]) {
  args.toList match {
    case workerUrl :: userJar :: mainClass :: extraArgs =>
      val conf = new SparkConf()
      val host: String = Utils.localHostName()
      val port: Int = sys.props.getOrElse("spark.driver.port", "0").toInt
      // Create the NettyRpcEnv
      val rpcEnv = RpcEnv.create("Driver", host, port, conf, new SecurityManager(conf))
      logInfo(s"Driver address: ${rpcEnv.address}")
      // Instantiate and register the WorkerWatcher
      rpcEnv.setupEndpoint("workerWatcher", new WorkerWatcher(rpcEnv, workerUrl))

      val currentLoader = Thread.currentThread.getContextClassLoader
      val userJarUrl = new File(userJar).toURI().toURL()
      // Same as SparkSubmit's runMain():
      // choose the ClassLoader based on spark.driver.userClassPathFirst
      val loader =
        if (sys.props.getOrElse("spark.driver.userClassPathFirst", "false").toBoolean) {
          // This ClassLoader gives priority to the jars provided by the user
          new ChildFirstURLClassLoader(Array(userJarUrl), currentLoader)
        } else {
          new MutableURLClassLoader(Array(userJarUrl), currentLoader)
        }
      Thread.currentThread.setContextClassLoader(loader)
      setupDependencies(loader, userJar)

      // Reflectively invoke the main method of the user-written class
      val clazz = Utils.classForName(mainClass)
      val mainMethod = clazz.getMethod("main", classOf[Array[String]])
      mainMethod.invoke(null, extraArgs.toArray[String])

      rpcEnv.shutdown()

    case _ =>
      // scalastyle:off println
      System.err.println("Usage: DriverWrapper <workerUrl> <userJar> <driverMainClass> [options]")
      // scalastyle:on println
      System.exit(-1)
  }
}
```
- At this point DriverWrapper has been started on the Worker and, at the end, reflectively invokes the main method of the user-written class. From here on, everything proceeds exactly as in client mode.
YarnClusterApplication in ON YARN Mode
- org.apache.spark.deploy.yarn.YarnClusterApplication
- The start method of this class is very simple: it instantiates a Client locally and calls its run method.
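- A sketch of that start method (essentially as it appears in Spark 2.4.x):

```scala
private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // In yarn mode SparkSubmit distributes files and jars through the YARN cache,
    // so these entries are removed from the SparkConf here.
    conf.remove("spark.jars")
    conf.remove("spark.files")

    new Client(new ClientArguments(args), conf).run()
  }

}
```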
- Let's look at the run method of org.apache.spark.deploy.yarn.Client:

```scala
def run(): Unit = {
  // Submit the application to the ResourceManager
  this.appId = submitApplication()
  if (!launcherBackend.isConnected() && fireAndForget) {
    // Check the report once; if the application failed, throw a SparkException
    val report = getApplicationReport(appId)
    val state = report.getYarnApplicationState
    logInfo(s"Application report for $appId (state: $state)")
    logInfo(formatReportDetails(report))
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
      throw new SparkException(s"Application $appId finished with status: $state")
    }
  } else {
    // Otherwise, monitor the application's state until it terminates
    val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
    if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
      diags.foreach { err =>
        logError(s"Application diagnostics message: $err")
      }
      throw new SparkException(s"Application $appId finished with failed status")
    }
    if (appState == YarnApplicationState.KILLED || finalState == FinalApplicationStatus.KILLED) {
      throw new SparkException(s"Application $appId is killed")
    }
    if (finalState == FinalApplicationStatus.UNDEFINED) {
      throw new SparkException(s"The final status of application $appId is undefined")
    }
  }
}
```
- The key step here is the call to submitApplication, which asks YARN's ResourceManager for resources to run the ApplicationMaster:

```scala
def submitApplication(): ApplicationId = {
  var appId: ApplicationId = null
  try {
    // launcherBackend is instantiated when Client is instantiated
    launcherBackend.connect()
    // yarnClient is created via YarnClient.createYarnClient when Client is instantiated
    yarnClient.init(hadoopConf)
    yarnClient.start()

    logInfo("Requesting a new application from cluster with %d NodeManagers"
      .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

    // Ask the ResourceManager to create a new application
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    appId = newAppResponse.getApplicationId()

    new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
      Option(appId.toString)).setCurrentContext()

    // Verify that the YARN cluster has enough resources to run the ApplicationMaster
    verifyClusterResources(newAppResponse)

    // Set up the contexts used to launch the ApplicationMaster. This mainly covers:
    // 1. parsing some of the arguments, e.g. --class, --jar, etc.
    // 2. building the java command passed to the ApplicationMaster via
    //    amContainer.setCommands, which later starts the user-written class
    // 3. the resource request for the ApplicationMaster
    val containerContext = createContainerLaunchContext(newAppResponse)
    val appContext = createApplicationSubmissionContext(newApp, containerContext)

    logInfo(s"Submitting application $appId to ResourceManager")
    // Launch the ApplicationMaster
    yarnClient.submitApplication(appContext)
    launcherBackend.setAppId(appId.toString)
    reportLauncherState(SparkAppHandle.State.SUBMITTED)

    appId
  } catch {
    // some code omitted
  }
}
```
- submitApplication() connects to YARN and requests and creates the ApplicationMaster. The container launch context it submits carries the java command that YARN uses to start the ApplicationMaster, which in turn launches the user-written class; from there everything proceeds exactly as in client mode. A condensed sketch of that command follows below.
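- To make that last step concrete, below is a condensed sketch (not the literal Spark source; javaOpts, userClass, userJar and userArgs are simplified placeholders for values parsed from the submit arguments) of the kind of command createContainerLaunchContext assembles for the AM container. In cluster mode the main class is org.apache.spark.deploy.yarn.ApplicationMaster, which then runs the main method of the class passed via --class in a "user class" thread:

```scala
// Condensed sketch of the AM launch command built in createContainerLaunchContext
// (requires scala.collection.JavaConverters._ for asJava).
val amClass =
  if (isClusterMode) "org.apache.spark.deploy.yarn.ApplicationMaster"
  else "org.apache.spark.deploy.yarn.ExecutorLauncher" // yarn-client mode

val commands =
  Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++ // expands to {{JAVA_HOME}}/bin/java
  javaOpts ++                                                 // -Xmx..., GC settings, etc.
  Seq(amClass, "--class", userClass, "--jar", userJar) ++
  userArgs.flatMap(arg => Seq("--arg", arg)) ++
  Seq("1>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout",
      "2>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr")

// The assembled command ends up on the ContainerLaunchContext
amContainer.setCommands(commands.asJava)
```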