Spark Cluster Mode Overview (Translation)

Original document: http://spark.apache.org/docs/latest/cluster-overview.html

Cluster Mode Overview

This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read through the application submission guide to learn about launching applications on a cluster.

Cluster Mode Overview

This document gives a brief overview of how Spark runs on a cluster, making it easier to understand how the various components interact. Read through the application submission guide to learn how to launch applications on a cluster.

Components

Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program).

Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager, Mesos or YARN), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.

Components

A Spark application runs on a cluster as a set of independent processes: one driver and multiple executors, coordinated by the SparkContext object created in your application's main function (the main program is called the driver program).

Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers, such as Spark's own standalone cluster manager, Mesos, or YARN, whose role is to allocate resources across applications. Once Spark has connected to a cluster manager, it acquires executors on the nodes of the cluster; these executors are processes that run the computations of our application and store its data. Next, the SparkContext sends our application code (defined by the JAR or Python files passed to the SparkContext) to the executors. Finally, the SparkContext sends tasks to the executors to run.
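
As a concrete illustration of this flow, below is a minimal driver program sketch in Scala. The application name, master URL, and the toy computation are placeholders chosen for the example, not values taken from the original document.

    import org.apache.spark.{SparkConf, SparkContext}

    object SimpleDriver {
      def main(args: Array[String]): Unit = {
        // The driver program creates the SparkContext, which connects to a
        // cluster manager (standalone, Mesos or YARN) via the master URL.
        val conf = new SparkConf()
          .setAppName("SimpleDriver")            // placeholder application name
          .setMaster("spark://master-host:7077") // placeholder standalone master URL

        val sc = new SparkContext(conf)

        // Once connected, Spark acquires executors on the worker nodes; the
        // computation below is split into tasks that the SparkContext ships
        // to those executors.
        val sum = sc.parallelize(1 to 1000000).map(_.toLong).reduce(_ + _)
        println(s"sum = $sum")

        sc.stop()
      }
    }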

[Figure: Spark cluster components]

There are several useful things to note about this architecture:

  1. Each application gets its own executor processes, which stay up for the duration of the whole application and run tasks in multiple threads. This has the benefit of isolating applications from each other, on both the scheduling side (each driver schedules its own tasks) and executor side (tasks from different applications run in different JVMs). However, it also means that data cannot be shared across different Spark applications (instances of SparkContext) without writing it to an external storage system.
  2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor processes, and these communicate with each other, it is relatively easy to run it even on a cluster manager that also supports other applications (e.g. Mesos/YARN).
  3. The driver program must listen for and accept incoming connections from its executors throughout its lifetime (e.g., see spark.driver.port in the network config section). As such, the driver program must be network addressable from the worker nodes.
  4. Because the driver schedules tasks on the cluster, it should be run close to the worker nodes, preferably on the same local area network. If you’d like to send requests to the cluster remotely, it’s better to open an RPC to the driver and have it submit operations from nearby than to run a driver far away from the worker nodes.

There are several useful things to note about this architecture:

  1. Each application gets its own executor processes, which stay up for the duration of the whole application and run its tasks in multiple threads. The benefit is that applications are isolated from each other, on both the scheduling side and the executor side: each driver schedules its own tasks, and tasks from different applications run in different JVMs. However, it also means that data cannot be shared across different Spark applications (that is, across instances of SparkContext) unless you write it to an external storage system such as Alluxio.
  2. Spark is agnostic to the underlying cluster manager. As long as Spark can acquire executor processes, and these processes can communicate with each other, Spark can run even on a cluster manager that also supports other applications, such as Mesos or YARN.
  3. Throughout its lifetime, the driver program listens for and accepts incoming connections from its own executors (see spark.driver.port in the network config section). The driver program must therefore be network addressable from the worker nodes; a small configuration sketch follows this list.
  4. Because the driver schedules the tasks on the cluster, it should run close to the worker nodes, preferably on the same local area network. If you want to send requests to a remote cluster, it is better to open an RPC to the driver and have it submit operations from a node near the workers than to run the driver far away from the worker nodes.
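
To make note 3 concrete, here is a small configuration sketch. spark.driver.host and spark.driver.port are real Spark properties, but the host name, port, and master URL below are placeholders for illustration only.

    import org.apache.spark.{SparkConf, SparkContext}

    object AddressableDriver {
      def main(args: Array[String]): Unit = {
        // Pin the driver's network endpoint so that executors on the worker
        // nodes can connect back to it.
        val conf = new SparkConf()
          .setAppName("AddressableDriver")
          .setMaster("spark://master-host:7077")               // placeholder master URL
          .set("spark.driver.host", "driver-host.example.com") // must be reachable from the workers
          .set("spark.driver.port", "7078")                    // fixed port instead of a random one

        val sc = new SparkContext(conf)
        // ... job code ...
        sc.stop()
      }
    }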

Cluster Manager Types

The system currently supports several cluster managers:

  • Standalone – a simple cluster manager included with Spark that makes it easy to set up a cluster.
  • Apache Mesos – a general cluster manager that can also run Hadoop MapReduce and service applications.
  • Hadoop YARN – the resource manager in Hadoop 2.
  • Kubernetes – an open-source system for automating deployment, scaling, and management of containerized applications.

A third-party project (not supported by the Spark project) exists to add support for Nomad as a cluster manager.

Cluster Manager Types

Spark currently supports the following cluster managers:

  • Standalone – a simple cluster manager that ships with Spark, making it easy to set up a cluster quickly.
  • Apache Mesos – a general-purpose cluster manager that can also run Hadoop MapReduce and service applications.
  • Hadoop YARN – the resource manager in Hadoop 2.
  • Kubernetes – an open-source system for automating the deployment, scaling, and management of containerized applications.
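
The choice of cluster manager mostly shows up in the master URL handed to Spark. The URL schemes below follow Spark's documented formats; the host names and ports are placeholders.

    import org.apache.spark.SparkConf

    object MasterUrlExamples {
      // Host names and ports are placeholders.
      val standalone = new SparkConf().setMaster("spark://master-host:7077")   // Spark standalone
      val mesos      = new SparkConf().setMaster("mesos://mesos-master:5050")  // Apache Mesos
      val yarn       = new SparkConf().setMaster("yarn")                       // Hadoop YARN (cluster found via HADOOP_CONF_DIR)
      val kubernetes = new SparkConf().setMaster("k8s://https://k8s-api:6443") // Kubernetes
    }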

Submitting Applications

Applications can be submitted to a cluster of any type using the spark-submit script. The application submission guide describes how to do this.

Submitting Applications

Applications can be submitted to a cluster of any type using the spark-submit script. The application submission guide describes how to do this in detail.
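
As a sketch of what gets submitted, here is a minimal main class that could be packaged into a jar and handed to spark-submit. The class name, jar path, input path, and master URL are placeholders; --class, --master, and --deploy-mode are standard spark-submit options.

    package example

    import org.apache.spark.{SparkConf, SparkContext}

    // Packaged into a jar, this could be launched with something like
    // (placeholders for the master host and paths):
    //
    //   ./bin/spark-submit \
    //     --class example.WordCount \
    //     --master spark://master-host:7077 \
    //     --deploy-mode cluster \
    //     path/to/wordcount.jar hdfs:///path/to/input.txt
    object WordCount {
      def main(args: Array[String]): Unit = {
        // No setMaster here: the master URL is supplied by spark-submit.
        val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
        val counts = sc.textFile(args(0))
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        println(s"distinct words: ${counts.count()}")
        sc.stop()
      }
    }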

Monitoring

Each driver program has a web UI, typically on port 4040, that displays information about running tasks, executors, and storage usage. Simply go to http://<driver-node>:4040 in a web browser to access this UI. The monitoring guide also describes other monitoring options.

Monitoring

Each driver program has its own web UI, typically on port 4040, which shows information about the running tasks, executors, and storage usage. Simply point a browser at http://<driver-node>:4040 to access this UI. The monitoring guide describes other monitoring options in detail.
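
If port 4040 is already in use (Spark then falls back to the next free port), the UI port can also be set explicitly. spark.ui.port is a real Spark property; the port value and master URL below are only examples.

    import org.apache.spark.{SparkConf, SparkContext}

    object CustomUiPort {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("CustomUiPort")
          .setMaster("spark://master-host:7077") // placeholder master URL
          .set("spark.ui.port", "4050")          // serve the driver UI on 4050 instead of 4040

        val sc = new SparkContext(conf)
        // While the application runs, the UI is at http://<driver-node>:4050.
        sc.stop()
      }
    }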

Job Scheduling

Spark gives control over resource allocation both across applications (at the level of the cluster manager) and within applications (if multiple computations are happening on the same SparkContext). The job scheduling overview describes this in more detail.

Job Scheduling

Spark controls resource allocation both across applications, at the level of the cluster manager, and within an application, when multiple computations run concurrently on the same SparkContext. See the job scheduling overview for details.
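
As an example of within-application scheduling, the sketch below switches from the default FIFO scheduler to fair scheduling and assigns jobs to a named pool. spark.scheduler.mode and spark.scheduler.pool are real Spark properties; the pool name and master URL are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    object FairSchedulingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("FairSchedulingSketch")
          .setMaster("spark://master-host:7077") // placeholder master URL
          .set("spark.scheduler.mode", "FAIR")   // fair scheduling instead of the default FIFO

        val sc = new SparkContext(conf)

        // Jobs triggered from this thread are scheduled in the "interactive" pool.
        sc.setLocalProperty("spark.scheduler.pool", "interactive")
        val total = sc.parallelize(1 to 1000000).map(_.toLong).reduce(_ + _)
        println(s"total = $total")

        sc.stop()
      }
    }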

Glossary

The following table summarizes terms you’ll see used to refer to cluster concepts:

Term – Meaning
Application – User program built on Spark. Consists of a driver program and executors on the cluster.
Application jar – A jar containing the user's Spark application. In some cases users will want to create an "uber jar" containing their application along with its dependencies. The user's jar should never include Hadoop or Spark libraries; however, these will be added at runtime.
Driver program – The process running the main() function of the application and creating the SparkContext.
Cluster manager – An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN).
Deploy mode – Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside of the cluster. In "client" mode, the submitter launches the driver outside of the cluster.
Worker node – Any node that can run application code in the cluster.
Executor – A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors.
Task – A unit of work that will be sent to one executor.
Job – A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs.
Stage – Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs.

Glossary

The following table summarizes the cluster-related terms and concepts you may encounter:

Term – Meaning
Application – A user program built on Spark. It consists of a driver program and multiple executors on the cluster.
Application jar – A jar containing the user's Spark application. In some cases users will want to create an "uber jar" that contains their application together with all of its dependencies. The user's jar should never include the Hadoop or Spark libraries; these will be added at runtime.
Driver program – The process that runs the application's main() function and creates the SparkContext inside it.
Cluster manager – An external service for acquiring resources on the cluster (e.g. the standalone manager, Mesos, YARN).
Deploy mode – Determines where the driver process runs. In "cluster" mode, Spark launches the driver inside the cluster. In "client" mode, the submitter launches the driver outside of the cluster.
Worker node – Any node in the cluster that can run application code.
Executor – A process launched for an application on a worker node; it runs tasks and keeps data in memory or on disk across them. Each application has its own executors.
Task – A unit of work that is sent to one executor.
Job – When a Spark action (such as save or collect) is triggered, a parallel computation consisting of many tasks is spawned as a job. You will see this term used in the driver's logs.
Stage – Each job is divided into smaller sets of tasks called stages, which depend on each other (similar to the map and reduce stages in MapReduce). You will see this term used in the driver's logs.


Reposted from blog.csdn.net/xiaoxiongaa0/article/details/89915385