Explanation of the Parameters for Spark Job Submission (spark-submit)

Options:


--master MASTER_URL spark://host:port, mesos://host:port, yarn, or local.

Configures the mode in which Spark runs (example commands follow the list below).

  • local / local[k]: local mode
    • Runs the job with one or more worker threads; suitable for debugging code locally on small amounts of data.
  • spark://host:port: Standalone mode
    • Spark's built-in cluster mode. Configuration is fairly cumbersome, so it is generally used for learning, not as a production environment.
  • yarn: YARN mode
    • Uses YARN as the framework that schedules Spark tasks. It comes in two flavors, yarn-client and yarn-cluster, and is the most commonly used option in production.
  • mesos: Mesos mode
    • Mesos is an open-source distributed resource management framework under Apache, similar to YARN.
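
A minimal sketch of how the master URL changes across modes (the class name, jar, and host below are placeholders):

    # Local mode with 2 worker threads
    spark-submit --master local[2] --class com.example.Main app.jar
    # Standalone mode against a Spark master
    spark-submit --master spark://master-host:7077 --class com.example.Main app.jar
    # YARN mode (the cluster location is read from the Hadoop configuration)
    spark-submit --master yarn --class com.example.Main app.jar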

--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client).

For these two modes the master parameter must be yarn (example commands follow the list):

  • client: yarn-client mode
    • yarn-client mode uses YARN to schedule resources and run the job. The Driver runs locally, which puts heavy network pressure on the submitting server; the upside is that logs print locally, making debugging convenient. Suitable for testing.
  • cluster: yarn-cluster mode
    • yarn-cluster mode also uses YARN to schedule resources and run the job, but the Driver runs on a NodeManager and is assigned to a random NodeManager machine on each run. Suitable for production environments.
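
Illustrative submissions for both modes (class name and jar are placeholders):

    # yarn-client: the Driver runs on the submitting machine, logs print locally
    spark-submit --master yarn --deploy-mode client --class com.example.Main app.jar
    # yarn-cluster: the Driver runs on a NodeManager chosen by YARN
    spark-submit --master yarn --deploy-mode cluster --class com.example.Main app.jar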

--class CLASS_NAME Your application's main class (for Java / Scala apps).

The main class to run, e.g. --class com.report.AttackDetailReport


--name NAME A name of your application.

The name of the job. When submitted in yarn-client mode, the app name is whatever the code sets; when submitted in yarn-cluster mode, the app name becomes the fully qualified name of the main class, e.g. com.aa.bb.Main


--jars JARS Comma-separated list of local jars to include on the driver and executor classpaths.

Paths of the (local) jars the job depends on, multiple paths separated by commas, e.g. --jars /opt/c.jar,/opt/d.jar
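
For instance (paths are illustrative), both jars are shipped to the driver and every executor:

    spark-submit --master yarn \
      --jars /opt/c.jar,/opt/d.jar \
      --class com.example.Main app.jar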


--packages Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.

The Maven coordinates of the jars the job depends on. --repositories gives the Maven repository for, e.g., the mysql-connector-java package; if it is not given, the packages are downloaded from the default Maven sources configured on the machine.
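
A sketch using the mysql-connector-java coordinates mentioned above (the version and repository URL are illustrative; any groupId:artifactId:version works):

    spark-submit --master yarn \
      --packages mysql:mysql-connector-java:5.1.47 \
      --repositories https://repo.maven.apache.org/maven2 \
      --class com.example.Main app.jar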


--exclude-packages Comma-separated list of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts.

Excludes Maven dependencies that are likely to conflict.


--repositories Comma-separated list of additional remote repositories to search for the maven coordinates given with --packages.

Maven repository addresses; multiple addresses separated by commas.


--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.

External files to load, for Python jobs; multiple paths separated by commas.


--files FILES Comma-separated list of files to be placed in the working directory of each executor.

External files to load, for Java/Scala jobs; multiple paths separated by commas.
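
A sketch with a hypothetical properties file; it lands in each executor's working directory and can be opened by its bare file name (or located via SparkFiles.get in code):

    spark-submit --master yarn \
      --files /opt/conf/app.properties \
      --class com.example.Main app.jar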


--conf PROP=VALUE Arbitrary Spark configuration property.

Passes arbitrary key=value pairs into the Spark configuration.

Example, printing the driver's GC information: --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"


--properties-file FILE Path to a file from which to load extra properties. If not specified, this will look for conf/spark-defaults.conf.

Properties defined in the --properties-file no longer need to be set on the spark-submit command line. For example, if spark.master is defined in conf/spark-defaults.conf, --master can be omitted.
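
A minimal illustration. Given a defaults file like this (path and values are placeholders):

    # /opt/conf/my-spark.conf
    spark.master            yarn
    spark.submit.deployMode cluster
    spark.executor.memory   4g

the corresponding flags can be dropped from the command line:

    spark-submit --properties-file /opt/conf/my-spark.conf --class com.example.Main app.jar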


--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M).

Memory for the Spark driver (e.g. 1000M, 2G), 1 GB by default, in yarn-client and yarn-cluster modes as well. If a container exits for lack of memory in yarn-cluster mode, consider whether the driver memory is insufficient.


--driver-java-options Extra Java options to pass to the driver.

Adds extra Java options for the driver.


--driver-library-path Extra library path entries to pass to the driver.

Adds extra library path entries (e.g. for native libraries) for the driver.


--driver-class-path Extra class path entries to pass to the driver. Note that jars added with --jars are automatically included in the classpath.

Adds dependency drivers to the driver classpath; commonly used to add the MySQL connector jar when the job uses MySQL, e.g. --driver-class-path /opt/gttx/spark/task/lib/mysql-connector-java-5.1.47.jar
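
A sketch for a job that talks to MySQL; pairing --driver-class-path with --jars puts the connector on both the driver and executor classpaths (paths are illustrative):

    spark-submit --master yarn \
      --driver-class-path /opt/lib/mysql-connector-java-5.1.47.jar \
      --jars /opt/lib/mysql-connector-java-5.1.47.jar \
      --class com.example.Main app.jar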


--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).

The memory size of one executor, 1 GB by default. Under YARN, executors run inside containers, and the maximum memory a YARN container can request is configurable and has an upper limit.
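
As a hedged example, the value below has to fit (together with the memory overhead) under YARN's yarn.scheduler.maximum-allocation-mb, or the container request will be rejected:

    spark-submit --master yarn --executor-memory 4G --class com.example.Main app.jar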


--proxy-user NAME User to impersonate when submitting the application.

The user to impersonate when submitting the application.


--help, -h Show this help message and exit

Help message.


--verbose, -v Print additional debug output

Prints more detailed runtime information.


--version Print the version of current Spark

Prints version information.

Spark standalone with cluster deploy mode only:


--driver-cores NUM Cores for driver (Default: 1).

Number of cores for the Driver, default 1; only used in standalone cluster deploy mode.

Spark standalone or Mesos with cluster deploy mode only:

--supervise If given, restarts the driver on failure.

Restarts the Driver when it fails; used under Mesos or standalone.


--kill SUBMISSION_ID If given, kills the driver specified.

Kills the specified driver process.


--status SUBMISSION_ID If given, requests the status of the driver specified.

Queries the status of the specified driver process.
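
For example, against a standalone master in cluster deploy mode (host, port, and the submission ID are placeholders; the real ID is printed when the driver is first submitted):

    spark-submit --master spark://master-host:6066 --kill driver-20191201123456-0001
    spark-submit --master spark://master-host:6066 --status driver-20191201123456-0001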

Spark standalone and Mesos only:


--total-executor-cores NUM Total cores for all executors.

The total number of cores across all executors; only used under Mesos or standalone.

Spark standalone and YARN only:


--executor-cores NUM Number of cores per executor. (Default: 1 in YARN mode, or all available cores on the worker in standalone mode)

Number of cores per executor; only used under YARN or standalone. Defaults to 1 under YARN, or to all available cores on the worker under standalone.

YARN-only:


--driver-cores NUM Number of cores used by the driver, only in cluster mode (Default: 1).

Number of driver cores, default 1; YARN mode only.


--queue QUEUE_NAME The YARN queue to submit to (Default: "default").

The YARN queue the job is submitted to; YARN mode only.


--num-executors NUM Number of executors to launch (Default: 2).

The number of executors to launch, default 2; YARN mode only.
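
Putting the YARN sizing flags together in one illustrative production-style command (the queue name and all numbers are placeholders to tune for the cluster):

    spark-submit --master yarn --deploy-mode cluster \
      --queue report \
      --num-executors 10 \
      --executor-cores 2 \
      --executor-memory 4G \
      --driver-memory 2G \
      --class com.example.Main app.jar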


--archives ARCHIVES Comma separated list of archives to be extracted into the working directory of each executor.

A comma-separated list of archive files that are extracted into each executor's working directory; YARN mode only.
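
A sketch shipping a zipped directory (names are illustrative); on YARN the # suffix gives the extracted archive an alias inside the executor working directory:

    spark-submit --master yarn \
      --archives /opt/deps/mydeps.zip#mydeps \
      --class com.example.Main app.jar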


--principal PRINCIPAL Principal to be used to login to KDC, while running on secure HDFS.

Security-related (Kerberos); YARN mode only.


--keytab KEYTAB The full path to the file that contains the keytab for the principal specified above. This keytab will be copied to the node running the Application Master via the Secure Distributed Cache, for renewing the login tickets and the delegation tokens periodically.

Security-related (Kerberos); YARN mode only.
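
A minimal Kerberos sketch (the principal and keytab path are placeholders for the cluster's own values):

    spark-submit --master yarn --deploy-mode cluster \
      --principal spark_user@EXAMPLE.COM \
      --keytab /etc/security/keytabs/spark_user.keytab \
      --class com.example.Main app.jar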
