Solving memory problems caused by Spark's default startup memory settings

This article addresses insufficient-memory problems when running Spark in yarn-cluster mode.

When running Spark in yarn-cluster mode, pay attention to the yarn.app.mapreduce.am.resource.mb setting; the default is 1536 MB (see the YARN parameters below).

Spark On YARN memory allocation


This article explains memory allocation under the Spark on YARN deployment mode. Since I have not studied the Spark source code in depth, I only follow the relevant logs into the source code to understand why things work the way they do.

Explanation

Depending on where the application's driver runs, Spark on YARN has two modes: yarn-client mode and yarn-cluster mode.

When a Spark job runs on YARN, each Spark executor runs as a YARN container, and multiple Spark tasks can run inside the same container.

The figure below shows how a job executes in yarn-cluster mode (image from the web):

For the Spark on YARN configuration parameters, please refer to the Spark configuration documentation. This article focuses on memory allocation, so we only need to pay attention to the following parameters:

spark.driver.memory: default 512m

spark.executor.memory: default 512m

spark.yarn.am.memory: default 512m

spark.yarn.executor.memoryOverhead: executorMemory * 0.07, with a minimum of 384

spark.yarn.driver.memoryOverhead: driverMemory * 0.07, with a minimum of 384

spark.yarn.am.memoryOverhead: AM memory * 0.07, with a minimum of 384

Note:

--executor-memory / spark.executor.memory controls the executor heap size, but the JVM itself also uses some memory outside the heap, for example for interned strings and direct byte buffers. The spark.yarn.XXX.memoryOverhead properties determine how much additional memory, on top of the heap, is requested from YARN for each executor, the driver, or the AM; the default value is max(384, 0.07 * spark.executor.memory).
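To make the formula concrete, here is a small Scala helper (my own sketch, not Spark source code) that reproduces the default overhead calculation described above:

// Sketch of the default overhead rule: max(384, 0.07 * memory), in MB.
def defaultMemoryOverheadMb(memoryMb: Int): Int =
  math.max((0.07 * memoryMb).toInt, 384)

// Examples: defaultMemoryOverheadMb(3072) == 384 (the 384 MB minimum applies),
//           defaultMemoryOverheadMb(92160) == 6451 (about 6.3 GB for a 90 GB executor).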

Configuring executors with too much memory often leads to long GC pauses; 64 GB is a rough recommended upper limit for executor memory.

The HDFS client has performance problems with large numbers of concurrent threads. A rough estimate is that at most five concurrent tasks per executor can saturate the write bandwidth.

In addition, because the job is submitted to YARN for execution, several YARN parameters are also important (see the YARN memory and CPU configuration documentation):

yarn.app.mapreduce.am.resource.mb: maximum memory the AM can request; default 1536 MB

yarn.nodemanager.resource.memory-mb: maximum memory a NodeManager can allocate to containers; default 8192 MB

yarn.scheduler.minimum-allocation-mb: minimum resource the scheduler allocates to a container; default 1024 MB

yarn.scheduler.maximum-allocation-mb: maximum resource the scheduler allocates to a container; default 8192 MB

Test

Spark cluster test environment:

master: 64G memory, 16-core CPU

worker: 128G memory, 32-core CPU

worker: 128G memory, 32-core CPU

worker: 128G memory, 32-core CPU

worker: 128G memory, 32-core CPU

Note: the Spark cluster is deployed on top of the YARN cluster, with a NodeManager running on each worker node. The YARN cluster configuration is as follows:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>

Set the base Spark log level to DEBUG and set log4j.logger.org.apache.hadoop to WARN to suppress unnecessary output, by modifying /etc/spark/conf/log4j.properties:

# Set everything to be logged to the console
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Next, run a test program. Take the official SparkPi example; the following mainly tests client mode (cluster mode follows the same procedure). Run the following command:

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    --num-executors 4 \
    --driver-memory 2g \
    --executor-memory 3g \
    --executor-cores 4 \
    /usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar \
    100000

Observe the output log (irrelevant lines omitted):

15/06/08 13:57:01 INFO SparkContext: Running Spark version 1.3.0

15/06/08 13:57:02 INFO SecurityManager: Changing view acls to: root

15/06/08 13:57:02 INFO SecurityManager: Changing modify acls to: root

15/06/08 13:57:03 INFO MemoryStore: MemoryStore started with capacity 1060.3 MB

15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: ClientArguments called with: --arg bj03-bi-pro-hdpnamenn:51568 --num-executors 4 --num-executors 4 --executor-memory 3g --executor-memory 3g --executor-cores 4 --executor-cores 4 --name Spark Pi

15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: [actor] handled message (24.52531 ms) ReviveOffers from Actor[akka://sparkDriver/user/CoarseGrainedScheduler#864850679]

15/06/08 13:57:05 INFO Client: Requesting a new application from cluster with 4 NodeManagers

15/06/08 13:57:05 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (106496 MB per container)

15/06/08 13:57:05 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead

15/06/08 13:57:05 INFO Client: Setting up container launch context for our AM

15/06/08 13:57:07 DEBUG Client: ===============================================================================

15/06/08 13:57:07 DEBUG Client: Yarn AM launch context:

15/06/08 13:57:07 DEBUG Client:    user class: N/A

15/06/08 13:57:07 DEBUG Client:    env:

15/06/08 13:57:07 DEBUG Client:        CLASSPATH -> <CPS>/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>:/usr/lib/spark/lib/spark-assembly.jar::/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/paquet/lib/*:/usr/lib/avro/lib/*

15/06/08 13:57:07 DEBUG Client:        SPARK_DIST_CLASSPATH -> :/usr/lib/spark/lib/spark-assembly.jar::/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/paquet/lib/*:/usr/lib/avro/lib/*

15/06/08 13:57:07 DEBUG Client:        SPARK_YARN_CACHE_FILES_FILE_SIZES -> 97237208

15/06/08 13:57:07 DEBUG Client:        SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1433742899916_0001

15/06/08 13:57:07 DEBUG Client:        SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE

15/06/08 13:57:07 DEBUG Client:        SPARK_USER -> root

15/06/08 13:57:07 DEBUG Client:        SPARK_YARN_MODE -> true

15/06/08 13:57:07 DEBUG Client:        SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1433743027399

15/06/08 13:57:07 DEBUG Client:        SPARK_YARN_CACHE_FILES -> hdfs://mycluster:8020/user/root/.sparkStaging/application_1433742899916_0001/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar#__spark__.jar

15/06/08 13:57:07 DEBUG Client:    resources:

15/06/08 13:57:07 DEBUG Client:        __spark__.jar -> resource { scheme: "hdfs" host: "mycluster" port: 8020 file: "/user/root/.sparkStaging/application_1433742899916_0001/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar" } size: 97237208 timestamp: 1433743027399 type: FILE visibility: PRIVATE

15/06/08 13:57:07 DEBUG Client:    command:

15/06/08 13:57:07 DEBUG Client:        /bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp '-Dspark.eventLog.enabled=true' '-Dspark.executor.instances=4' '-Dspark.executor.memory=3g' '-Dspark.executor.cores=4' '-Dspark.driver.port=51568' '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer' '-Dspark.driver.appUIAddress=http://bj03-bi-pro-hdpnamenn:4040' '-Dspark.executor.id=<driver>' '-Dspark.kryo.classesToRegister=scala.collection.mutable.BitSet,scala.Tuple2,scala.Tuple1,org.apache.spark.mllib.recommendation.Rating' '-Dspark.driver.maxResultSize=8g' '-Dspark.jars=file:/usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar' '-Dspark.driver.memory=2g' '-Dspark.eventLog.dir=hdfs://mycluster:8020/user/spark/applicationHistory' '-Dspark.app.name=Spark Pi' '-Dspark.fileserver.uri=http://X.X.X.X:49172' '-Dspark.tachyonStore.folderName=spark-81ae0186-8325-40f2-867b-65ee7c922357' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'bj03-bi-pro-hdpnamenn:51568' --executor-memory 3072m --executor-cores 4 --num-executors  4 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr

15/06/08 13:57:07 DEBUG Client: ===============================================================================

From the log line "Will allocate AM container, with 896 MB memory including 384 MB overhead" we can see that the AM occupies 896 MB; subtracting the 384 MB overhead leaves only the default 512 MB, which is the value of spark.yarn.am.memory. We can also see that the YARN cluster has 4 NodeManagers and that each container may use at most 106496 MB of memory.

The YARN AM launch context starts a Java process whose heap is set to 512m; see /bin/java -server -Xmx512m in the command above.

Why this default value? Looking at the code that prints the log line above, in org.apache.spark.deploy.yarn.Client:

private def verifyClusterResources(newAppResponse: GetNewApplicationResponse): Unit = {
  val maxMem = newAppResponse.getMaximumResourceCapability().getMemory()
  logInfo("Verifying our application has not requested more than the maximum " +
    s"memory capability of the cluster ($maxMem MB per container)")
  val executorMem = args.executorMemory + executorMemoryOverhead
  if (executorMem > maxMem) {
    throw new IllegalArgumentException(s"Required executor memory (${args.executorMemory}" +
      s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster!")
  }
  val amMem = args.amMemory + amMemoryOverhead
  if (amMem > maxMem) {
    throw new IllegalArgumentException(s"Required AM memory (${args.amMemory}" +
      s"+$amMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster!")
  }
  logInfo("Will allocate AM container, with %d MB memory including %d MB overhead".format(
    amMem, amMemoryOverhead))
}

args.amMemory comes from the ClientArguments class, which validates the submitted parameters:

private def validateArgs(): Unit = {
  if (numExecutors <= 0) {
    throw new IllegalArgumentException(
      "You must specify at least 1 executor!\n" + getUsageMessage())
  }
  // ... (the executorCores check and the cluster-mode branch, which sets amMemory to
  // driverMemory, are truncated in the original text) ...
  // Client mode: amMemory keeps its 512 MB default unless spark.yarn.am.* is set.
  sparkConf.getOption(amMemKey).map(Utils.memoryStringToMb).foreach { mem => amMemory = mem }
  sparkConf.getOption(amCoresKey).map(_.toInt).foreach { cores => amCores = cores }
}

From the code above, when isClusterMode is true, args.amMemory takes the driverMemory value; otherwise it is read from spark.yarn.am.memory, and if that property is not set, the default is 512m. isClusterMode is true when userClass is non-empty (def isClusterMode: Boolean = userClass != null), i.e. the submitted arguments must include --class. From the ClientArguments parameters printed in the log below, this parameter is absent, so the client-mode branch applies here.

15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: ClientArguments called with: --arg bj03-bi-pro-hdpnamenn:51568 --num-executors 4 --num-executors 4 --executor-memory 3g --executor-memory 3g --executor-cores 4 --executor-cores 4 --name Spark Pi
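In condensed form, the selection logic described above can be sketched like this (my own summary of the behaviour, not the actual Spark source):

// Condensed sketch of how the AM memory is chosen:
// - cluster mode: the AM hosts the driver, so it gets driverMemory
// - client mode: spark.yarn.am.memory if set, otherwise the 512m default
def chooseAmMemoryMb(isClusterMode: Boolean,
                     driverMemoryMb: Int,
                     sparkYarnAmMemoryMb: Option[Int]): Int =
  if (isClusterMode) driverMemoryMb
  else sparkYarnAmMemoryMb.getOrElse(512)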

Therefore, to control how much memory the AM requests, either use cluster mode, or in client mode set the spark.yarn.am.memory property manually with --conf, for example:

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    --num-executors 4 \
    --driver-memory 2g \
    --executor-memory 3g \
    --executor-cores 4 \
    --conf spark.yarn.am.memory=1024m \
    /usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar \
    100000

Open the YARN web UI and you can see:

a. The Spark Pi application starts 5 containers, using 18 GB of memory and 5 CPU vcores

b. YARN starts one container for the AM, occupying 2048 MB of memory

c. YARN starts 4 containers to run tasks, each occupying 4096 MB of memory

Why is it 2G + 4G * 4 = 18G? The first container requests only 2 GB of memory: our program asks for just 512m for the AM, but the yarn.scheduler.minimum-allocation-mb parameter forces the allocation up to at least 2 GB. As for the remaining containers, we set executor-memory to 3 GB, so why does each container occupy 4096 MB?

To find the pattern, I collected several sets of test data, recording the container memory requested for executor-memory values of 3G, 4G, 5G, and 6G:

executor-memory=3g: 2G + 4G * 4 = 18G

executor-memory=4g: 2G + 6G * 4 = 26G

executor-memory=5g: 2G + 6G * 4 = 26G

executor-memory=6g: 2G + 8G * 4 = 34G

To investigate, I looked at the Spark source code, following the path org.apache.spark.deploy.yarn.ApplicationMaster -> YarnRMClient -> YarnAllocator, and found this piece of code in YarnAllocator:

// Executor memory in MB.
protected val executorMemory = args.executorMemory
// Additional memory overhead.
protected val memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
// Number of cores per executor.
protected val executorCores = args.executorCores
// Resource capability requested for each executors
private val resource = Resource.newInstance(executorMemory + memoryOverhead, executorCores)

Since I have not looked into the YARN source code itself, my speculation is that the container size is calculated from executorMemory + memoryOverhead, with the rule that the container size must be rounded up to an integer multiple of yarn.scheduler.minimum-allocation-mb. When executor-memory = 3g, executorMemory + memoryOverhead is 3G + 384M = 3456M, so the container request is rounded up to yarn.scheduler.minimum-allocation-mb * 2 = 4096M = 4G, and similarly for the other values.

Note:

YARN always rounds the memory requirement up to a multiple of yarn.scheduler.minimum-allocation-mb, which by default is 1024 MB (1 GB).

Spark adds an overhead to SPARK_EXECUTOR_MEMORY / SPARK_DRIVER_MEMORY before asking YARN for that amount.

Also note how memoryOverhead is calculated: when executorMemory is large, the corresponding memoryOverhead grows beyond 384m, so the container memory request grows as well. For example, when executorMemory is set to 90G, memoryOverhead is math.max(0.07 * 90G, 384m) = 6.3G, and the corresponding container request is 98G.
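As a sanity check on this speculation, here is a small self-contained Scala sketch (my own code, not Spark or YARN source) that applies the rounding rule described above and reproduces the observed container sizes:

object ContainerSizeCheck {
  // Constants from YarnSparkHadoopUtil plus this cluster's YARN setting.
  val MEMORY_OVERHEAD_FACTOR = 0.07
  val MEMORY_OVERHEAD_MIN = 384
  val minimumAllocationMb = 2048 // yarn.scheduler.minimum-allocation-mb in this cluster

  // Default overhead: max(384, 0.07 * executorMemory), in MB.
  def overheadMb(executorMemoryMb: Int): Int =
    math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMb).toInt, MEMORY_OVERHEAD_MIN)

  // Assumed YARN behaviour: round the request up to a multiple of the minimum allocation.
  def containerSizeMb(requestedMb: Int): Int =
    math.ceil(requestedMb.toDouble / minimumAllocationMb).toInt * minimumAllocationMb

  def main(args: Array[String]): Unit = {
    for (gb <- Seq(3, 4, 5, 6, 90)) {
      val executorMemoryMb = gb * 1024
      val requested = executorMemoryMb + overheadMb(executorMemoryMb)
      println(s"executor-memory=${gb}g -> request $requested MB -> container ${containerSizeMb(requested)} MB")
    }
  }
}

Running this prints container sizes of 4096, 6144, 6144, 8192 and 100352 MB, matching the 4G/6G/6G/8G containers observed above and the roughly 98G case described in the note.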

Looking back at why the AM container is allocated 2 GB: 512 + 384 = 896 MB, which is less than 2 GB, so 2 GB is allocated. You can verify this again after changing the value of spark.yarn.am.memory.

Open the Spark web UI at http://ip:4040 and you can see the memory used by the driver and the executors:

From the figure you can see that each executor occupies 1566.7 MB of memory. How is this calculated? According to the article "Spark on Yarn: Where Have All the Memory Gone?", totalExecutorMemory is calculated as follows:

// yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
val MEMORY_OVERHEAD_FACTOR = 0.07
val MEMORY_OVERHEAD_MIN = 384

// yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
protected val memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
...
val totalExecutorMemory = executorMemory + memoryOverhead
numPendingAllocate.addAndGet(missing)
logInfo(s"Will allocate $missing executor containers, each with $totalExecutorMemory MB " +
  s"memory including $memoryOverhead MB overhead")

Here executor-memory is set to 3G, so memoryOverhead is math.max(0.07 * 3072, 384) = 384. The maximum available storage memory is calculated by the following code:

// core/src/main/scala/org/apache/spark/storage/BlockManager.scala
/** Return the total amount of storage memory available. */
private def getMaxMemory(conf: SparkConf): Long = {
  val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
  val safetyFraction = conf.getDouble("spark.storage.safetyFraction", 0.9)
  (Runtime.getRuntime.maxMemory * memoryFraction * safetyFraction).toLong
}

That is, with executor-memory set to 3G, each executor's storage memory is approximately 3072m * 0.6 * 0.9 = 1658.88m. Note that the calculation actually multiplies by Runtime.getRuntime.maxMemory, which is somewhat less than 3072m, hence the 1566.7 MB shown in the UI.

In the figure above the driver occupies 1060.3 MB. Here driver-memory is 2G, so the driver's storage memory is 2048m * 0.6 * 0.9 = 1105.92m. Again, the calculation actually multiplies by Runtime.getRuntime.maxMemory, which is less than 2048m.
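To see where numbers like 1566.7 MB and 1060.3 MB come from, here is a small sketch (my own code, following the getMaxMemory logic above) that computes the storage memory from a given JVM max heap; the key point is that Runtime.getRuntime.maxMemory inside the executor or driver JVM is somewhat smaller than the configured -Xmx:

object StorageMemoryCheck {
  // Storage memory = maxMemory * spark.storage.memoryFraction * spark.storage.safetyFraction
  def storageMemoryMb(maxMemoryBytes: Long,
                      memoryFraction: Double = 0.6,
                      safetyFraction: Double = 0.9): Double =
    maxMemoryBytes * memoryFraction * safetyFraction / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    // Upper bounds computed from the configured -Xmx values:
    println(storageMemoryMb(3072L * 1024 * 1024)) // 1658.88 MB for a 3g executor
    println(storageMemoryMb(2048L * 1024 * 1024)) // 1105.92 MB for a 2g driver
    // The value actually used inside the JVM, smaller than -Xmx:
    println(storageMemoryMb(Runtime.getRuntime.maxMemory))
  }
}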

Now look at the startup command of the CoarseGrainedExecutorBackend process on a worker node:

$ jps
46841 Worker
21894 CoarseGrainedExecutorBackend
9345
21816 ExecutorLauncher
43369
24300 NodeManager
38012 JournalNode
36929 QuorumPeerMain
22909 Jps
$ ps -ef | grep 21894
nobody  21894 21892 99 17:28 ?        00:04:49 /usr/java/jdk1.7.0_71/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms3072m -Xmx3072m -Djava.io.tmpdir=/data/yarn/local/usercache/root/appcache/application_1433742899916_0069/container_1433742899916_0069_01_000003/tmp -Dspark.driver.port=60235 -Dspark.yarn.app.container.log.dir=/data/yarn/logs/application_1433742899916_0069/container_1433742899916_0069_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@bj03-bi-pro-hdpnamenn:60235/user/CoarseGrainedScheduler --executor-id 2 --hostname X.X.X.X --cores 4 --app-id application_1433742899916_0069 --user-class-path file:/data/yarn/local/usercache/root/appcache/application_1433742899916_0069/container_1433742899916_0069_01_000003/__app__.jar

You can see that each CoarseGrainedExecutorBackend process is allocated a 3072m heap. If you want to observe the JVM behaviour of each executor, you can enable JMX by adding the following line to /etc/spark/conf/spark-defaults.conf:

spark.executor.extraJavaOptions  -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

Then use jconsole to monitor the JVM heap of each executor, which makes it easier to tune the memory settings.

Summary

To sum up: in client mode, the memory of the AM container is determined by spark.yarn.am.memory plus spark.yarn.am.memoryOverhead; the container memory requested for each executor is the executor memory plus spark.yarn.executor.memoryOverhead; and the storage memory of the driver and of each executor is roughly the configured driver/executor memory multiplied by 0.54 (spark.storage.memoryFraction * spark.storage.safetyFraction = 0.6 * 0.9). In YARN, the memory actually allocated to a container must be an integer multiple of yarn.scheduler.minimum-allocation-mb.

The picture below shows the Spark on YARN memory structure (image from "How-to: Tune Your Apache Spark Jobs (Part 2)"):

Origin blog.csdn.net/weixin_34129696/article/details/90970908