01 - Deploying Spark in Standalone Mode (2017-06-19)


1. Clone a virtual machine

Delete the stale udev rules inherited from the original VM:

[root@zbserver ~]# rm -rf /etc/udev/rules.d/*

Modify the network interface configuration:

[root@zbserver ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno16777728
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16777728
UUID=30f74f98-8f20-4e25-aff0-4a23cf7e911d
DEVICE=eno16777728
ONBOOT=yes
IPADDR=192.168.181.151
GATEWAY=192.168.181.2
NETMASK=255.255.255.0
DNS1=192.168.181.2

Restart the network service:

systemctl restart network

Check that the external network can be reached:

[root@zbserver ~]# ping www.baidu.com
PING www.baidu.com (14.215.177.37) 56(84) bytes of data.
64 bytes from 14.215.177.37: icmp_seq=1 ttl=128 time=6.72 ms
64 bytes from 14.215.177.37: icmp_seq=2 ttl=128 time=8.31 ms

Ping the VM from the local Windows machine:

C:\Users\huadongxie>ping 192.168.181.152

Pinging 192.168.181.152 with 32 bytes of data:
Reply from 192.168.181.152: bytes=32 time<1ms TTL=64
Reply from 192.168.181.152: bytes=32 time<1ms TTL=64
Reply from 192.168.181.152: bytes=32 time=2ms TTL=64
Reply from 192.168.181.152: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.181.152:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 2ms, Average = 0ms

C:\Users\huadongxie>

Modify the hostname and the hosts file:

[root@zbserver ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.181.151 spark1.create80.com spark1
192.168.181.152 spark2.create80.com spark2
192.168.181.153 spark3.create80.com spark3


[root@zbserver ~]# cat /etc/hostname
spark1.create80.com

Configure clock synchronization: omitted here.
To force the same time on every node: date -s "2017-06-19 22:00:00"
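
For reference, a minimal way to actually synchronize the clocks (just a sketch, assuming each node can reach a public NTP server) would be:

[root@zbserver ~]# yum install -y ntpdate
[root@zbserver ~]# ntpdate cn.pool.ntp.org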

Disable the firewall:

[root@zbserver ~]# systemctl status firewalld
[root@zbserver ~]# systemctl stop firewalld
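
Stopping the service only lasts until the next reboot; to keep the firewall off permanently it can also be disabled:

[root@zbserver ~]# systemctl disable firewalld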

Disable SELinux:

[root@zbserver ~]# cat /etc/selinux/config
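
In that file, setting SELINUX=disabled turns SELinux off permanently (effective after a reboot); setenforce 0 switches it to permissive mode immediately. One way to make the change:

[root@zbserver ~]# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
[root@zbserver ~]# setenforce 0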

2. Install the JDK

Upload the JDK:

[root@spark1 soft]# ll
total 444548
-rw-r--r-- 1 root root 153530841 Jun 19 22:17 jdk-7u80-linux-x64.tar.gz
-rwxr-xr-x 1 root root      9116 Apr 11  2016 mysql57-community-release-el7-8.noarch.rpm
-rw-r--r-- 1 root root 289405702 Jun 19 22:17 spark-1.6.1-bin-hadoop2.6.tgz
-rw-r--r-- 1 root root  12261556 Jun 19 22:17 spark-1.6.1.tgz

[root@spark1 soft]# mkdir /usr/java

[root@spark1 soft]# tar -zxvf  jdk-7u80-linux-x64.tar.gz  -C /usr/java/

Configure the environment variables:

[root@spark1 soft]# vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_80
export PATH=$PATH:$JAVA_HOME/bin
[root@spark1 soft]# source /etc/profile

[root@spark1 soft]# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

3. Install Spark

[root@spark1 soft]# mkdir /opt/spark
[root@spark1 soft]# tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz -C /opt/spark/

Modify the Spark configuration files (two files: spark-env.sh and slaves)

[root@spark1 spark-1.6.1-bin-hadoop2.6]# cd conf/
[root@spark1 conf]# ll
total 36
-rw-r--r-- 1 500 500  987 Feb 27  2016 docker.properties.template
-rw-r--r-- 1 500 500 1105 Feb 27  2016 fairscheduler.xml.template
-rw-r--r-- 1 500 500 1734 Feb 27  2016 log4j.properties.template
-rw-r--r-- 1 500 500 6671 Feb 27  2016 metrics.properties.template
-rw-r--r-- 1 500 500  865 Feb 27  2016 slaves.template
-rw-r--r-- 1 500 500 1292 Feb 27  2016 spark-defaults.conf.template
-rwxr-xr-x 1 500 500 4209 Feb 27  2016 spark-env.sh.template
[root@spark1 conf]# cp spark-env.sh.template spark-env.sh
[root@spark1 conf]# vim spark-env.sh

Add:

export JAVA_HOME=/usr/java/jdk1.7.0_80
export SPARK_MASTER_IP=192.168.181.151
export SPARK_MASTER_PORT=7077
[root@spark1 conf]# mv slaves.template slaves
[root@spark1 conf]# vim slaves
Add:
192.168.181.152
192.168.181.153
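
Optionally, the resources each worker offers can also be capped in spark-env.sh. These are standard standalone-mode settings; the values below are only an example, not part of this deployment:

export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g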

Distribute Spark to the other two machines:

[root@spark1 conf]# scp -r /opt/spark/spark-1.6.1-bin-hadoop2.6 [email protected]:/opt/spark/spark-1.6.1-bin-hadoop2.6/
[root@spark1 conf]# scp -r /opt/spark/spark-1.6.1-bin-hadoop2.6 [email protected]:/opt/spark/spark-1.6.1-bin-hadoop2.6/
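
Note that scp only creates the final path component, so /opt/spark must already exist on spark2 and spark3 before copying, for example (passwordless login is configured in the next step, so these will still prompt for a password):

[root@spark1 conf]# ssh [email protected] "mkdir -p /opt/spark"
[root@spark1 conf]# ssh [email protected] "mkdir -p /opt/spark"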

4. Configure passwordless SSH login

Note: the heartbeat between the master and workers does not go over the SSH protocol; it goes over TCP. SSH is only needed by the cluster start/stop scripts.

[root@spark1 conf]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1a:f8:0b:e3:ee:5e:12:09:86:77:8a:b6:9e:51:84:9c [email protected]
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|..o              |
|.E+..            |
| +.+ o           |
|....+ . S        |
|. o  o o         |
| o  + +          |
|. o. = .         |
| o += .          |
+-----------------+


[root@spark1 conf]# ssh-copy-id 192.168.181.151
The authenticity of host '192.168.181.151 (192.168.181.151)' can't be established.
ECDSA key fingerprint is 23:b0:da:ff:4a:d5:8d:3e:a4:e8:63:e2:cc:ef:4f:10.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.181.151's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '192.168.181.151'"
and check to make sure that only the key(s) you wanted were added.

[root@spark1 conf]# ssh-copy-id 192.168.181.152
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.181.152's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '192.168.181.152'"
and check to make sure that only the key(s) you wanted were added.

[root@spark1 conf]# ssh-copy-id 192.168.181.153
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.181.153's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '192.168.181.153'"
and check to make sure that only the key(s) you wanted were added.
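
A quick check that passwordless login works: each of these should print the remote hostname without prompting for a password.

[root@spark1 conf]# ssh 192.168.181.152 hostname
[root@spark1 conf]# ssh 192.168.181.153 hostname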

5. Start the cluster

[root@spark1 bin]# /opt/spark/spark-1.6.1-bin-hadoop2.6/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/spark-1.6.1-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-spark1.create80.com.out
192.168.181.152: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/spark-1.6.1-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-spark2.create80.com.out
192.168.181.153: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/spark-1.6.1-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-spark3.create80.com.out

[root@spark1 bin]# jps
1581 Jps
1516 Master

[root@spark2 spark-1.6.1-bin-hadoop2.6]# jps
1373 Jps
1323 Worker

[root@spark3 opt]# jps
1329 Worker
1379 Jps


By default this takes up 1 GB of memory (the default executor memory is 1 GB).
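
The cluster state can also be checked in the master web UI, which listens on port 8080 by default: http://192.168.181.151:8080 should show both workers as ALIVE.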

At this point the standalone mode installation is complete.

6. Start spark-shell

[root@spark1 bin]# /opt/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-shell --master spark://192.168.181.151:7077
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
17/06/19 23:39:38 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/06/19 23:39:41 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/06/19 23:39:57 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/06/19 23:39:58 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/06/19 23:40:05 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/06/19 23:40:06 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/06/19 23:40:20 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/06/19 23:40:21 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.

scala>

Note:
If no master address is specified when launching spark-shell, the shell still starts normally and programs can still be run in it, but Spark is actually running in local mode: only a single process is started on the local machine, and no connection to the cluster is made.

In spark-shell the SparkContext has already been initialized by default as the object sc; user code that needs it can simply use sc directly.
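
As a quick sanity check that the shell is really connected to the cluster, a trivial job can be run through sc, for example:

scala> sc.parallelize(1 to 1000).reduce(_ + _)   // should return 500500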


7. Estimating Pi with Monte Carlo

[root@spark1 bin]# /opt/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.181.151:7077 /opt/spark/spark-1.6.1-bin-hadoop2.6/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100
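
SparkPi estimates Pi by Monte Carlo sampling: it throws random points into the unit square and uses the fact that the fraction falling inside the inscribed circle approaches Pi/4; the trailing argument (100) sets the number of sample slices. A rough spark-shell equivalent (just a sketch, not the bundled example code):

val n = 1000000
val inside = sc.parallelize(1 to n).map { _ =>
  // pick a random point in the square [-1, 1] x [-1, 1]
  val x = math.random * 2 - 1
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0   // 1 if it lands inside the unit circle
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * inside / n)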

WordCount experiment:

scala> val textFile = sc.textFile("/opt/spark/spark-1.6.1-bin-hadoop2.6/README.md")
textFile: org.apache.spark.rdd.RDD[String] = /opt/spark/spark-1.6.1-bin-hadoop2.6/README.md MapPartitionsRDD[5] at textFile at <console>:27

scala> val wordCounts = textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey((a,b)=>a+b)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:29

scala> wordCounts.collect()
res1: Array[(String, Int)] = Array((package,1), (this,1), (Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1), (Because,1), (Python,2), (cluster.,1), ([run,1), (its,1), (YARN,,1), (have,1), (general,2), (pre-built,1), (locally,2), (locally.,1), (changed,1), (sc.parallelize(1,1), (only,1), (Configuration,1), (This,2), (first,1), (basic,1), (documentation,3), (learning,,1), (graph,1), (Hive,2), (several,1), (["Specifying,1), ("yarn",1), (page](http://spark.apache.org/documentation.html),1), ([params]`.,1), ([project,2), (prefer,1), (SparkPi,2), (<http://spark.apache.org/>,1), (engine,1), (version,1), (file,1), (documentation,,1), (MASTER,1), (example,3), (are,1), (systems.,1), (params,1), (scala>,1), (DataFrames,,1), (provides,1), (refer,2)...
scala>
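
To see the most frequent words first, the counts can be sorted before collecting (the same sortBy used in the packaged job in the next section):

scala> wordCounts.sortBy(_._2, false).take(10)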

8. WordCount packaging experiment


package cn.itcast.akka

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by huadongxie on 2017/6/20.
  */
object WordCount {
  def main(args: Array[String]): Unit = {
    // Create the SparkConf and set the application name
    val conf = new SparkConf().setAppName("WC")
    // Create the SparkContext; this object is the entry point for submitting a Spark application
    val sc = new SparkContext(conf)
    // Use sc to create an RDD and run the transformations and actions
    sc.textFile(args(0)).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_, 1).sortBy(_._2, false).saveAsTextFile(args(1))
    // Stop the SparkContext and end the job
    sc.stop()

  }

}

Packaging:
Modify the mainClass property in the pom.xml file.

Click Lifecycle, select clean and package, then click Run Maven Build.

Take the jar that was built successfully and upload it to one of the nodes in the Spark cluster.


Submit the Spark application with the spark-submit command (note the order of the arguments):

[root@spark1 ~]# /opt/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit --class cn.itcast.akka.WordCount --master spark://192.168.181.151:7077 /opt/soft/MyFirstRpc-2.0.jar /opt/spark/spark-1.6.1-bin-hadoop2.6/README.md /opt/soft/out/
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/06/20 02:20:41 INFO SparkContext: Running Spark version 1.6.1
17/06/20 02:20:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/20 02:20:43 INFO SecurityManager: Changing view acls to: root
17/06/20 02:20:43 INFO SecurityManager: Changing modify acls to: root
17/06/20 02:20:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/06/20 02:20:45 INFO Utils: Successfully started service 'sparkDriver' on port 39656.
17/06/20 02:20:47 INFO Slf4jLogger: Slf4jLogger started
17/06/20 02:20:47 INFO Remoting: Starting remoting
17/06/20 02:20:47 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.181.151:51270]
17/06/20 02:20:47 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 51270.
17/06/20 02:20:48 INFO SparkEnv: Registering MapOutputTracker
17/06/20 02:20:48 INFO SparkEnv: Registering BlockManagerMaster
17/06/20 02:20:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-3d084923-ce7f-4c0a-b984-75e4589bc45d
17/06/20 02:20:48 INFO MemoryStore: MemoryStore started with capacity 517.4 MB
17/06/20 02:20:48 INFO SparkEnv: Registering OutputCommitCoordinator
17/06/20 02:20:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/06/20 02:20:59 INFO SparkUI: Started SparkUI at http://192.168.181.151:4040
17/06/20 02:20:59 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7f03ed90-9829-47e4-b2e6-2f3e84b9003b/httpd-016b18b4-9c91-4e3e-93db-29b47d5f2c6b
17/06/20 02:20:59 INFO HttpServer: Starting HTTP Server
17/06/20 02:20:59 INFO Utils: Successfully started service 'HTTP file server' on port 43423.
17/06/20 02:21:01 INFO SparkContext: Added JAR file:/opt/soft/MyFirstRpc-2.0.jar at http://192.168.181.151:43423/jars/MyFirstRpc-2.0.jar with timestamp 1497896461009
17/06/20 02:21:02 INFO AppClient$ClientEndpoint: Connecting to master spark://192.168.181.151:7077...
17/06/20 02:21:03 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170620022103-0003
17/06/20 02:21:03 INFO AppClient$ClientEndpoint: Executor added: app-20170620022103-0003/0 on worker-20170619235617-192.168.181.152-49814 (192.168.181.152:49814) with 1 cores
17/06/20 02:21:03 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170620022103-0003/0 on hostPort 192.168.181.152:49814 with 1 cores, 1024.0 MB RAM
17/06/20 02:21:03 INFO AppClient$ClientEndpoint: Executor added: app-20170620022103-0003/1 on worker-20170619235616-192.168.181.153-37195 (192.168.181.153:37195) with 1 cores
17/06/20 02:21:03 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170620022103-0003/1 on hostPort 192.168.181.153:37195 with 1 cores, 1024.0 MB RAM
17/06/20 02:21:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49298.
17/06/20 02:21:03 INFO NettyBlockTransferService: Server created on 49298
17/06/20 02:21:03 INFO BlockManagerMaster: Trying to register BlockManager
17/06/20 02:21:03 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.181.151:49298 with 517.4 MB RAM, BlockManagerId(driver, 192.168.181.151, 49298)
17/06/20 02:21:03 INFO BlockManagerMaster: Registered BlockManager
17/06/20 02:21:03 INFO AppClient$ClientEndpoint: Executor updated: app-20170620022103-0003/0 is now RUNNING
17/06/20 02:21:03 INFO AppClient$ClientEndpoint: Executor updated: app-20170620022103-0003/1 is now RUNNING
17/06/20 02:21:04 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/06/20 02:21:08 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 153.6 KB)
17/06/20 02:21:08 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 167.5 KB)
17/06/20 02:21:08 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.181.151:49298 (size: 13.9 KB, free: 517.4 MB)
17/06/20 02:21:08 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:15
17/06/20 02:21:11 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
17/06/20 02:21:11 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
17/06/20 02:21:11 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
17/06/20 02:21:11 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
17/06/20 02:21:11 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
17/06/20 02:21:11 INFO SparkContext: Starting job: saveAsTextFile at WordCount.scala:15
17/06/20 02:21:11 INFO FileInputFormat: Total input paths to process : 1
17/06/20 02:21:12 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:15)
17/06/20 02:21:12 INFO DAGScheduler: Registering RDD 5 (sortBy at WordCount.scala:15)
17/06/20 02:21:12 INFO DAGScheduler: Got job 0 (saveAsTextFile at WordCount.scala:15) with 1 output partitions
17/06/20 02:21:12 INFO DAGScheduler: Final stage: ResultStage 2 (saveAsTextFile at WordCount.scala:15)
17/06/20 02:21:12 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
17/06/20 02:21:12 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 1)
17/06/20 02:21:12 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:15), which has no missing parents
17/06/20 02:21:12 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.1 KB, free 171.6 KB)
17/06/20 02:21:12 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 173.9 KB)
17/06/20 02:21:12 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.181.151:49298 (size: 2.3 KB, free: 517.4 MB)
17/06/20 02:21:12 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/06/20 02:21:12 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:15)
17/06/20 02:21:12 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
17/06/20 02:21:16 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (spark3.create80.com:55264) with ID 1
17/06/20 02:21:16 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, spark3.create80.com, partition 0,PROCESS_LOCAL, 2204 bytes)
17/06/20 02:21:16 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (spark2.create80.com:54844) with ID 0
17/06/20 02:21:16 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, spark2.create80.com, partition 1,PROCESS_LOCAL, 2204 bytes)
17/06/20 02:21:16 INFO BlockManagerMasterEndpoint: Registering block manager spark3.create80.com:57493 with 517.4 MB RAM, BlockManagerId(1, spark3.create80.com, 57493)
17/06/20 02:21:16 INFO BlockManagerMasterEndpoint: Registering block manager spark2.create80.com:53418 with 517.4 MB RAM, BlockManagerId(0, spark2.create80.com, 53418)
17/06/20 02:22:11 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark3.create80.com:57493 (size: 2.3 KB, free: 517.4 MB)
17/06/20 02:22:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark3.create80.com:57493 (size: 13.9 KB, free: 517.4 MB)
17/06/20 02:22:15 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark2.create80.com:53418 (size: 2.3 KB, free: 517.4 MB)
17/06/20 02:22:16 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark2.create80.com:53418 (size: 13.9 KB, free: 517.4 MB)
17/06/20 02:22:16 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 60114 ms on spark3.create80.com (1/2)
17/06/20 02:22:18 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 61845 ms on spark2.create80.com (2/2)
17/06/20 02:22:18 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:15) finished in 65.724 s
17/06/20 02:22:18 INFO DAGScheduler: looking for newly runnable stages
17/06/20 02:22:18 INFO DAGScheduler: running: Set()
17/06/20 02:22:18 INFO DAGScheduler: waiting: Set(ShuffleMapStage 1, ResultStage 2)
17/06/20 02:22:18 INFO DAGScheduler: failed: Set()
17/06/20 02:22:18 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at sortBy at WordCount.scala:15), which has no missing parents
17/06/20 02:22:18 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/06/20 02:22:18 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.5 KB, free 177.5 KB)
17/06/20 02:22:18 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.0 KB, free 179.5 KB)
17/06/20 02:22:18 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.181.151:49298 (size: 2.0 KB, free: 517.4 MB)
17/06/20 02:22:18 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
17/06/20 02:22:18 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at sortBy at WordCount.scala:15)
17/06/20 02:22:18 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/06/20 02:22:18 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, spark2.create80.com, partition 0,NODE_LOCAL, 1945 bytes)
17/06/20 02:22:18 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on spark2.create80.com:53418 (size: 2.0 KB, free: 517.4 MB)
17/06/20 02:22:18 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to spark2.create80.com:54844
17/06/20 02:22:18 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 164 bytes
17/06/20 02:22:21 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 3526 ms on spark2.create80.com (1/1)
17/06/20 02:22:21 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/06/20 02:22:21 INFO DAGScheduler: ShuffleMapStage 1 (sortBy at WordCount.scala:15) finished in 3.531 s
17/06/20 02:22:21 INFO DAGScheduler: looking for newly runnable stages
17/06/20 02:22:21 INFO DAGScheduler: running: Set()
17/06/20 02:22:21 INFO DAGScheduler: waiting: Set(ResultStage 2)
17/06/20 02:22:21 INFO DAGScheduler: failed: Set()
17/06/20 02:22:21 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[8] at saveAsTextFile at WordCount.scala:15), which has no missing parents
17/06/20 02:22:21 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 64.9 KB, free 244.4 KB)
17/06/20 02:22:21 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 22.5 KB, free 266.9 KB)
17/06/20 02:22:21 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.181.151:49298 (size: 22.5 KB, free: 517.4 MB)
17/06/20 02:22:21 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
17/06/20 02:22:21 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[8] at saveAsTextFile at WordCount.scala:15)
17/06/20 02:22:21 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
17/06/20 02:22:21 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 3, spark2.create80.com, partition 0,NODE_LOCAL, 1956 bytes)
17/06/20 02:22:21 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on spark2.create80.com:53418 (size: 22.5 KB, free: 517.4 MB)
17/06/20 02:22:28 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to spark2.create80.com:54844
17/06/20 02:22:28 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 145 bytes
17/06/20 02:22:28 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 3) in 6994 ms on spark2.create80.com (1/1)
17/06/20 02:22:28 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
17/06/20 02:22:28 INFO DAGScheduler: ResultStage 2 (saveAsTextFile at WordCount.scala:15) finished in 6.998 s
17/06/20 02:22:28 INFO DAGScheduler: Job 0 finished: saveAsTextFile at WordCount.scala:15, took 77.542055 s
17/06/20 02:22:29 INFO SparkUI: Stopped Spark web UI at http://192.168.181.151:4040
17/06/20 02:22:29 INFO SparkDeploySchedulerBackend: Shutting down all executors
17/06/20 02:22:29 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
17/06/20 02:22:29 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/06/20 02:22:29 INFO MemoryStore: MemoryStore cleared
17/06/20 02:22:29 INFO BlockManager: BlockManager stopped
17/06/20 02:22:29 INFO BlockManagerMaster: BlockManagerMaster stopped
17/06/20 02:22:29 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/06/20 02:22:29 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/06/20 02:22:29 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/06/20 02:22:29 INFO SparkContext: Successfully stopped SparkContext
17/06/20 02:22:29 INFO ShutdownHookManager: Shutdown hook called
17/06/20 02:22:29 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f03ed90-9829-47e4-b2e6-2f3e84b9003b/httpd-016b18b4-9c91-4e3e-93db-29b47d5f2c6b
17/06/20 02:22:29 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f03ed90-9829-47e4-b2e6-2f3e84b9003b
[root@spark1 ~]#
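
The result goes to the output directory passed as the second argument; since the job uses reduceByKey(_ + _, 1) there is a single reduce partition, so the output is typically one part file plus a _SUCCESS marker:

[root@spark1 ~]# ls /opt/soft/out/
[root@spark1 ~]# cat /opt/soft/out/part-00000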
