1. cd $SPARK_HOME/bin
2. ./spark-shell --master spark://master.hadoop.zjportdns.gov.cn:7077
(7077 is the default port of a Spark standalone master; adjust it if your cluster uses a different one.)
You can then run commands interactively in the shell:
scala> val a = sc.parallelize(1 to 9, 3)
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val b = a.map(x => x*2)
b: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:26

scala> a.collect
res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> b.collect
res1: Array[Int] = Array(2, 4, 6, 8, 10, 12, 14, 16, 18)
3. You can also analyze files stored on HDFS.
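For example, a classic word count over an HDFS file might look like the sketch below. The path hdfs:///tmp/input.txt is a placeholder, not a file from this cluster; substitute the URI of a file that actually exists on your HDFS.

scala> val lines = sc.textFile("hdfs:///tmp/input.txt")   // placeholder path: point this at a real HDFS file
scala> val counts = lines.flatMap(_.split(" "))           // split each line into words
     |                  .map(word => (word, 1))           // pair each word with a count of 1
     |                  .reduceByKey(_ + _)               // sum the counts per word
scala> counts.collect                                     // bring the (word, count) pairs back to the driver

Note that textFile and the transformations are lazy; nothing is read from HDFS until an action such as collect (or saveAsTextFile) is invoked.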
From there, you can happily get on with your big data analysis.