A Roundup of Hadoop Development Problems

1. Copying and moving files in HDFS


To copy an entire directory (with -f forcing an overwrite of an existing target):

hadoop fs -cp -f <source path> <target path>

To move instead of copy, use hadoop fs -mv with the same source and target arguments.
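The same operations can be done from Scala through the Hadoop FileSystem API, which is handy inside a Spark job. A minimal sketch, with hypothetical paths:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)
val src = new Path("/user/example/source_dir")  // hypothetical source
val dst = new Path("/user/example/target_dir")  // hypothetical target

// Copy the whole directory tree; deleteSource = false keeps the original,
// overwrite = true mirrors the -f flag of hadoop fs -cp.
FileUtil.copy(fs, src, fs, dst, false, true, conf)

// A move within one filesystem is just a rename (alternative to copy + delete):
// fs.rename(src, dst)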

2. Container killed by YARN for exceeding memory limits. 10.4 GB of 10.4 GB physical memory used

YARN kills a container once its physical memory use (JVM heap plus off-heap allocations) reaches the size that was requested for it. The fix is to reserve more off-heap headroom per executor (the value is in MB):

spark.yarn.executor.memoryOverhead = 8192
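This property is normally passed to spark-submit as --conf spark.yarn.executor.memoryOverhead=8192, but it can also be set programmatically before the context is created. A minimal sketch (the application name is a placeholder; on Spark 2.3+ the equivalent key is spark.executor.memoryOverhead):

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
  .setAppName("memory-overhead-example")              // hypothetical app name
  .set("spark.yarn.executor.memoryOverhead", "8192")  // MB of off-heap headroom per executor
val sparkContext = new SparkContext(sparkConf)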


3. Container ### is running beyond physical memory limits. Current usage: 9.0 GB of 9 GB physical memory used; 10.0 GB of 18.9 GB virtual memory used. Killing container.

On YARN, a container's physical limit is roughly executor-memory plus spark.yarn.executor.memoryOverhead, so the 9 GB cap above is exceeded as soon as heap plus off-heap usage reaches it. Raising both values resolved this case; as spark-submit settings:

--driver-memory 8G
--executor-memory 10G
--executor-cores 8
--conf spark.yarn.executor.memoryOverhead=10240


4. org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation

Solution: the closure passed to a transformation runs on the executors, where other RDDs are not available. Run the action on the inner RDD on the driver first, keep the result in memory (or broadcast it), and reference that plain value inside the outer transformation.

Alternatively, restructure the computation as a join between the two RDDs, as shown in the sketch below.
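A minimal sketch of both approaches, assuming an existing SparkContext sc and two small hypothetical RDDs:

import org.apache.spark.SparkContext

def fixNestedRdd(sc: SparkContext): Unit = {
  val rdd1 = sc.parallelize(Seq(1, 2, 3))
  val rdd2 = sc.parallelize(Seq(("a", 10), ("b", 20)))

  // Invalid: rdd2 would be used inside code that runs on the executors.
  // val bad = rdd1.map(x => rdd2.values.count() * x)

  // Fix 1: run the action on the driver, then close over the plain result.
  val n = rdd2.values.count()                 // action, executed by the driver
  val ok = rdd1.map(x => n * x)

  // Fix 1b: for larger lookup data, collect and broadcast it to the executors.
  val lookup = sc.broadcast(rdd2.collectAsMap())
  val withLookup = rdd1.map(x => lookup.value.getOrElse("a", 0) + x)

  // Fix 2: express the dependency as a join between keyed RDDs instead.
  val joined = rdd1.map(x => ("a", x)).join(rdd2)

  println(ok.count() + withLookup.count() + joined.count())
}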


5. Deleting an HDFS file from Spark

import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = sparkContext.hadoopConfiguration
val hdfs = FileSystem.get(hadoopConf)
val path = new Path("/user/example/stale_output")  // hypothetical path to delete
if (hdfs.exists(path)) {
  // To guard against accidental data loss, recursive deletion is disabled,
  // so only files and empty directories can be removed here.
  hdfs.delete(path, false)
}


6. Deleting all empty files and empty directories on an HDFS cluster from Spark

This relies mainly on the FileSystem.listLocatedStatus function, which returns a RemoteIterator over a directory's entries; walking it recursively exposes zero-length files and directories that have become empty. A sketch follows.
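A minimal post-order sketch, assuming a FileSystem handle obtained as in section 5 (the traversal and deletion policy here are an illustration, not the referenced blog's exact code):

import org.apache.hadoop.fs.{FileSystem, Path}

def cleanEmpty(fs: FileSystem, dir: Path): Unit = {
  val it = fs.listLocatedStatus(dir)
  while (it.hasNext) {
    val status = it.next()
    if (status.isDirectory) {
      cleanEmpty(fs, status.getPath)                      // recurse first (post-order)
      if (!fs.listLocatedStatus(status.getPath).hasNext)  // directory is now empty?
        fs.delete(status.getPath, false)                  // non-recursive delete is safe here
    } else if (status.getLen == 0) {
      fs.delete(status.getPath, false)                    // zero-length file
    }
  }
}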



Reposted from blog.csdn.net/circle2015/article/details/102769122