The WordCount example that ships with Hadoop:
>1) Prepare a local file hello.txt:
cat hello.txt
2) Copy the file:
cp hello.txt hello2.txt
3) Create a d3 folder on the remote HDFS:
hadoop@Master:/usr/local/hadoop/share/hadoop/mapreduce$ hadoop fs -mkdir /user/hadoop/d3
4) Upload the local hello.txt and hello2.txt files to HDFS:
hadoop@Master:/usr/local/hadoop/share/hadoop/mapreduce$ hadoop fs -put hello.txt /user/hadoop/d3
hadoop@Master:/usr/local/hadoop/share/hadoop/mapreduce$ hadoop fs -put hello2.txt /user/hadoop/d3
5) Run the job to count the words in the text files under the d3 folder:
>hadoop jar hadoop-mapreduce-examples-2.8.0.jar wordcount /user/hadoop/d3 /user/hadoop/d3output
This runs a full MapReduce job and prints many INFO log messages; it takes anywhere from tens of seconds to several minutes, depending on the size of the input.
6) List the output folder; a _SUCCESS file indicates the job completed successfully:
>hadoop fs -ls /user/hadoop/d3output
7) Display the contents of the output file:
>hadoop fs -cat /user/hadoop/d3output/part-r-00000
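Conceptually, the wordcount job tokenizes each input line on whitespace and sums the occurrences of each token; the reducer output in part-r-00000 is one `word<TAB>count` pair per line, sorted by key. A minimal Python sketch of what the job computes (a simulation, not the actual Hadoop code):

```python
from collections import Counter

def wordcount(lines):
    """Simulate Hadoop's wordcount example: split each line on
    whitespace and count the occurrences of every token."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# The reducer output (part-r-00000) lists "word<TAB>count", sorted by key:
lines = ["hello hadoop", "hello world"]
for word, n in sorted(wordcount(lines).items()):
    print(f"{word}\t{n}")
# hadoop    1
# hello     2
# world     1
```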
The custom WordCount implementation:
![](https://img-blog.csdn.net/20180502182738786?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
![](https://img-blog.csdn.net/2018050218282566?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
![](https://img-blog.csdn.net/20180502183014434?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
![](https://img-blog.csdn.net/20180502183503714?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
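The screenshots above show the custom Mapper and Reducer classes. As a rough illustration of the three phases they implement (map, shuffle/sort, reduce), here is a hedged Python simulation; the function names are mine, not taken from the screenshots, and the real job runs on Hadoop's distributed runtime rather than in a single process:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical stand-ins for the Mapper/Reducer in the screenshots.
def mapper(line):
    # Emit a (word, 1) pair for every token, like the Mapper's map() method.
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Sum the values for one key, like the Reducer's reduce() method.
    return word, sum(counts)

def run_job(lines):
    # Map phase: apply the mapper to every input line.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle/sort phase: group the intermediate pairs by key.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return [reducer(k, (v for _, v in g))
            for k, g in groupby(pairs, key=itemgetter(0))]
```

For example, `run_job(["hello hadoop", "hello world"])` yields `[("hadoop", 1), ("hello", 2), ("world", 1)]`, matching the sorted key order of the part-r-00000 output.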
After the program runs successfully, export the whole project as a jar file.
(I exported it as MyWordCount.jar in the local hadoop_file folder.)
Two text files have been placed in the /user/hadoop/d3 folder on HDFS, as shown below:
![](https://img-blog.csdn.net/20180502183542700?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
In the export directory, run: hadoop jar MyWordCount.jar MyWordCount <input path> <output path>
![](https://img-blog.csdn.net/20180502183659819?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
![](https://img-blog.csdn.net/20180502183731769?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
Check whether the output was written successfully:
![](https://img-blog.csdn.net/20180502183942341?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
Finally, run hadoop fs -cat on /user/hadoop/output0502 to view the word-count results:
![](https://img-blog.csdn.net/20180502184426399?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
![](https://img-blog.csdn.net/2018050218462512?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3d3dzY2Nl8=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)