Hadoop wordcount example
1: Create the input directory on HDFS
[hadoop@localhost bin]$ ./hadoop fs -mkdir -p /wordcount/input
19/03/10 21:05:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2: Upload the local file to HDFS
[hadoop@localhost bin]$ ./hadoop fs -put ~/data/wordcount.txt /wordcount/input/
19/03/10 21:07:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
3: List the input directory to confirm the upload
[hadoop@localhost bin]$ ./hadoop fs -ls /wordcount/input
19/03/10 21:08:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 hadoop supergroup 20 2019-03-10 21:07 /wordcount/input/wordcount.txt
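In the listing above, the 1 before the owner is the replication factor and the 20 is the file size in bytes. As a rough local check, the sample file can be rebuilt from the two rows that Hive prints later in these notes. The tab separators and the temporary file are assumptions made for illustration; the original file may have used spaces, which gives the same size.

```shell
# Hypothetical local copy of wordcount.txt; tab separators are an assumption.
sample=$(mktemp)
printf 'zhdc\tdd\tdd\tbb\nrr\tdd\n' > "$sample"

# Count the bytes; tr strips the padding that some wc implementations add.
size=$(wc -c < "$sample" | tr -d ' ')
echo "$size bytes"

rm -f "$sample"
```

The byte count printed here should agree with the size column reported by hadoop fs -ls.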
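4: Run the wordcount example from the examples jar. The first attempt below fails with "Not a valid JAR" because the jar file name was left out; hadoop jar takes the jar path first, then the program name (wordcount) and its arguments.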
[hadoop@localhost mapreduce]$ ~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop jar wordcount /wordcount/input/ /wordcount/output/
Not a valid JAR: /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce2/wordcount
[hadoop@localhost mapreduce]$ ~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar wordcount /wordcount/input/ /wordcount/output/
19/03/10 21:11:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/10 21:11:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/03/10 21:11:28 INFO input.FileInputFormat: Total input paths to process : 1
19/03/10 21:11:28 INFO mapreduce.JobSubmitter: number of splits:1
19/03/10 21:11:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552221496768_0002
19/03/10 21:11:29 INFO impl.YarnClientImpl: Submitted application application_1552221496768_0002
19/03/10 21:11:29 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1552221496768_0002/
19/03/10 21:11:29 INFO mapreduce.Job: Running job: job_1552221496768_0002
19/03/10 21:11:39 INFO mapreduce.Job: Job job_1552221496768_0002 running in uber mode : false
19/03/10 21:11:39 INFO mapreduce.Job: map 0% reduce 0%
19/03/10 21:11:44 INFO mapreduce.Job: map 100% reduce 0%
19/03/10 21:11:50 INFO mapreduce.Job: map 100% reduce 100%
19/03/10 21:11:51 INFO mapreduce.Job: Job job_1552221496768_0002 completed successfully
19/03/10 21:11:51 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=44
        FILE: Number of bytes written=222971
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=136
        HDFS: Number of bytes written=22
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3375
        Total time spent by all reduces in occupied slots (ms)=3556
        Total time spent by all map tasks (ms)=3375
        Total time spent by all reduce tasks (ms)=3556
        Total vcore-seconds taken by all map tasks=3375
        Total vcore-seconds taken by all reduce tasks=3556
        Total megabyte-seconds taken by all map tasks=3456000
        Total megabyte-seconds taken by all reduce tasks=3641344
    Map-Reduce Framework
        Map input records=2
        Map output records=6
        Map output bytes=44
        Map output materialized bytes=44
        Input split bytes=116
        Combine input records=6
        Combine output records=4
        Reduce input groups=4
        Reduce shuffle bytes=44
        Reduce input records=4
        Reduce output records=4
        Spilled Records=8
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=100
        CPU time spent (ms)=890
        Physical memory (bytes) snapshot=321126400
        Virtual memory (bytes) snapshot=5423050752
        Total committed heap usage (bytes)=226627584
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=20
    File Output Format Counters
        Bytes Written=22
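Three of the Map-Reduce Framework counters above can be reproduced by hand from the input: Map input records is the number of lines, Map output records is the number of whitespace-separated tokens (one (word, 1) pair per token), and Combine output records is the number of distinct words, since a single map task feeds the combiner. The sketch below assumes the sample content shown by the Hive select later in these notes, with tab separators.

```shell
# Sample input reconstructed from these notes; tab separators are an assumption.
input=$(printf 'zhdc\tdd\tdd\tbb\nrr\tdd\n')

# Map input records: one record per line.
map_input=$(printf '%s\n' "$input" | wc -l | tr -d ' ')

# Map output records: one (word, 1) pair per whitespace-separated token.
map_output=$(printf '%s\n' "$input" | tr -s ' \t' '\n' | grep -c .)

# Combine output records / Reduce input groups: one record per distinct word.
distinct=$(printf '%s\n' "$input" | tr -s ' \t' '\n' | sort -u | grep -c .)

echo "map_input=$map_input map_output=$map_output distinct=$distinct"
```

The three values can be compared with Map input records, Map output records, and Combine output records in the job log.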
5: List the output directory
[hadoop@localhost mapreduce]$ ~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop fs -ls /wordcount/output
19/03/10 21:13:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2019-03-10 21:11 /wordcount/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 22 2019-03-10 21:11 /wordcount/output/part-r-00000
6: View the result
[hadoop@localhost mapreduce]$ ~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop fs -text /wordcount/output/part-r-00000
19/03/10 21:15:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
bb 1
dd 3
rr 1
zhdc 1
[hadoop@localhost mapreduce]$
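Running step 4 a second time as written fails, because MapReduce refuses to write into an output directory that already exists. A minimal sketch of a rerun helper follows. The function name rerun_wordcount and the variables HADOOP and EXAMPLES_JAR are hypothetical; the default paths are the ones that appear in the transcript above and will differ on other installations. Note that it deletes the output directory, so point it only at data you can regenerate.

```shell
# Paths taken from the transcript above; adjust for your installation.
HADOOP_HOME="${HADOOP_HOME:-$HOME/app/hadoop-2.6.0-cdh5.7.0}"
HADOOP="${HADOOP:-$HADOOP_HOME/bin/hadoop}"
EXAMPLES_JAR="${EXAMPLES_JAR:-$HADOOP_HOME/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar}"

# Hypothetical helper: remove the old output directory, then resubmit the job.
rerun_wordcount() {
  input_dir=$1
  output_dir=$2
  # -f keeps rm quiet when the directory does not exist yet.
  "$HADOOP" fs -rm -r -f "$output_dir" &&
    "$HADOOP" jar "$EXAMPLES_JAR" wordcount "$input_dir" "$output_dir"
}
```

With the paths used in these notes, the call would be: rerun_wordcount /wordcount/input/ /wordcount/output/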
Word count with Hive
1: Create the table
hive> create table d6_wc(sentence string);
OK
Time taken: 0.417 seconds
hive>
> select * from d6_wc;
OK
Time taken: 0.43 seconds
2: Load the data into the Hive table
hive> load data local inpath '/home/hadoop/data/wordcount.txt' into table d6_wc;
Loading data to table default.d6_wc
Table default.d6_wc stats: [numFiles=1, totalSize=20]
OK
Time taken: 1.213 seconds
hive> select * from d6_wc;
OK
zhdc dd dd bb
rr dd
Time taken: 0.115 seconds, Fetched: 2 row(s)
hive>
3: Count the words
hive> select word,count(1) c
> from
> (select explode(split(sentence,'\t')) as word from d6_wc) t
> group by word
> order by c desc;
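The query above splits each sentence on tab characters, explodes the resulting array into one row per word, groups by word, and sorts by count in descending order. The following is a rough local emulation of those steps with awk and sort, not Hive itself. It assumes the sample rows are tab-separated; the secondary sort on the word is added only to make the output stable, since Hive does not guarantee an order among words with equal counts.

```shell
# Local emulation of: explode(split(sentence,'\t')) -> group by word -> order by c desc
tab=$(printf '\t')
result=$(
  printf 'zhdc\tdd\tdd\tbb\nrr\tdd\n' |
    awk -F'\t' '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w "\t" count[w] }' |
    sort -t "$tab" -k2,2nr -k1,1
)
printf '%s\n' "$result"
```

One difference from the Hadoop example is worth keeping in mind: the stock wordcount program tokenizes on any whitespace, while split(sentence,'\t') splits only on tabs. If the input file were space-separated, this query would treat each whole line as one word; a regular expression such as '\\s+' in split would cover both cases.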