[Big Data Hands-On for Beginners] Hive wordcount

Hadoop wordcount example

1: Create the input directory on HDFS
[hadoop@localhost bin]$ ./hadoop fs -mkdir -p /wordcount/input
19/03/10 21:05:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2: Upload the local file to HDFS
[hadoop@localhost bin]$ ./hadoop fs -put ~/data/wordcount.txt /wordcount/input/
19/03/10 21:07:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

3: Check that the file was uploaded
[hadoop@localhost bin]$ ./hadoop fs -ls /wordcount/input
19/03/10 21:08:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 hadoop supergroup         20 2019-03-10 21:07 /wordcount/input/wordcount.txt
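
The listing above reports a 20-byte file. As a rough local check, the sketch below recreates the input under the assumption that wordcount.txt holds exactly the two tab-separated lines that the Hive `select` later in this post displays, each ending in a newline. `sample_file` is a hypothetical temporary path, not the author's ~/data/wordcount.txt.

```shell
# Recreate the assumed sample: two lines, words separated by tabs.
# sample_file is a hypothetical temporary path used only for this sketch.
sample_file="$(mktemp)"
printf 'zhdc\tdd\tdd\tbb\nrr\tdd\n' > "$sample_file"

# Byte count of the local copy, to compare with the size column of `hadoop fs -ls`.
sample_bytes="$(wc -c < "$sample_file" | tr -d '[:space:]')"
echo "local size: $sample_bytes bytes"
```

If the local size differs from the size shown by `hadoop fs -ls`, the separators or line endings in the real file are probably not what this sketch assumes.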

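4: Run the wordcount example job (the first attempt fails because the JAR file name is missing from the command)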
[hadoop@localhost mapreduce]$ ~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop jar wordcount /wordcount/input/ /wordcount/output/
Not a valid JAR: /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce2/wordcount
[hadoop@localhost mapreduce]$ ~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar wordcount /wordcount/input/ /wordcount/output/
19/03/10 21:11:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/10 21:11:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/03/10 21:11:28 INFO input.FileInputFormat: Total input paths to process : 1
19/03/10 21:11:28 INFO mapreduce.JobSubmitter: number of splits:1
19/03/10 21:11:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552221496768_0002
19/03/10 21:11:29 INFO impl.YarnClientImpl: Submitted application application_1552221496768_0002
19/03/10 21:11:29 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1552221496768_0002/
19/03/10 21:11:29 INFO mapreduce.Job: Running job: job_1552221496768_0002
19/03/10 21:11:39 INFO mapreduce.Job: Job job_1552221496768_0002 running in uber mode : false
19/03/10 21:11:39 INFO mapreduce.Job:  map 0% reduce 0%
19/03/10 21:11:44 INFO mapreduce.Job:  map 100% reduce 0%
19/03/10 21:11:50 INFO mapreduce.Job:  map 100% reduce 100%
19/03/10 21:11:51 INFO mapreduce.Job: Job job_1552221496768_0002 completed successfully
19/03/10 21:11:51 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=44
		FILE: Number of bytes written=222971
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=136
		HDFS: Number of bytes written=22
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3375
		Total time spent by all reduces in occupied slots (ms)=3556
		Total time spent by all map tasks (ms)=3375
		Total time spent by all reduce tasks (ms)=3556
		Total vcore-seconds taken by all map tasks=3375
		Total vcore-seconds taken by all reduce tasks=3556
		Total megabyte-seconds taken by all map tasks=3456000
		Total megabyte-seconds taken by all reduce tasks=3641344
	Map-Reduce Framework
		Map input records=2
		Map output records=6
		Map output bytes=44
		Map output materialized bytes=44
		Input split bytes=116
		Combine input records=6
		Combine output records=4
		Reduce input groups=4
		Reduce shuffle bytes=44
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=100
		CPU time spent (ms)=890
		Physical memory (bytes) snapshot=321126400
		Virtual memory (bytes) snapshot=5423050752
		Total committed heap usage (bytes)=226627584
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=20
	File Output Format Counters 
		Bytes Written=22
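
Some of the counters above can be cross-checked by hand. The sketch below assumes the same two-line, tab-separated sample (recreated in a hypothetical temporary file) and that the example job reads one record per line and emits one record per whitespace-separated token. It computes the local numbers to compare with `Map input records`, `Map output records` and `Reduce output records`.

```shell
# Recreate the assumed sample input in a hypothetical temporary file.
sample_file="$(mktemp)"
printf 'zhdc\tdd\tdd\tbb\nrr\tdd\n' > "$sample_file"

# Lines in the file: compare with "Map input records".
line_count="$(wc -l < "$sample_file" | tr -d '[:space:]')"
# Whitespace-separated tokens: compare with "Map output records".
token_count="$(wc -w < "$sample_file" | tr -d '[:space:]')"
# Distinct tokens: compare with "Reduce output records".
distinct_count="$(tr -s '[:space:]' '\n' < "$sample_file" | sort -u | wc -l | tr -d '[:space:]')"

echo "lines=$line_count tokens=$token_count distinct=$distinct_count"
```

With a single map task, as in this run, the distinct count also matches `Combine output records`, since the combiner sees every token of the input.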

5: List the output directory
[hadoop@localhost mapreduce]$ ~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop fs -ls /wordcount/output
19/03/10 21:13:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2019-03-10 21:11 /wordcount/output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         22 2019-03-10 21:11 /wordcount/output/part-r-00000

6: View the result
[hadoop@localhost mapreduce]$ ~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop fs -text /wordcount/output/part-r-00000
19/03/10 21:15:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
bb	1
dd	3
rr	1
zhdc	1
[hadoop@localhost mapreduce]$
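
For a file this small, the result can be sanity-checked without a cluster. The pipeline below is a minimal local sketch, not the MapReduce job itself. It assumes the sample content shown by the Hive `select` in this post and that the bundled example splits on whitespace, as its output here suggests. `sample_file` and `local_counts` are names made up for the sketch.

```shell
# Recreate the assumed sample input in a hypothetical temporary file.
sample_file="$(mktemp)"
printf 'zhdc\tdd\tdd\tbb\nrr\tdd\n' > "$sample_file"

# One token per line, drop empty lines, sort bytewise, count each run,
# then print "word<TAB>count" in the same layout as part-r-00000.
local_counts="$(tr -s '[:space:]' '\n' < "$sample_file" \
  | sed '/^$/d' \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '{ printf "%s\t%s\n", $2, $1 }')"

printf '%s\n' "$local_counts"
```

The printed lines can be compared directly with the `hadoop fs -text` output above.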

Implementing word count in Hive

1: Create the table
create table d6_wc(sentence string);

hive> create table d6_wc(sentence string);
OK
Time taken: 0.417 seconds

hive> 
    > select * from d6_wc;
OK
Time taken: 0.43 seconds

2: Load the data into the Hive table
hive> load data local inpath '/home/hadoop/data/wordcount.txt' into table d6_wc;
Loading data to table default.d6_wc
Table default.d6_wc stats: [numFiles=1, totalSize=20]
OK
Time taken: 1.213 seconds

hive> select * from d6_wc;
OK
zhdc	dd	dd	bb
rr	dd
Time taken: 0.115 seconds, Fetched: 2 row(s)
hive>

3: Count the words
hive> select word,count(1) c
    > from
    > (select explode(split(sentence,'\t')) as word from d6_wc) t
    > group by word
    > order by c desc;
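
The query above splits each row on tabs, turns the resulting array into one row per word with `explode`, groups by word, and sorts by count in descending order. The local sketch below mirrors those stages on the same assumed sample. It is an approximation for checking the logic, not Hive output. The secondary sort on the word is an addition of this sketch so that ties come out in a fixed order; the Hive query does not specify an order among equal counts.

```shell
# Recreate the assumed sample input in a hypothetical temporary file.
sample_file="$(mktemp)"
tab="$(printf '\t')"
printf 'zhdc\tdd\tdd\tbb\nrr\tdd\n' > "$sample_file"

# tr:            one word per line, like explode(split(sentence,'\t'))
# sort | uniq -c: group by word with count(1)
# awk:           reorder to "word<TAB>count"
# final sort:    count descending, then word ascending (tie-break added here)
ranked_counts="$(tr '\t' '\n' < "$sample_file" \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '{ printf "%s\t%s\n", $2, $1 }' \
  | LC_ALL=C sort -t "$tab" -k2,2nr -k1,1)"

printf '%s\n' "$ranked_counts"
```

On this sample the first line is `dd` with a count of 3, which agrees with the MapReduce result earlier in the post.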
