Assignment requirements: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE2/homework/3339
Requirements for the Hadoop comprehensive assignment:
1. Upload the CSV file generated by the web-crawler assignment to HDFS
The data chosen here comes from the crawler assignment: reviews of the Douban Top 250 movies.
The selected file is douban.csv, with 32829 records in total.
First, create the /usr/local/bigdatacase/dataset folder on the local machine. Then copy the douban250.csv file into this folder,
delete its first line, and display the first five records, as shown below:
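The header removal and preview can be sketched roughly as follows. This is a minimal sketch, not the exact commands from the original screenshots: it works in a scratch directory (`WORK_DIR`, an assumption) with made-up placeholder rows, whereas the text uses /usr/local/bigdatacase/dataset and the real crawled file.

```shell
#!/usr/bin/env bash
# Sketch: drop the CSV header line, then preview the first five records.
# WORK_DIR is a scratch directory so the sketch runs anywhere; the text
# itself works in /usr/local/bigdatacase/dataset.
WORK_DIR=$(mktemp -d)
CSV="$WORK_DIR/douban250.csv"

# Placeholder rows (NOT real data) standing in for the crawled file.
printf '%s\n' 'col_a,col_b' 'r1,1' 'r2,2' 'r3,3' 'r4,4' 'r5,5' 'r6,6' > "$CSV"

# Delete line 1 (the header): write the rest to a temp file, then move it back.
sed '1d' "$CSV" > "$CSV.tmp" && mv "$CSV.tmp" "$CSV"

# Show the first five remaining records.
PREVIEW=$(head -n 5 "$CSV")
echo "$PREVIEW"
```

Writing to a temporary file and moving it back avoids relying on the in-place option of sed, which differs between GNU and BSD versions.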
Preprocess the CSV file to generate a text file
Edit the pre_deal.sh file to preprocess the CSV data, then run it so that its contents take effect, as shown below:
View the contents of user_table.txt, as shown below:
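The contents of pre_deal.sh are not reproduced in the text, so the following is only a hypothetical stand-in for that kind of script. It assumes the preprocessing amounts to prepending a running record id and converting comma-separated fields to tab-separated ones; the function name `pre_deal` and the placeholder rows are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for pre_deal.sh (the real script is not shown in the text).
# Usage: pre_deal INFILE OUTFILE
pre_deal() {
    # Split each line on commas, rebuild it with tabs ($1 = $1 forces the
    # rebuild), and prepend the line number as a record id.
    awk -F',' 'BEGIN { OFS = "\t" } { $1 = $1; print NR, $0 }' "$1" > "$2"
}

WORK_DIR=$(mktemp -d)
# Placeholder rows (NOT real data), header already removed.
printf '%s\n' 'r1,1' 'r2,2' > "$WORK_DIR/douban250.csv"

pre_deal "$WORK_DIR/douban250.csv" "$WORK_DIR/user_table.txt"

# Inspect the generated text file.
head -n 10 "$WORK_DIR/user_table.txt"
```

Note that a plain comma split like this breaks on quoted fields that themselves contain commas, which review text often does; a real script would need to handle that case.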
Grant permissions on the /usr/local/bigdatacase folder, where user_table.txt is stored, as shown below:
Then start Hadoop, create the /bigdatacase/dataset folder on HDFS,
and upload user_table.txt to HDFS, as follows:
View the first 10 rows of user_table.txt on HDFS, as shown below:
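The HDFS steps might look roughly like the sketch below. It assumes Hadoop is already running, that the `hdfs` command is on the PATH, and that user_table.txt sits in /usr/local/bigdatacase/dataset (the exact local location is an assumption). The guard simply skips the upload when either condition does not hold.

```shell
#!/usr/bin/env bash
# Sketch of the HDFS upload. LOCAL_FILE is an assumed location for
# user_table.txt; HDFS_DIR is the directory named in the text.
LOCAL_FILE=/usr/local/bigdatacase/dataset/user_table.txt
HDFS_DIR=/bigdatacase/dataset

if command -v hdfs >/dev/null 2>&1 && [ -f "$LOCAL_FILE" ]; then
    # Create the target directory (and any missing parents) on HDFS.
    hdfs dfs -mkdir -p "$HDFS_DIR"
    # Upload the file; -f overwrites a copy left by an earlier run.
    hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR"
    # Print the first 10 rows of the uploaded file.
    hdfs dfs -cat "$HDFS_DIR/user_table.txt" | head -n 10
else
    echo "hdfs not on PATH or $LOCAL_FILE missing; skipping upload" >&2
fi
```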
Start the MySQL database, Hadoop, and Hive, then enter the command on the Hive command line to create the database dblab, as shown below:
Create an external table that loads the data under the /bigdatacase/dataset directory on HDFS into the Hive warehouse,
and display the first ten rows of bigdata_user, as shown below:
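A sketch of the Hive statements for this step follows. The database name dblab, the table name bigdata_user, and the HDFS location come from the text; the column names and types are hypothetical, since the actual schema is not given, and the tab delimiter assumes the preprocessing step produced tab-separated fields. The statements are written to a file and run with `hive -f` only if Hive is available.

```shell
#!/usr/bin/env bash
# Sketch: create the dblab database and an external table over the HDFS data.
HQL_FILE=$(mktemp)

cat > "$HQL_FILE" <<'EOF'
CREATE DATABASE IF NOT EXISTS dblab;
USE dblab;

-- Column names and types are HYPOTHETICAL; replace them with the
-- actual fields of user_table.txt.
CREATE EXTERNAL TABLE IF NOT EXISTS bigdata_user (
    id           INT,
    movie_name   STRING,
    score        DOUBLE,
    user_name    STRING,
    user_comment STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/bigdatacase/dataset';

-- Show the first ten rows.
SELECT * FROM bigdata_user LIMIT 10;
EOF

if command -v hive >/dev/null 2>&1; then
    hive -f "$HQL_FILE"
else
    echo "hive not on PATH; statements written to $HQL_FILE" >&2
fi
```

Because the table is external, dropping it in Hive removes only the metadata and leaves the files under /bigdatacase/dataset in place.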
Query the first 10 Douban users' ratings of the movies, as shown below:
Query the user reviews of movies with a score of 9, as shown below:
View the Douban movies with a score below 8, as shown below:
View the reviews of the Douban movies with a score below 8, as shown below:
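The four queries above could be written along these lines. This is a sketch only: the columns `movie_name`, `score`, `user_name`, and `user_comment` are hypothetical names chosen for illustration, not the schema used in the original screenshots.

```shell
#!/usr/bin/env bash
# Sketch of the four queries against dblab.bigdata_user.
# All column names are HYPOTHETICAL placeholders.
QUERY_FILE=$(mktemp)

cat > "$QUERY_FILE" <<'EOF'
USE dblab;

-- Ratings given by the first 10 users.
SELECT user_name, movie_name, score FROM bigdata_user LIMIT 10;

-- User reviews of movies with a score of 9.
SELECT movie_name, user_name, user_comment FROM bigdata_user WHERE score = 9;

-- Movies with a score below 8.
SELECT DISTINCT movie_name, score FROM bigdata_user WHERE score < 8;

-- Reviews of movies with a score below 8.
SELECT movie_name, user_comment FROM bigdata_user WHERE score < 8;
EOF

if command -v hive >/dev/null 2>&1; then
    hive -f "$QUERY_FILE"
else
    echo "hive not on PATH; queries written to $QUERY_FILE" >&2
fi
```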
Summary: This semester I gained a deeper understanding of Hadoop's HDFS file system and MapReduce, as well as of creating databases in Hive
and using its structured query features. I also learned more Python. I came to understand the real purpose of this course, learned a lot of new material this semester, and
reviewed earlier computing knowledge, which deepened my understanding further.