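Prerequisites:
Hadoop 2.7.3 installed (on Linux)
Hive 2.3.3 installed (on Linux)
XAMPP installed (on Windows), with Navicat successfully connected to XAMPP's MySQL. Reference: Connecting Navicat to the XAMPP database
Preparing the source data:
1. Open a terminal and create the file emp.csv: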
$ nano emp.csv
Enter the following content, then save and exit:
7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10
2. Create the file dept.csv:
$ nano dept.csv
Enter the following content, then save and exit:
10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON
Lab steps:
(1) Upload the two files above to a directory in HDFS, e.g. /001/hive.
In the Linux terminal, run:
hdfs dfs -mkdir -p /001/hive
hdfs dfs -put dept.csv /001/hive
hdfs dfs -put emp.csv /001/hive
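(2) Create the employee table, named emp followed by your student ID (e.g. emp001). Note: the statements from here on are entered at the Hive command line.
Start the Hive command line: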
$ hive
Create the Hive table emp001:
create table emp001(empno int,ename string,job string,mgr int,hiredate string,sal int,comm int,deptno int) row format delimited fields terminated by ',';
(3) Create the department table, named dept followed by your student ID (e.g. dept001):
create table dept001(deptno int,dname string,loc string) row format delimited fields terminated by ',';
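(4) Load the data. Because these are HDFS paths (there is no LOCAL keyword), LOAD DATA INPATH moves the files out of /001/hive into each table's directory rather than copying them.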
load data inpath '/001/hive/emp.csv' into table emp001;
load data inpath '/001/hive/dept.csv' into table dept001;
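To show what the `row format delimited fields terminated by ','` tables will return, here is a minimal Python sketch. It splits each line on commas and casts each field to the declared column type. It assumes Hive's default text SerDe behaviour for this data: an empty field in an `int` column, such as `comm` for SMITH or `mgr` for KING, reads back as NULL, modelled here as `None`. The helper `parse_emp` is hypothetical and exists only for illustration.

```python
# Sample data copied from emp.csv above.
EMP_CSV = """\
7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10
"""

# Column names and types copied from the emp001 CREATE TABLE statement.
EMP_COLUMNS = [("empno", int), ("ename", str), ("job", str), ("mgr", int),
               ("hiredate", str), ("sal", int), ("comm", int), ("deptno", int)]


def parse_emp(text):
    """Hypothetical helper: split comma-delimited lines and cast each field.

    Empty fields in int columns become None, standing in for the NULLs
    Hive is assumed to return for this data.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(",")
        row = {}
        for (name, typ), raw in zip(EMP_COLUMNS, fields):
            if typ is int:
                row[name] = int(raw) if raw != "" else None
            else:
                row[name] = raw
        rows.append(row)
    return rows


if __name__ == "__main__":
    for r in parse_emp(EMP_CSV):
        print(r["empno"], r["ename"], r["mgr"], r["comm"])
```

This explains why `select * from emp001` later shows NULL in the `comm` column for most employees, while TURNER shows 0.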
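(5) Create a table partitioned by the employee's department number, named emp_part followed by your student ID (e.g. emp_part001). Note that deptno is declared only in the PARTITIONED BY clause, not in the column list:
create table emp_part001(empno int,ename string,job string,mgr int,hiredate string,sal int,comm int) partitioned by (deptno int) row format delimited fields terminated by ',';
Insert data into the partitioned table. Each statement names its target partition explicitly and uses a subquery to select the matching rows: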
insert into table emp_part001 partition(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp001 where deptno=10;
insert into table emp_part001 partition(deptno=20) select empno,ename,job,mgr,hiredate,sal,comm from emp001 where deptno=20;
insert into table emp_part001 partition(deptno=30) select empno,ename,job,mgr,hiredate,sal,comm from emp001 where deptno=30;
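Together, the three INSERT ... PARTITION statements split emp001 by deptno. Each partition is stored as its own subdirectory of the table directory, named in Hive's usual `column=value` form. The partition column itself is not written into the data files. The Python sketch below mimics that split on the sample data so you can see what each partition should contain. `partition_by_dept` is a hypothetical helper, not a Hive API.

```python
from collections import defaultdict

# Sample data copied from emp.csv above.
EMP_CSV = """\
7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10
"""


def partition_by_dept(text):
    """Hypothetical helper: group rows by deptno, like the three INSERT ... PARTITION statements."""
    parts = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(",")
        deptno = int(fields[7])
        # Only the first 7 columns are stored; deptno lives in the directory name.
        parts[f"deptno={deptno}"].append(",".join(fields[:7]))
    return dict(parts)


if __name__ == "__main__":
    for name, rows in sorted(partition_by_dept(EMP_CSV).items()):
        print(name, len(rows))
    # deptno=10 3
    # deptno=20 5
    # deptno=30 6
```

Department 40 has no employees, so no deptno=40 partition is created. Querying with `where deptno=10` on emp_part001 only needs to read the deptno=10 directory.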
(6) Create a bucketed table, named emp_bucket followed by your student ID (e.g. emp_bucket001), that distributes employees into buckets by job:
create table emp_bucket001(empno int,ename string,job string,mgr int,hiredate string,sal int,comm int,deptno int) clustered by (job) into 4 buckets row format delimited fields terminated by ',';
Insert the data with a subquery:
insert into emp_bucket001 select * from emp001;
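With `clustered by (job) into 4 buckets`, each row goes to the bucket given by a hash of `job` modulo 4. Each bucket is typically written as a separate file in the table directory. The Python sketch below is built on two assumptions that you should check against your Hive version, not treat as a specification. First, the string hash is taken to be the one Hive 2.x uses by default: h = 31 * h + byte over the UTF-8 bytes, with Java 32-bit overflow and signed bytes. Second, the bucket formula is taken to be `(hash & Integer.MAX_VALUE) % numBuckets`. The helpers `hive_string_hash`, `bucket_for` and `bucket_rows` are hypothetical.

```python
# Sample data copied from emp.csv above.
EMP_CSV = """\
7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10
"""


def java_int(x):
    """Wrap an integer into Java's signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x >= 0x8000_0000 else x


def hive_string_hash(s):
    """Assumed Hive 2.x default hash for a string column: h = 31*h + signed byte."""
    h = 0
    for b in s.encode("utf-8"):
        signed = b - 256 if b > 127 else b  # Java bytes are signed
        h = java_int(31 * h + signed)
    return h


def bucket_for(value, num_buckets=4):
    """Assumed bucket formula: (hash & Integer.MAX_VALUE) % num_buckets."""
    return (hive_string_hash(value) & 0x7FFFFFFF) % num_buckets


def bucket_rows(text, num_buckets=4):
    """Hypothetical helper: assign every employee row to a bucket by its job."""
    buckets = {i: [] for i in range(num_buckets)}
    for line in text.splitlines():
        if line.strip():
            job = line.split(",")[2]
            buckets[bucket_for(job, num_buckets)].append(line)
    return buckets


if __name__ == "__main__":
    for i, rows in bucket_rows(EMP_CSV).items():
        print(i, sorted({r.split(",")[2] for r in rows}))
```

Whatever the exact hash, rows with the same job always land in the same bucket. The five distinct jobs (CLERK, SALESMAN, MANAGER, ANALYST, PRESIDENT) must share four buckets, so at least one bucket holds more than one job.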
(7) Query all employee information:
select * from emp001;
(8) Query each employee's number, name, and salary:
select empno,ename,sal from emp001;
(9) Multi-table query: list each employee's name together with the name of their department:
select dept001.dname,emp001.ename from emp001,dept001 where emp001.deptno=dept001.deptno;
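The query above puts the join condition in the WHERE clause, which Hive treats as an inner join. Only rows whose deptno matches on both sides are returned. The Python sketch below reproduces the result on the sample data. It shows that department 40 (OPERATIONS) has no employees and therefore does not appear. `inner_join` is a hypothetical helper. It uses a simple hash join (build a dict from dept001, then probe it with emp001), which is one common strategy and not necessarily the plan Hive chooses.

```python
# Sample data copied from emp.csv and dept.csv above.
EMP_CSV = """\
7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10
"""

DEPT_CSV = """\
10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON
"""


def inner_join(emp_text, dept_text):
    """Hypothetical helper: return (dname, ename) pairs where emp.deptno = dept.deptno."""
    # Build side: deptno -> dname from the department table.
    dname_by_deptno = {}
    for line in dept_text.splitlines():
        if line.strip():
            deptno, dname, _loc = line.split(",")
            dname_by_deptno[int(deptno)] = dname
    # Probe side: keep only employees whose deptno has a match.
    result = []
    for line in emp_text.splitlines():
        if line.strip():
            fields = line.split(",")
            deptno = int(fields[7])
            if deptno in dname_by_deptno:
                result.append((dname_by_deptno[deptno], fields[1]))
    return result


if __name__ == "__main__":
    for dname, ename in inner_join(EMP_CSV, DEPT_CSV):
        print(dname, ename)
```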
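(10) Build a report that raises each employee's salary according to their job, showing the salary both before and after the raise.
The raise rules are: PRESIDENT +1000, MANAGER +800, all other jobs +400.
select empno,ename,job,sal as sal_before,
case job when 'PRESIDENT' then sal+1000
when 'MANAGER' then sal+800
else sal+400
end as sal_after
from emp001;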
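The CASE expression is evaluated once per row and picks the first matching branch. The Python sketch below applies the same raise rules to the sample data so you can check the report Hive prints. `raise_salary` and `salary_report` are hypothetical helpers, not part of Hive.

```python
# Sample data copied from emp.csv above.
EMP_CSV = """\
7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10
"""

# Raises for the named jobs; every other job falls through to the ELSE branch.
RAISES = {"PRESIDENT": 1000, "MANAGER": 800}
DEFAULT_RAISE = 400


def raise_salary(job, sal):
    """Mirror of: case job when 'PRESIDENT' then sal+1000 when 'MANAGER' then sal+800 else sal+400 end"""
    return sal + RAISES.get(job, DEFAULT_RAISE)


def salary_report(text):
    """Hypothetical helper: (empno, ename, job, sal_before, sal_after) for every employee."""
    report = []
    for line in text.splitlines():
        if line.strip():
            f = line.split(",")
            empno, ename, job, sal = int(f[0]), f[1], f[2], int(f[5])
            report.append((empno, ename, job, sal, raise_salary(job, sal)))
    return report


if __name__ == "__main__":
    for row in salary_report(EMP_CSV):
        print(*row)
```

For example, KING (PRESIDENT) goes from 5000 to 6000, JONES (MANAGER) from 2975 to 3775, and SMITH (CLERK) from 800 to 1200.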
Done!