Hive的查询操作(group by , join, 多表连接)

版权声明:Please make the source marked https://blog.csdn.net/qq_31807385/article/details/84721368

目录

数据准备:

分组实操练习:

Join连接实操练习:

多表的连接实操练习:


数据准备:

1,创建emp表和dept表,并向其中导入如下的数据:

create database db_select;
use db_select;

create table if not exists dept(
deptno int,
dname string,
loc int
)
row format delimited
fields terminated by '\t';

create table if not exists emp(
empno int,
ename string,
job string,
mgr int,
hiredata string,
sal double,
comm double,
deptno int
)
row format delimited
fields delimited by '\t';

0: jdbc:hive2://hadoop108:10000> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| dept      |
| emp       |
| location  |
+-----------+--+

[isea@hadoop108 datas]$ vim dept.txt
10	ACCOUNTING	1700
20	RESEARCH	1800
30	SALES	1900
40	OPERATIONS	1700

[isea@hadoop108 datas]$ vim emp.txt
7369	SMITH	CLERK	7902	1980-12-17	800.00		20
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.00	300.00	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.00	500.00	30
7566	JONES	MANAGER	7839	1981-4-2	2975.00		20
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.00	1400.00	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.00		30
7782	CLARK	MANAGER	7839	1981-6-9	2450.00		10
7788	SCOTT	ANALYST	7566	1987-4-19	3000.00		20
7839	KING	PRESIDENT		1981-11-17	5000.00		10
7844	TURNER	SALESMAN	7698	1981-9-8	1500.00	0.00	30
7876	ADAMS	CLERK	7788	1987-5-23	1100.00		20
7900	JAMES	CLERK	7698	1981-12-3	950.00		30
7902	FORD	ANALYST	7566	1981-12-3	3000.00		20
7934	MILLER	CLERK	7782	1982-1-23	1300.00		10

[isea@hadoop108 datas]$ vim location.txt
1700    Beijing
1800    London
1900    Tokyo



load data local inpath '/opt/module/datas/dept.txt' into table dept;

load data local inpath '/opt/module/datas/emp.txt' into table emp;

load data local inpath '/opt/module/datas/location.txt' into table location;

分组实操练习:

1,计算每个部门的平均工资:
select deptno,avg(sal) from emp group by deptno;
+---------+---------------------+--+
| deptno  |         _c1         |
+---------+---------------------+--+
| 10      | 2916.6666666666665  |
| 20      | 2175.0              |
| 30      | 1566.6666666666667  |
+---------+---------------------+--+


2,计算emp每个部门的最高薪水:
select deptno,max(sal) from emp group by deptno;
+---------+---------+--+
| deptno  |   _c1   |
+---------+---------+--+
| 10      | 5000.0  |
| 20      | 3000.0  |
| 30      | 2850.0  |
+---------+---------+--+

3,求每个部门的平均薪水大于2000的部门:
select deptno,avg(sal) AVG from emp group by deptno having AVG > 2000; 
+---------+---------------------+--+
| deptno  |         avg         |
+---------+---------------------+--+
| 10      | 2916.6666666666665  |
| 20      | 2175.0              |
+---------+---------------------+--+

4,根据员工表和部门表中的部门编号相等,查询员工编号、员工名称和部门名称;
select empno,ename,dname from emp left join dept on  emp.deptno = dept.deptno;
+--------+---------+-------------+--+
| empno  |  ename  |    dname    |
+--------+---------+-------------+--+
| 7369   | SMITH   | RESEARCH    |
| 7499   | ALLEN   | SALES       |
| 7521   | WARD    | SALES       |
| 7566   | JONES   | RESEARCH    |
| 7654   | MARTIN  | SALES       |
| 7698   | BLAKE   | SALES       |
| 7782   | CLARK   | ACCOUNTING  |
| 7788   | SCOTT   | RESEARCH    |
| 7839   | KING    | ACCOUNTING  |
| 7844   | TURNER  | SALES       |
| 7876   | ADAMS   | RESEARCH    |
| 7900   | JAMES   | SALES       |
| 7902   | FORD    | RESEARCH    |
| 7934   | MILLER  | ACCOUNTING  |
+--------+---------+-------------+--+

Join连接实操练习:

合并员工表和部门表:这里使用到了内连接
select e.ename,e.empno,e.sal,e.deptno,d.dname from emp e join dept d on e.deptno = d.deptno;
+----------+----------+---------+-----------+-------------+--+
| e.ename  | e.empno  |  e.sal  | e.deptno  |   d.dname   |
+----------+----------+---------+-----------+-------------+--+
| SMITH    | 7369     | 800.0   | 20        | RESEARCH    |
| ALLEN    | 7499     | 1600.0  | 30        | SALES       |
| WARD     | 7521     | 1250.0  | 30        | SALES       |
| JONES    | 7566     | 2975.0  | 20        | RESEARCH    |
| MARTIN   | 7654     | 1250.0  | 30        | SALES       |
| BLAKE    | 7698     | 2850.0  | 30        | SALES       |
| CLARK    | 7782     | 2450.0  | 10        | ACCOUNTING  |
| SCOTT    | 7788     | 3000.0  | 20        | RESEARCH    |
| KING     | 7839     | 5000.0  | 10        | ACCOUNTING  |
| TURNER   | 7844     | 1500.0  | 30        | SALES       |
| ADAMS    | 7876     | 1100.0  | 20        | RESEARCH    |
| JAMES    | 7900     | 950.0   | 30        | SALES       |
| FORD     | 7902     | 3000.0  | 20        | RESEARCH    |
| MILLER   | 7934     | 1300.0  | 10        | ACCOUNTING  |
+----------+----------+---------+-----------+-------------+--+
这里如果没一个员工刚来,暂时没有分配部门,将不会显示在查询结果中


左外连接:左表的所有记录都会被保留下来,没有匹配到的,以null显示:
select e.empno,e.ename,e.sal,d.dname from emp e left join dept d where e.deptno = d.deptno;
+----------+----------+---------+-------------+--+
| e.empno  | e.ename  |  e.sal  |   d.dname   |
+----------+----------+---------+-------------+--+
| 7369     | SMITH    | 800.0   | RESEARCH    |
| 7499     | ALLEN    | 1600.0  | SALES       |
| 7521     | WARD     | 1250.0  | SALES       |
| 7566     | JONES    | 2975.0  | RESEARCH    |
| 7654     | MARTIN   | 1250.0  | SALES       |
| 7698     | BLAKE    | 2850.0  | SALES       |
| 7782     | CLARK    | 2450.0  | ACCOUNTING  |
| 7788     | SCOTT    | 3000.0  | RESEARCH    |
| 7839     | KING     | 5000.0  | ACCOUNTING  |
| 7844     | TURNER   | 1500.0  | SALES       |
| 7876     | ADAMS    | 1100.0  | RESEARCH    |
| 7900     | JAMES    | 950.0   | SALES       |
| 7902     | FORD     | 3000.0  | RESEARCH    |
| 7934     | MILLER   | 1300.0  | ACCOUNTING  |
+----------+----------+---------+-------------+--+
这里如果没一个员工刚来,暂时没有分配部门,也会显示在查询结果中

右外连接:右表的所有记录都会被保留下来,没有匹配到的,以null显示:
select e.empno,e.ename,e.sal,d.dname from emp e right join dept d where e.deptno = d.deptno;
+----------+----------+---------+-------------+--+
| e.empno  | e.ename  |  e.sal  |   d.dname   |
+----------+----------+---------+-------------+--+
| 7782     | CLARK    | 2450.0  | ACCOUNTING  |
| 7839     | KING     | 5000.0  | ACCOUNTING  |
| 7934     | MILLER   | 1300.0  | ACCOUNTING  |
| 7369     | SMITH    | 800.0   | RESEARCH    |
| 7566     | JONES    | 2975.0  | RESEARCH    |
| 7788     | SCOTT    | 3000.0  | RESEARCH    |
| 7876     | ADAMS    | 1100.0  | RESEARCH    |
| 7902     | FORD     | 3000.0  | RESEARCH    |
| 7499     | ALLEN    | 1600.0  | SALES       |
| 7521     | WARD     | 1250.0  | SALES       |
| 7654     | MARTIN   | 1250.0  | SALES       |
| 7698     | BLAKE    | 2850.0  | SALES       |
| 7844     | TURNER   | 1500.0  | SALES       |
| 7900     | JAMES    | 950.0   | SALES       |
+----------+----------+---------+-------------+--+
这里如果有一个刚陈成立的部门,暂时没有员工被调到该部门,那么这个部门仍然会显示在这个部门中


满外连接:左右表中的,没有匹配到的,都会以null显示
select e.empno,e.ename,e.sal,d.dname from emp e full join dept d where e.deptno = d.deptno;
+----------+----------+---------+-------------+--+
| e.empno  | e.ename  |  e.sal  |   d.dname   |
+----------+----------+---------+-------------+--+
| 7934     | MILLER   | 1300.0  | ACCOUNTING  |
| 7839     | KING     | 5000.0  | ACCOUNTING  |
| 7782     | CLARK    | 2450.0  | ACCOUNTING  |
| 7876     | ADAMS    | 1100.0  | RESEARCH    |
| 7788     | SCOTT    | 3000.0  | RESEARCH    |
| 7369     | SMITH    | 800.0   | RESEARCH    |
| 7566     | JONES    | 2975.0  | RESEARCH    |
| 7902     | FORD     | 3000.0  | RESEARCH    |
| 7844     | TURNER   | 1500.0  | SALES       |
| 7499     | ALLEN    | 1600.0  | SALES       |
| 7698     | BLAKE    | 2850.0  | SALES       |
| 7654     | MARTIN   | 1250.0  | SALES       |
| 7521     | WARD     | 1250.0  | SALES       |
| 7900     | JAMES    | 950.0   | SALES       |
+----------+----------+---------+-------------+--+
没有部门的员工和没有员工的部门都会显示在这张表中

多表的连接实操练习:

1,多表的连接查询:
select e.ename,d.dname,l.loc_name 
from emp e left join dept d on e.deptno = d.deptno 
left join location l on
d.loc = l.loc;

+----------+-------------+-------------+--+
| e.ename  |   d.dname   | l.loc_name  |
+----------+-------------+-------------+--+
| SMITH    | RESEARCH    | London      |
| ALLEN    | SALES       | Tokyo       |
| WARD     | SALES       | Tokyo       |
| JONES    | RESEARCH    | London      |
| MARTIN   | SALES       | Tokyo       |
| BLAKE    | SALES       | Tokyo       |
| CLARK    | ACCOUNTING  | Beijing     |
| SCOTT    | RESEARCH    | London      |
| KING     | ACCOUNTING  | Beijing     |
| TURNER   | SALES       | Tokyo       |
| ADAMS    | RESEARCH    | London      |
| JAMES    | SALES       | Tokyo       |
| FORD     | RESEARCH    | London      |
| MILLER   | ACCOUNTING  | Beijing     |
+----------+-------------+-------------+--+


大多数情况下,Hive会对每对JOIN连接对象启动一个MapReduce任务。本例中会首先启动一个MapReduce job对
表e和表d进行连接操作,然后会再启动一个MapReduce job将第一个MapReduce job的输出和表l;进行连接操作

猜你喜欢

转载自blog.csdn.net/qq_31807385/article/details/84721368