Hive Data Processing: Cumulative Report Totals

Data:

+----------+---------+--------+
| username | month   | salary |
+----------+---------+--------+
| A        | 2015-01 |      5 |
| A        | 2015-01 |     15 |
| B        | 2015-01 |      6 |
| A        | 2015-01 |      8 |
| B        | 2015-01 |     25 |
| A        | 2015-02 |     20 |
| B        | 2015-02 |     15 |
| B        | 2015-02 |     10 |
| A        | 2015-02 |      7 |
| A        | 2015-02 |      9 |
| B        | 2015-02 |      6 |
+----------+---------+--------+

The table above holds the raw data for the report.

SQL file (MySQL syntax):

CREATE DATABASE /*!32312 IF NOT EXISTS*/`test` /*!40100 DEFAULT CHARACTER SET utf8 */;

USE `test`;

/*Table structure for table `t_access_times` */

DROP TABLE IF EXISTS `t_access_times`;

CREATE TABLE `t_access_times` (
  `username` varchar(10) DEFAULT NULL,
  `month` varchar(20) DEFAULT NULL,
  `salary` int(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/*Data for the table `t_access_times` */

insert  into `t_access_times`(`username`,`month`,`salary`) values ('A','2015-01',5),('A','2015-01',15),('B','2015-01',6),('A','2015-01',8),('B','2015-01',25),('A','2015-02',20),('B','2015-02',15),('B','2015-02',10),('A','2015-02',7),('A','2015-02',9),('B','2015-02',6);

The result we ultimately want looks like this:

	+----------+---------+--------+---------------+
	| username | MONTH   | salary | accumulate    |
	+----------+---------+--------+---------------+
	| A        | 2015-01 |     28 |            28 |
	| A        | 2015-02 |     36 |            64 |
	| B        | 2015-01 |     31 |            31 |
	| B        | 2015-02 |     31 |            62 |
	+----------+---------+--------+---------------+

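Analysis:

The first row is user A's total salary for the first month. The last column is the cumulative salary: for each later month, it adds that month's salary to the salaries of all earlier months for the same user.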
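
The arithmetic behind the cumulative column can be sketched in a few lines of Python. This is only an illustration of the running total, not the SQL approach developed below; the dictionary `monthly` is a hypothetical hand-copied version of the per-month totals from the target table.

```python
from itertools import accumulate

# Per-user monthly totals, copied by hand from the target table (illustrative only).
monthly = {
    "A": [("2015-01", 28), ("2015-02", 36)],
    "B": [("2015-01", 31), ("2015-02", 31)],
}

# For each user, the cumulative value of a month is the sum of that month
# and every earlier month.
running_totals = {
    user: list(accumulate(salary for _, salary in rows))
    for user, rows in monthly.items()
}

for user, rows in monthly.items():
    for (month, salary), total in zip(rows, running_totals[user]):
        print(user, month, salary, total)
```

For user A this yields 28 for 2015-01 and 28 + 36 = 64 for 2015-02, matching the accumulate column.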


Let's go through it step by step.

First, preprocess the raw data.

Compute the total salary of each employee for each month:

SELECT username, MONTH, SUM(salary) AS salary
FROM t_access_times
GROUP BY username, MONTH;

+----------+---------+--------+
| username | MONTH   | salary |
+----------+---------+--------+
| A        | 2015-01 |     28 |
| A        | 2015-02 |     36 |
| B        | 2015-01 |     31 |
| B        | 2015-02 |     31 |
+----------+---------+--------+

This gives the total salary of each employee for each month.
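
To check this step without a MySQL or Hive installation, the aggregation can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch: the in-memory database, the simplified column types, and the names `ROWS` and `monthly_totals` are assumptions made for the illustration.

```python
import sqlite3

# In-memory copy of the article's t_access_times data (types simplified for SQLite).
ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 6),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-02", 20),
    ("B", "2015-02", 15), ("B", "2015-02", 10), ("A", "2015-02", 7),
    ("A", "2015-02", 9), ("B", "2015-02", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_access_times (username TEXT, month TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO t_access_times VALUES (?, ?, ?)", ROWS)

# One row per (user, month) with the summed salary.
monthly_totals = conn.execute(
    """
    SELECT username, month, SUM(salary) AS salary
    FROM t_access_times
    GROUP BY username, month
    ORDER BY username, month
    """
).fetchall()

for row in monthly_totals:
    print(row)
```

The ORDER BY is added only to make the printed order deterministic.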

Next, join two copies of this result with each other:

SELECT A.*,B.*
FROM
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
A
INNER JOIN 
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
B

Result of combining the two tables:

+----------+---------+--------+----------+---------+--------+
| username | MONTH   | salary | username | MONTH   | salary |
+----------+---------+--------+----------+---------+--------+
| A        | 2015-01 |     28 | A        | 2015-01 |     28 |
| A        | 2015-02 |     36 | A        | 2015-01 |     28 |
| B        | 2015-01 |     31 | A        | 2015-01 |     28 |
| B        | 2015-02 |     31 | A        | 2015-01 |     28 |
| A        | 2015-01 |     28 | A        | 2015-02 |     36 |
| A        | 2015-02 |     36 | A        | 2015-02 |     36 |
| B        | 2015-01 |     31 | A        | 2015-02 |     36 |
| B        | 2015-02 |     31 | A        | 2015-02 |     36 |
| A        | 2015-01 |     28 | B        | 2015-01 |     31 |
| A        | 2015-02 |     36 | B        | 2015-01 |     31 |
| B        | 2015-01 |     31 | B        | 2015-01 |     31 |
| B        | 2015-02 |     31 | B        | 2015-01 |     31 |
| A        | 2015-01 |     28 | B        | 2015-02 |     31 |
| A        | 2015-02 |     36 | B        | 2015-02 |     31 |
| B        | 2015-01 |     31 | B        | 2015-02 |     31 |
| B        | 2015-02 |     31 | B        | 2015-02 |     31 |
+----------+---------+--------+----------+---------+--------+

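Without a join condition this is a Cartesian product, so the mismatched pairs have to be removed: user B's rows must not be paired with user A's rows. Add the join condition:
ON A.username = B.username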

SELECT A.*,B.*
FROM
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
A
INNER JOIN 
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
B
ON A.username = B.username

Query result:

A   2015-01   28		A   2015-01   28
A   2015-01   28		A   2015-02   36
A   2015-02   36		A   2015-01   28
A   2015-02   36		A   2015-02   36
B   2015-01   31		B   2015-01   31
B   2015-01   31		B   2015-02   31
B   2015-02   31		B   2015-01   31
B   2015-02   31		B   2015-02   31
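
The effect of the join condition can be checked by counting rows. The sketch below uses Python's sqlite3 module with an in-memory copy of the data; the helper string `MONTHLY` and the variable names are assumptions for the illustration, and CROSS JOIN is written out explicitly to stand for the join without a condition.

```python
import sqlite3

ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 6),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-02", 20),
    ("B", "2015-02", 15), ("B", "2015-02", 10), ("A", "2015-02", 7),
    ("A", "2015-02", 9), ("B", "2015-02", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_access_times (username TEXT, month TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO t_access_times VALUES (?, ?, ?)", ROWS)

# The per-user, per-month subquery used on both sides of the self-join.
MONTHLY = (
    "(SELECT username, month, SUM(salary) AS salary "
    "FROM t_access_times GROUP BY username, month)"
)

# No join condition: every row of A is paired with every row of B.
cross_rows = conn.execute(
    f"SELECT A.*, B.* FROM {MONTHLY} A CROSS JOIN {MONTHLY} B"
).fetchall()

# With the condition, only rows of the same user are paired.
joined_rows = conn.execute(
    f"SELECT A.*, B.* FROM {MONTHLY} A "
    f"INNER JOIN {MONTHLY} B ON A.username = B.username"
).fetchall()

print(len(cross_rows), len(joined_rows))
```

The 4 x 4 = 16 combinations shrink to 8, because each user has two months and 2 x 2 pairs survive per user.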

Next, group by user and month, so that all of A's January rows fall into one group, all of B's January rows into another, and so on:

SELECT A.*,B.*
FROM
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
A
INNER JOIN 
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
B
ON A.username = B.username

GROUP BY A.username,A.month

Result:

A   2015-01   28		A   2015-01   28
A   2015-02   36		A   2015-01   28
B   2015-01   31		B   2015-01   31
B   2015-02   31		B   2015-01   31

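The select list shows B.*, but after grouping each group contains two B rows and only one of them is displayed. (MySQL picks one when ONLY_FULL_GROUP_BY is disabled; Hive and MySQL with ONLY_FULL_GROUP_BY enabled reject a query that selects non-grouped, non-aggregated columns.) Instead, we can add up the salaries of the B rows in each group.
Each group also contains two A rows, but those two rows are identical.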

SELECT A.*,SUM(B.salary)
FROM
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
A
INNER JOIN 
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
B
ON A.username = B.username

GROUP BY A.username,A.month

Result:

	A   2015-01   28		(36+28) = 64
	A   2015-02   36		(36+28) = 64
	B   2015-01   31		(31+31) = 62
	B   2015-02   31		(31+31) = 62

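The last column now adds up each user's salary over all months (only two months here), which is not what we want: each row should include only the months up to and including its own. Restrict the pairs with WHERE A.month >= B.month: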

SELECT A.*,SUM(B.salary)
FROM
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
A
INNER JOIN 
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
B
ON A.username = B.username
WHERE A.month >= B.month
GROUP BY A.username,A.month

This gives:

A   2015-01   28		28
A   2015-02   36		(36+28) = 64
B   2015-01   31		31
B   2015-02   31		(31+31) = 62
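
Why the month filter produces a running total can be seen by counting how many B rows survive for each A row. The sketch below again uses Python's sqlite3 module on an in-memory copy of the data; `MONTHLY`, `months_included` and the other names are assumptions for the illustration. Note that months are compared as strings, which only works because the YYYY-MM values are zero-padded.

```python
import sqlite3

ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 6),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-02", 20),
    ("B", "2015-02", 15), ("B", "2015-02", 10), ("A", "2015-02", 7),
    ("A", "2015-02", 9), ("B", "2015-02", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_access_times (username TEXT, month TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO t_access_times VALUES (?, ?, ?)", ROWS)

MONTHLY = (
    "(SELECT username, month, SUM(salary) AS salary "
    "FROM t_access_times GROUP BY username, month)"
)

# For each (user, month) of A, keep only B rows of the same user whose month
# is not later, then count and sum them.
filtered = conn.execute(
    f"""
    SELECT A.username, A.month,
           COUNT(*)      AS months_included,
           SUM(B.salary) AS accumulate
    FROM {MONTHLY} A
    INNER JOIN {MONTHLY} B ON A.username = B.username
    WHERE A.month >= B.month
    GROUP BY A.username, A.month
    ORDER BY A.username, A.month
    """
).fetchall()

for row in filtered:
    print(row)

# String comparison matches chronological order only for zero-padded months.
padded_ok = "2015-10" > "2015-09"
unpadded_wrong = "2015-10" < "2015-9"
print(padded_ok, unpadded_wrong)
```

January pairs with one month and February with two, so the sums are 28 and 64 for A, and 31 and 62 for B.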

Finally, sort the result:

-- A's columns are listed explicitly and all appear in GROUP BY,
-- so the query does not depend on MySQL's permissive GROUP BY handling.
SELECT A.username,A.month,A.salary,SUM(B.salary) AS accumulate
FROM
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
A
INNER JOIN 
(SELECT username,MONTH,SUM(salary) AS salary FROM t_access_times GROUP BY username,MONTH)
B
ON A.username = B.username
WHERE A.month >= B.month
GROUP BY A.username,A.month,A.salary
ORDER BY A.username,A.month;

Final result:

+----------+---------+--------+------------+
| username | MONTH   | salary | accumulate |
+----------+---------+--------+------------+
| A        | 2015-01 |     28 |         28 |
| A        | 2015-02 |     36 |         64 |
| B        | 2015-01 |     31 |         31 |
| B        | 2015-02 |     31 |         62 |
+----------+---------+--------+------------+
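
As a cross-check, the same table can be produced with a window function instead of the self-join. This is a different technique from the one used in this article, shown here only as a sketch: it assumes an engine with windowing support (Hive has windowing functions; the example runs on Python's sqlite3 module and needs SQLite 3.25 or newer). The variable names are assumptions for the illustration.

```python
import sqlite3

ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 6),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-02", 20),
    ("B", "2015-02", 15), ("B", "2015-02", 10), ("A", "2015-02", 7),
    ("A", "2015-02", 9), ("B", "2015-02", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_access_times (username TEXT, month TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO t_access_times VALUES (?, ?, ?)", ROWS)

# Aggregate per (user, month) first, then let SUM(...) OVER accumulate the
# monthly totals within each user in month order.
windowed = conn.execute(
    """
    SELECT username, month, salary,
           SUM(salary) OVER (PARTITION BY username ORDER BY month) AS accumulate
    FROM (SELECT username, month, SUM(salary) AS salary
          FROM t_access_times
          GROUP BY username, month) t
    ORDER BY username, month
    """
).fetchall()

for row in windowed:
    print(row)
```

The window version reads the aggregated data once, whereas the self-join pairs every month with all earlier months of the same user, so it may scale better when users have many months of data.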


Reposted from blog.csdn.net/qq_38200548/article/details/84429853