MySQL: The GROUP BY Statement in Detail

1. The GROUP BY Statement

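The GROUP BY statement groups a result set by one or more columns; aggregate functions such as COUNT, SUM, and AVG can then be applied to each group. The general syntax is select column_name, function(column_name) from table_name where column_name operator value group by column_name;.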

We use the employee_tbl table for the examples below. It is created and populated as follows.

-- Sample table used throughout the examples
create table employee_tbl(
    id int not null,
    name char(10) not null default '',
    date datetime not null,
    singin tinyint(4) not null default 0,
    primary key(id)
)engine=InnoDB default charset=utf8;
-- Six sample rows: (id, name, date, singin)
insert into employee_tbl values
    (1, '小明', '2016-04-22 15:25:33', 1),
    (2, '小王', '2016-04-20 15:25:47', 3),
    (3, '小丽', '2016-04-19 15:26:02', 2),
    (4, '小王', '2016-04-07 15:26:14', 4),
    (5, '小明', '2016-04-11 15:26:40', 4),
    (6, '小明', '2016-04-04 15:26:54', 2);

The contents of employee_tbl are as follows.

+----+------+---------------------+--------+
| id | name | date                | singin |
+----+------+---------------------+--------+
|  1 | 小明 | 2016-04-22 15:25:33 |      1 |
|  2 | 小王 | 2016-04-20 15:25:47 |      3 |
|  3 | 小丽 | 2016-04-19 15:26:02 |      2 |
|  4 | 小王 | 2016-04-07 15:26:14 |      4 |
|  5 | 小明 | 2016-04-11 15:26:40 |      4 |
|  6 | 小明 | 2016-04-04 15:26:54 |      2 |
+----+------+---------------------+--------+

We can use GROUP BY to group employee_tbl by name and count how many records each person has. The code and result are as follows.

select name, COUNT(*) from employee_tbl group by name;

+------+----------+
| name | COUNT(*) |
+------+----------+
| 小明 |        3 |
| 小王 |        2 |
| 小丽 |        1 |
+------+----------+
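
Since the same grouping works with SUM and AVG as well as COUNT, a small sketch that combines several aggregates in one query may help. The column aliases (record_count, singin_total, singin_avg) are illustrative names chosen here, not part of the original example.

```sql
-- Several aggregates per group; the aliases are hypothetical names.
select name,
       COUNT(*)    as record_count,  -- number of rows per name
       SUM(singin) as singin_total,  -- total of singin per name
       AVG(singin) as singin_avg     -- mean of singin per name
from employee_tbl
group by name;
```

For the sample data this should report 3 rows totalling 7 for 小明, 2 rows totalling 7 for 小王, and 1 row totalling 2 for 小丽.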

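Adding with rollup appends a summary row on top of the per-group statistics. In that row the grouping column is shown as NULL. The code and result are as follows.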

select name, SUM(singin) as singin_count from employee_tbl group by name with rollup;

+------+--------------+
| name | singin_count |
+------+--------------+
| 小丽 |            2 |
| 小明 |            7 |
| 小王 |            7 |
| NULL |           16 |
+------+--------------+

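coalesce can be used to replace that NULL with a readable label. coalesce(a, b, c) returns its first non-NULL argument: a if a is not NULL, otherwise b if b is not NULL, otherwise c; if a, b, and c are all NULL, it returns NULL. The code and result are as follows.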

select coalesce(name, '总数'), SUM(singin) as singin_count from employee_tbl group by name with rollup;

+------------------------+--------------+
| coalesce(name, '总数') | singin_count |
+------------------------+--------------+
| 小丽                   |            2 |
| 小明                   |            7 |
| 小王                   |            7 |
| 总数                   |           16 |
+------------------------+--------------+
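
One caveat: coalesce would also relabel a row whose name is genuinely NULL. That cannot happen here because name is declared not null, but it can in other tables. As a sketch, assuming MySQL 8.0.1 or later (where the GROUPING() function is available), the summary row can be identified explicitly instead. The alias label is an illustrative name.

```sql
-- Assumes MySQL 8.0.1+; GROUPING() is not available in earlier versions.
-- GROUPING(name) is 1 only on the summary row produced by with rollup.
select if(grouping(name) = 1, '总数', name) as label,
       SUM(singin) as singin_count
from employee_tbl
group by name with rollup;
```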

2. Using GROUP BY with Aggregate Functions

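Still using employee_tbl, let us first run select * from employee_tbl group by name; and see what it returns. (This query is accepted only when the ONLY_FULL_GROUP_BY SQL mode is disabled. The mode is enabled by default since MySQL 5.7.5, in which case the query fails with error 1055.)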

+----+------+---------------------+--------+
| id | name | date                | singin |
+----+------+---------------------+--------+
|  1 | 小明 | 2016-04-22 15:25:33 |      1 |
|  2 | 小王 | 2016-04-20 15:25:47 |      3 |
|  3 | 小丽 | 2016-04-19 15:26:02 |      2 |
+----+------+---------------------+--------+
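
On MySQL 5.7.5 and later, ONLY_FULL_GROUP_BY is enabled by default and the select * query above is rejected. The following is a minimal sketch of two ways to reproduce the experiment on such a server; it assumes you are free to change the SQL mode for your own session.

```sql
-- Option 1: keep ONLY_FULL_GROUP_BY and state explicitly that any value
-- from the group is acceptable (ANY_VALUE() exists since MySQL 5.7.5).
select name,
       any_value(id)     as id,
       any_value(date)   as date,
       any_value(singin) as singin
from employee_tbl
group by name;

-- Option 2: remove ONLY_FULL_GROUP_BY for the current session only,
-- after which the original query is accepted.
set session sql_mode = (select replace(@@sql_mode, 'ONLY_FULL_GROUP_BY', ''));
select * from employee_tbl group by name;
```

Either way, which row's values appear for each name is up to the server, so the output may differ from the table shown above.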

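Compared with the full employee_tbl table, 小明 and 小王 originally had 3 and 2 rows respectively, yet after group by only 1 row remains for each. Why? We can think of group by as producing a virtual (purely conceptual) table like the one below.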

+----+------+---------------------+--------+
| id | name | date                | singin |
+----+------+---------------------+--------+
|  1 |      | 2016-04-22 15:25:33 |      1 |
|  5 | 小明 | 2016-04-11 15:26:40 |      4 |
|  6 |      | 2016-04-04 15:26:54 |      2 |
+----+------+---------------------+--------+
|  2 | 小王 | 2016-04-20 15:25:47 |      3 |
|  4 |      | 2016-04-07 15:26:14 |      4 |
+----+------+---------------------+--------+
|  3 | 小丽 | 2016-04-19 15:26:02 |      2 |
+----+------+---------------------+--------+
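
The virtual table above can be made visible with GROUP_CONCAT, which joins all values of a group into one string. This is only an illustrative sketch; the aliases ids and singins are names chosen here.

```sql
-- GROUP_CONCAT joins all values of a group into one comma-separated string,
-- which exposes the "multi-value cells" of the virtual table.
select name,
       group_concat(id order by id)     as ids,
       group_concat(singin order by id) as singins
from employee_tbl
group by name;
-- 小明 -> ids 1,5,6   singins 1,4,2
-- 小王 -> ids 2,4     singins 3,4
-- 小丽 -> ids 3       singins 2
```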

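In other words, the rows that share a name are merged into one row whose other cells each hold several values. With select *, MySQL returns just one value from each such cell. In this example it happens to be the first one, but the server is free to choose any value from the group, so the result is not deterministic. An aggregate function, by contrast, processes all the values in a multi-value cell. With that in mind, consider the following problem on a department table.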


First we create the department table with the following code.

create database leetcode;
use leetcode;
-- One row per (id, month) pair with the revenue for that month
create table department_1179 (
    id int,
    revenue int,
    month varchar(11) not null,
    primary key(id, month)
)engine=InnoDB default charset=utf8;
insert into department_1179 values
    (1, 8000, 'Jan'),
    (2, 9000, 'Jan'),
    (3, 10000, 'Feb'),
    (1, 7000, 'Feb'),
    (1, 6000, 'Mar');

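To reformat the department table into the shape of the required result table, rows have to be turned into columns. Let us first try the following code and see what happens.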

use leetcode;
select id,
(case when month='Jan' then revenue end) as Jan_Revenue,
(case when month='Feb' then revenue end) as Feb_Revenue,
(case when month='Mar' then revenue end) as Mar_Revenue,
(case when month='Apr' then revenue end) as Apr_Revenue,
(case when month='May' then revenue end) as May_Revenue,
(case when month='Jun' then revenue end) as Jun_Revenue,
(case when month='Jul' then revenue end) as Jul_Revenue,
(case when month='Aug' then revenue end) as Aug_Revenue,
(case when month='Sep' then revenue end) as Sep_Revenue,
(case when month='Oct' then revenue end) as Oct_Revenue,
(case when month='Nov' then revenue end) as Nov_Revenue,
(case when month='Dec' then revenue end) as Dec_Revenue 
from department_1179 group by id order by id;


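The result is wrong: for id=1, Jan_Revenue and Mar_Revenue are both NULL. Without an aggregate function, each case when expression is evaluated against only one row of the group. For id=1 the group holds the Feb, Jan, and Mar rows, and the row used here was evidently the Feb one (MySQL does not guarantee which row it picks), so the Jan and Mar conditions never match. With ONLY_FULL_GROUP_BY enabled, MySQL rejects this query outright. The fix is to wrap each expression in an aggregate function, for example sum(case when month='Jan' then revenue end): for id=1 it evaluates the case on all of the Feb, Jan, and Mar rows, ignores the NULLs produced by the non-matching rows, and returns the revenue of the Jan row. The code is as follows.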

use leetcode;
select id,
sum(case when month='Jan' then revenue end) as Jan_Revenue,
sum(case when month='Feb' then revenue end) as Feb_Revenue,
sum(case when month='Mar' then revenue end) as Mar_Revenue,
sum(case when month='Apr' then revenue end) as Apr_Revenue,
sum(case when month='May' then revenue end) as May_Revenue,
sum(case when month='Jun' then revenue end) as Jun_Revenue,
sum(case when month='Jul' then revenue end) as Jul_Revenue,
sum(case when month='Aug' then revenue end) as Aug_Revenue,
sum(case when month='Sep' then revenue end) as Sep_Revenue,
sum(case when month='Oct' then revenue end) as Oct_Revenue,
sum(case when month='Nov' then revenue end) as Nov_Revenue,
sum(case when month='Dec' then revenue end) as Dec_Revenue 
from department_1179 group by id order by id;
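
To see why the aggregate version works, it can help to look at what a single case expression produces for each row of one group before SUM is applied. The sketch below does this for id = 1; jan_part is a hypothetical alias used only for this illustration.

```sql
-- Per-row view of what sum(case when month='Jan' ...) aggregates for id = 1.
select id, month, revenue,
       case when month='Jan' then revenue end as jan_part
from department_1179
where id = 1
order by month;
-- month  revenue  jan_part
-- Feb    7000     NULL
-- Jan    8000     8000
-- Mar    6000     NULL
-- SUM skips the NULLs, so Jan_Revenue for id = 1 is 8000.
```

Because (id, month) is the primary key, at most one row per group contributes a non-NULL value to each column, so MAX would return the same result as SUM here. For the sample data, the pivot query should give 8000, 7000, and 6000 in the Jan, Feb, and Mar columns for id 1, 9000 in Jan for id 2, and 10000 in Feb for id 3, with NULL everywhere else.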
