MySQL Common Table Expressions (CTEs): Usage and Efficient Subqueries


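MySQL supports two kinds of CTE: non-recursive and recursive.

Why do we need CTEs?

A derived table cannot be referenced twice in the same query. To use it twice you have to write the subquery twice, so it is evaluated two or more times, which can be a serious performance problem. With a CTE, the subquery is defined once, can be referenced several times, and is computed only once.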

Non-recursive CTEs

A derived table is normally used like this:

SELECT ... FROM (subquery) AS derived, t1 ...

Here the subquery sits inside the FROM clause.

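The CTE syntax looks like this:

WITH derived AS (subquery) SELECT ... FROM derived, t1 ...

Here the subquery is placed in a WITH ... AS clause, which comes before the SELECT, UPDATE, or DELETE statement that uses it. A WITH clause can also appear at the start of a subquery.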
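To make the two forms concrete, here is a minimal sketch that runs both against a tiny table. It is an illustration under assumptions: it uses Python's built-in sqlite3 module instead of MySQL so that it is self-contained, and the table t1 and its data are made up. SQLite accepts the same WITH syntax for this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, amount INTEGER);  -- hypothetical table
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
""")

# Derived table: the subquery sits inside the FROM clause.
derived_sql = """
    SELECT t1.id, derived.total
    FROM (SELECT SUM(amount) AS total FROM t1) AS derived, t1
    ORDER BY t1.id
"""

# CTE: the same subquery is named up front in a WITH clause.
cte_sql = """
    WITH derived AS (SELECT SUM(amount) AS total FROM t1)
    SELECT t1.id, derived.total
    FROM derived, t1
    ORDER BY t1.id
"""

derived_rows = conn.execute(derived_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
print(cte_rows)  # [(1, 60), (2, 60), (3, 60)]
```

Both statements return the same rows; only the place where the subquery is written differs.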

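Suppose you want the year-over-year percentage increase in total salary. Without a CTE you need two subqueries, and they are identical. MySQL does not recognize that they are the same query, so it runs the query twice: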

mysql> SELECT
           q1.year,
           q2.year AS next_year,
           q1.sum,
           q2.sum AS next_sum,
           100*(q2.sum-q1.sum)/q1.sum AS pct
       FROM
           (SELECT YEAR(from_date) AS year, SUM(salary) AS sum
            FROM salaries GROUP BY year) AS q1,
           (SELECT YEAR(from_date) AS year, SUM(salary) AS sum
            FROM salaries GROUP BY year) AS q2
       WHERE q1.year = q2.year-1;
+------+-----------+-------------+-------------+----------+
| year | next_year | sum         | next_sum    | pct      |
+------+-----------+-------------+-------------+----------+
| 1985 |      1986 |   972864875 |  2052895941 | 111.0155 |
| 1986 |      1987 |  2052895941 |  3156881054 |  53.7770 |
| 1987 |      1988 |  3156881054 |  4295598688 |  36.0710 |
| 1988 |      1989 |  4295598688 |  5454260439 |  26.9732 |
| 1989 |      1990 |  5454260439 |  6626146391 |  21.4857 |
| 1990 |      1991 |  6626146391 |  7798804412 |  17.6974 |
| 1991 |      1992 |  7798804412 |  9027872610 |  15.7597 |
| 1992 |      1993 |  9027872610 | 10215059054 |  13.1502 |
| 1993 |      1994 | 10215059054 | 11429450113 |  11.8882 |
| 1994 |      1995 | 11429450113 | 12638817464 |  10.5812 |
...

A non-recursive CTE lets you reuse the result of the subquery, so it only has to be computed once:

mysql> WITH CTE AS
       (SELECT YEAR(from_date) AS year, SUM(salary) AS sum
        FROM salaries GROUP BY year)
       SELECT
           q1.year, q2.year AS next_year, q1.sum, q2.sum AS next_sum,
           100*(q2.sum-q1.sum)/q1.sum AS pct
       FROM
           CTE AS q1,
           CTE AS q2
       WHERE
           q1.year = q2.year-1;
+------+-----------+-------------+-------------+----------+
| year | next_year | sum         | next_sum    | pct      |
+------+-----------+-------------+-------------+----------+
| 1985 |      1986 |   972864875 |  2052895941 | 111.0155 |
| 1986 |      1987 |  2052895941 |  3156881054 |  53.7770 |
| 1987 |      1988 |  3156881054 |  4295598688 |  36.0710 |
| 1988 |      1989 |  4295598688 |  5454260439 |  26.9732 |
| 1989 |      1990 |  5454260439 |  6626146391 |  21.4857 |
| 1990 |      1991 |  6626146391 |  7798804412 |  17.6974 |
| 1991 |      1992 |  7798804412 |  9027872610 |  15.7597 |
| 1992 |      1993 |  9027872610 | 10215059054 |  13.1502 |
| 1993 |      1994 | 10215059054 | 11429450113 |  11.8882 |
| 1994 |      1995 | 11429450113 | 12638817464 |  10.5812 |
| 1995 |      1996 | 12638817464 | 13888587737 |   9.8883 |
| 1996 |      1997 | 13888587737 | 15056011781 |   8.4056 |
| 1997 |      1998 | 15056011781 | 16220495471 |   7.7343 |
| 1998 |      1999 | 16220495471 | 17360258862 |   7.0267 |
| 1999 |      2000 | 17360258862 | 17535667603 |   1.0104 |
| 2000 |      2001 | 17535667603 | 17507737308 |  -0.1593 |
| 2001 |      2002 | 17507737308 | 10243358658 | -41.4924 |
+------+-----------+-------------+-------------+----------+
17 rows in set (1.63 sec)

The results are identical, and performance improves by nearly 50%.
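The year-over-year pattern can be tried without the original salaries data. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module with a four-row, made-up salaries table, names the CTE yearly (a hypothetical name), and replaces MySQL's YEAR() with SQLite's strftime('%Y', ...). It shows only the shape of the query, with one CTE joined to itself, and says nothing about MySQL timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salaries (from_date TEXT, salary INTEGER);  -- made-up data
    INSERT INTO salaries VALUES
        ('1985-03-01', 100), ('1985-07-15', 100),
        ('1986-01-10', 300),
        ('1987-05-20', 450);
""")

sql = """
    WITH yearly AS (            -- defined once, referenced twice below
        SELECT CAST(strftime('%Y', from_date) AS INTEGER) AS year,
               SUM(salary) AS total
        FROM salaries
        GROUP BY strftime('%Y', from_date)
    )
    SELECT q1.year, q2.year AS next_year, q1.total, q2.total AS next_total,
           100.0 * (q2.total - q1.total) / q1.total AS pct
    FROM yearly AS q1
    JOIN yearly AS q2 ON q1.year = q2.year - 1
    ORDER BY q1.year
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
# (1985, 1986, 200, 300, 50.0)
# (1986, 1987, 300, 450, 50.0)
```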

In addition, derived tables cannot refer to one another:

SELECT ... FROM (SELECT ... FROM ...) AS d1, (SELECT ... FROM d1 ...) AS d2 ...
ERROR 1146 (42S02): Table 'db.d1' doesn't exist

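Here the first subquery is aliased as d1, and the second subquery then tries to select from d1, which is not allowed.

A CTE, by contrast, can refer to a CTE defined earlier in the same WITH clause: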

WITH d1 AS (SELECT ... FROM ...), d2 AS (SELECT ... FROM d1 ...)
SELECT ... FROM d1, d2 ...

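d1 and d2 are two separate subqueries, but d2 selects from the result set of d1.

To sum up: with non-recursive CTEs, you first define the subqueries with WITH ... AS, separating multiple subqueries with commas, and then write a SELECT statement that refers to the previously defined subqueries by name.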
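As an illustration of one CTE reading from another, the following sketch defines d1 and d2 in a single WITH clause. It runs on Python's sqlite3 module rather than MySQL, and the orders table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);  -- hypothetical table
    INSERT INTO orders VALUES
        ('ann', 50), ('ann', 70), ('bob', 20), ('cat', 200);
""")

sql = """
    WITH d1 AS (                 -- per-customer totals
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    ),
    d2 AS (                      -- d2 selects from d1, defined just above
        SELECT AVG(total) AS avg_total FROM d1
    )
    SELECT d1.customer, d1.total
    FROM d1, d2
    WHERE d1.total > d2.avg_total
    ORDER BY d1.customer
"""

rows = conn.execute(sql).fetchall()
print(rows)  # [('ann', 120), ('cat', 200)]
```

The order of definition matters: d2 can see d1 because d1 comes first in the WITH clause.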

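Recursive CTEs

A recursive CTE is one whose subquery refers to itself. To use recursion, write WITH RECURSIVE in place of WITH. The recursive CTE's subquery must contain two parts: a seed query (which must not refer to the CTE itself) and a recursive query. The two parts are joined by UNION, UNION ALL, or UNION DISTINCT.

The seed SELECT runs only once and produces the initial set of rows. The recursive SELECT runs repeatedly until it produces no new rows, at which point the complete result set has been collected. This is very useful for deep queries, such as those over parent-child relationships.

As a simple example, suppose you want to print the five numbers from 1 to 5. With a recursive CTE: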

mysql> WITH RECURSIVE cte (n) AS
       (
           SELECT 1                             /* seed query */
           UNION ALL
           SELECT n + 1 FROM cte WHERE n < 5    /* recursive query */
       )
       SELECT * FROM cte;
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+---+
5 rows in set (0.00 sec)

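Let's look at the WITH RECURSIVE clause first:

cte is the name of the subquery and (n) is its column list. The subquery is (SELECT 1 UNION ALL SELECT n+1 FROM cte WHERE n < 5). SELECT 1 is the seed SELECT and runs only once. SELECT n+1 FROM cte WHERE n < 5 is the recursive SELECT: note that it refers to cte itself, and it keeps running until n is no longer less than 5. Once the subquery is defined, an ordinary SELECT queries cte.

Now suppose you want to query a company's organizational structure and work out the management hierarchy.

Create a test table: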
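The evaluation order described here can be modelled in a few lines of Python. This is a conceptual sketch only, not a description of MySQL's internals: run_recursive_cte and next_numbers are hypothetical helper names, and the loop assumes UNION ALL semantics (no duplicate removal). The last lines run the same CTE through Python's sqlite3 module as a cross-check.

```python
import sqlite3

def run_recursive_cte(seed_rows, recursive_step):
    """Model of WITH RECURSIVE ... UNION ALL: seed once, then iterate."""
    result = list(seed_rows)     # the seed SELECT runs exactly once
    working = list(seed_rows)    # rows produced by the previous iteration
    while working:               # stop when an iteration yields no new rows
        working = recursive_step(working)
        result.extend(working)
    return result

def next_numbers(previous_rows):
    # Corresponds to: SELECT n + 1 FROM cte WHERE n < 5
    return [n + 1 for n in previous_rows if n < 5]

numbers = run_recursive_cte([1], next_numbers)
print(numbers)  # [1, 2, 3, 4, 5]

# Cross-check against a real SQL engine (SQLite here, not MySQL).
conn = sqlite3.connect(":memory:")
sql_numbers = [row[0] for row in conn.execute(
    "WITH RECURSIVE cte (n) AS ("
    " SELECT 1 UNION ALL SELECT n + 1 FROM cte WHERE n < 5"
    ") SELECT n FROM cte ORDER BY n"
)]
print(sql_numbers == numbers)  # True
```

Each pass feeds only the rows from the previous pass into the recursive step, which is why the WHERE n < 5 condition eventually empties the working set and ends the loop.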

mysql> CREATE TABLE employees_mgr (
           id INT PRIMARY KEY NOT NULL,
           name VARCHAR(100) NOT NULL,
           manager_id INT NULL,
           INDEX (manager_id),
           FOREIGN KEY (manager_id) REFERENCES employees_mgr (id)
       );

Insert some sample data:

mysql> INSERT INTO employees_mgr VALUES
       (333, "Yasmina", NULL),  # Yasmina is the CEO (manager_id is NULL)
       (198, "John", 333),      # John has ID 198 and reports to 333 (Yasmina)
       (692, "Tarek", 333),
       (29, "Pedro", 198),
       (4610, "Sarah", 29),
       (72, "Pierre", 29),
       (123, "Adil", 692);

Run the recursive CTE:

mysql> WITH RECURSIVE employee_paths (id, name, path) AS
       (
           SELECT id, name, CAST(id AS CHAR(200))
           FROM employees_mgr
           WHERE manager_id IS NULL
           UNION ALL
           SELECT e.id, e.name, CONCAT(ep.path, ',', e.id)
           FROM employee_paths AS ep JOIN employees_mgr AS e
           ON ep.id = e.manager_id
       )
       SELECT * FROM employee_paths ORDER BY path;

The result looks like this:

+------+---------+-----------------+
| id | name | path |
+------+---------+-----------------+
| 333 | Yasmina | 333 |
| 198 | John | 333,198 |
| 29 | Pedro | 333,198,29 |
| 4610 | Sarah | 333,198,29,4610 |
| 72 | Pierre | 333,198,29,72 |
| 692 | Tarek | 333,692 |
| 123 | Adil | 333,692,123 |
+------+---------+-----------------+
7 rows in set (0.00 sec)

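The path column shows the management hierarchy: 333 is the top-level leader, while 4610, 72, and 123 are rank-and-file employees with nobody reporting to them.

Summary:

WITH RECURSIVE CTEs are typically used to query tree-shaped data: define the subquery and its columns first, then SELECT from the CTE.

If you simply need to use the same subquery several times, use a non-recursive CTE, which is more efficient.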
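The hierarchy query can also be reproduced outside MySQL. The sketch below is a port under assumptions, not the original setup: it uses Python's sqlite3 module, so CAST(id AS CHAR(200)) becomes CAST(id AS TEXT) and CONCAT(...) becomes the || operator. The depth column is an addition for illustration and does not appear in the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees_mgr (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        manager_id INTEGER REFERENCES employees_mgr (id)
    );
    INSERT INTO employees_mgr VALUES
        (333, 'Yasmina', NULL),
        (198, 'John', 333),
        (692, 'Tarek', 333),
        (29, 'Pedro', 198),
        (4610, 'Sarah', 29),
        (72, 'Pierre', 29),
        (123, 'Adil', 692);
""")

sql = """
    WITH RECURSIVE employee_paths (id, name, path, depth) AS (
        -- seed: the employee with no manager
        SELECT id, name, CAST(id AS TEXT), 0
        FROM employees_mgr
        WHERE manager_id IS NULL
        UNION ALL
        -- recursive step: direct reports of rows found so far
        SELECT e.id, e.name, ep.path || ',' || e.id, ep.depth + 1
        FROM employee_paths AS ep
        JOIN employees_mgr AS e ON ep.id = e.manager_id
    )
    SELECT name, path, depth FROM employee_paths ORDER BY path
"""

rows = conn.execute(sql).fetchall()
for name, path, depth in rows:
    # Indent each name by its depth to show the tree shape.
    print("  " * depth + f"{name} ({path})")
```

Sorting by path keeps each employee directly under their chain of managers, which is what makes the indented printout read as a tree.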


Reposted from blog.csdn.net/zxstrive/article/details/87811561