【postgresql 基础入门】多表联合查询 join与union 并，交，差等集合操作，两者的区别之处

多表数据联合查询

专栏内容：

postgresql内核源码分析

手写数据库toadb

并发编程

开源贡献：

toadb开源库

个人主页：我的主页
管理社区：开源数据库
座右铭：天行健，君子以自强不息；地势坤，君子以厚德载物.

系列文章

文章目录

多表数据联合查询
系列文章
前言
概述
原理介绍
多表 join 连接操作
多表union 操作
- 联合类型说明
- 1. union
2. union all
3. except
3. intersect
总结
结尾

前言

postgresql 数据库是一款通用的关系型数据，在开源数据库中能与商业数据媲美，在业界也越来越流行。

因为是开源数据库，不仅公开源码，还有很多使用案例，好用的插件，所以它的慢慢变成了数据库的先驱和标准，通过postgresql可以很好从使用到原理，彻底搞懂；

如果是学习编程，也可以学到丰富的编程知识，数据结构，编程技巧，它里面还有很多精妙的架构设计，分层思想，可以灵活定制的思想。

本专栏主要介绍postgresql 入门使用，数据库维护管理，通过这些使用来了解数据库原理，慢慢了解postgresql是什么样的数据库，能做那些事情，以及如何做好服务，最关键的是这些知识都是面试的必备项。

概述

我们在实际应用中查询数据，往往涉及到多表的数据，如何使用一条SQL就能得到结果呢？

本文就来分享一下，多表数据的查询方法，并举例说明它们使用的技巧；

原理介绍

多表数据的联合查询，在postgresql 中有两个基本方法：

join 连接操作；
union 联合子查询；

多表join连接，其实就是通过某个列作为纽带，将多个实际的表连接成一张大表，然后在大表上进行查询；

而union 与 join 完全不同， union 通过联合多个子查询结果，也就是说union 操作的是查询结果，将多个结果集合并成一个结果集，然后在这个总结果集上再进行二次查询处理；
也就是我们数学中的集合的几种

而更加总结的话，就涉及到关系代数中对于集合的操作：
集合操作主要包括以下几种：

并集操作（Union）：将两个集合合并成一个集合，包括所有属于两个集合的元素。
交集操作（Intersection）：将两个集合的公共元素组成一个新的集合。
差集操作（Difference）：从一个集合中去掉属于另一个集合的元素，剩下的元素组成一个新的集合。
对称差集操作（Symmetric Difference）：将属于一个集合但不属于另一个集合的元素，以及属于另一个集合但不属于一个集合的元素组成一个新的集合。
笛卡尔积操作（Cartesian Product）：将两个集合的所有可能有序对组成一个新的集合。

多表 join 连接操作

在PostgreSQL中，多表查询是通过使用连接（JOIN）和交叉连接（CROSS JOIN）等操作来实现的。

连接操作是指将两个或多个表按照指定的条件进行关联，以获得它们之间的关系数据。
下面我们举例来说明，首先创建两张表custom 和 order;

以下是一个使用PostgreSQL进行JOIN操作的案例：

假设我们有两个表：customers和orders。customers表包含客户的信息，而orders表包含订单的信息。这两个表通过一个共同的字段customer_id相关联。

首先，让我们创建这两个表并插入一些数据：

CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(50),
  email VARCHAR(50)
);

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  order_date DATE,
  total_amount DECIMAL(10,2)
);

INSERT INTO customers (customer_id, name, email)
VALUES (1, 'John Doe', '[email protected]'),
       (2, 'Jane Smith', '[email protected]'),
       (3, 'Bob Johnson', '[email protected]'),
       (4, 'Steven John', '[email protected]'),
       (5, 'Kenidy', '[email protected]');

INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES (1, 1, '2023-01-01', 100.00),
       (2, 1, '2023-02-01', 200.00),
       (3, 2, '2023-02-15', 150.00),
       (4, 3, '2023-03-01', 75.00);

1. 内连接（INNER JOIN）

将两个表中的行进行匹配，返回满足连接条件的行。语法如下：

postgres=# SELECT * FROM customers INNER JOIN orders ON customers.customer_id = orders.customer_id;
 customer_id |    name     |      email       | order_id | customer_id | order_date | total_amount
-------------+-------------+------------------+----------+-------------+------------+--------------
           1 | John Doe    | john@example.com |        1 |           1 | 2023-01-01 |       100.00
           1 | John Doe    | john@example.com |        2 |           1 | 2023-02-01 |       200.00
           2 | Jane Smith  | jane@example.com |        3 |           2 | 2023-02-15 |       150.00
           3 | Bob Johnson | bob@example.com  |        4 |           3 | 2023-03-01 |        75.00
(4 rows)

这里特意用 * 查出结果集中的所有列，让我们可以清晰看到结果集的全貌，可以看到join后的结果集，是两表的所有列的合并；

对于内联连，只是列出了符合连接条件的行，大家想一想，还有什么写法可以达到这种效果。

对于，条件写到where子句中也可以，所以内联接与where条件是等价的；

postgres=# select * from customers,orders where customers.customer_id = orders.customer_id;
 customer_id |    name     |      email       | order_id | customer_id | order_date | total_amount
-------------+-------------+------------------+----------+-------------+------------+--------------
           1 | John Doe    | [email protected] |        1 |           1 | 2023-01-01 |       100.00
           1 | John Doe    | [email protected] |        2 |           1 | 2023-02-01 |       200.00
           2 | Jane Smith  | [email protected] |        3 |           2 | 2023-02-15 |       150.00
           3 | Bob Johnson | [email protected]  |        4 |           3 | 2023-03-01 |        75.00
(4 rows)

2. 左连接（LEFT JOIN）

在内连接的基础上，将左侧表中的所有行都包含在结果集中，即使右侧表中没有匹配的行。语法如下：

postgres=# select * from customers left join orders on  customers.customer_id = orders.customer_id;
 customer_id |    name     |       email        | order_id | customer_id | order_date | total_amount
-------------+-------------+--------------------+----------+-------------+------------+--------------
           1 | John Doe    | john@example.com   |        1 |           1 | 2023-01-01 |       100.00
           1 | John Doe    | john@example.com   |        2 |           1 | 2023-02-01 |       200.00
           2 | Jane Smith  | jane@example.com   |        3 |           2 | 2023-02-15 |       150.00
           3 | Bob Johnson | bob@example.com    |        4 |           3 | 2023-03-01 |        75.00
           5 | Kenidy      | Kenidy@example.com |          |             |            |
           4 | Steven John | steven@example.com |          |             |            |
(6 rows)

左联接后的结果集，列也是两表的合并，而行数与之前不同，左边表列表出所有行，而右边的表只列出了符合条件的行，对于左表多出的行，右表以空代替；

3. 右连接（RIGHT JOIN）

在内连接的基础上，将右侧表中的所有行都包含在结果集中，即使左侧表中没有匹配的行。语法如下：

postgres=# select * from customers right join orders on  customers.customer_id = orders.customer_id;
 customer_id |    name     |      email       | order_id | customer_id | order_date | total_amount
-------------+-------------+------------------+----------+-------------+------------+--------------
           1 | John Doe    | john@example.com |        1 |           1 | 2023-01-01 |       100.00
           1 | John Doe    | john@example.com |        2 |           1 | 2023-02-01 |       200.00
           2 | Jane Smith  | jane@example.com |        3 |           2 | 2023-02-15 |       150.00
           3 | Bob Johnson | bob@example.com  |        4 |           3 | 2023-03-01 |        75.00
(4 rows)

右联接与左联接类似，结果集的行包括右表的所有行，左表只有符合联接表条件行；

4. 全连接（FULL JOIN）

相当于在左连接和右连接的基础上，同时做左连接和右连接，并返回两侧表中所有满足条件的行。语法如下：

postgres=# select * from customers full join orders on  customers.customer_id = orders.customer_id;
 customer_id |    name     |       email        | order_id | customer_id | order_date | total_amount
-------------+-------------+--------------------+----------+-------------+------------+--------------
           1 | John Doe    | john@example.com   |        1 |           1 | 2023-01-01 |       100.00
           1 | John Doe    | john@example.com   |        2 |           1 | 2023-02-01 |       200.00
           2 | Jane Smith  | jane@example.com   |        3 |           2 | 2023-02-15 |       150.00
           3 | Bob Johnson | bob@example.com    |        4 |           3 | 2023-03-01 |        75.00
           5 | Kenidy      | Kenidy@example.com |          |             |            |
           4 | Steven John | steven@example.com |          |             |            |
(6 rows)

全联接就是包括左右两条的所有行，没有符合条件的行以空代替；

多表union 操作

PostgreSQL中的联合查询是一种将多个SELECT语句的结果组合成一个结果集的方法。它允许您从多个表或查询中获取数据，并根据指定的条件将它们组合在一起。

联合查询的基本语法如下：

SELECT column1, column2, ...
FROM table1
UNION/UNION ALL/EXCEPT/INTERSECT
SELECT column1, column2, ...
FROM table2
WHERE condition;

这里有几个关键部分：

SELECT语句：用于指定要检索的列和表。
UNION、UNION ALL、EXCEPT和INTERSECT：这些关键字用于指定要执行的联合操作类型。
WHERE子句：可选的条件，用于筛选结果。

联合类型说明

UNION：返回两个查询结果的并集，但会删除重复的行。
UNION ALL：返回两个查询结果的并集，包括重复的行。
EXCEPT：返回第一个查询结果中存在但在第二个查询结果中不存在的行。
INTERSECT：返回两个查询结果中共有的行。

请注意，使用联合查询时，确保每个查询中选择的列数和列类型是一致的，否则可能会导致错误。

1. union

合并两个表的数据并删除重复行

postgres=# select  customer_id from customers union select customer_id from orders ;
 customer_id
-------------
           2
           3
           5
           4
           1
(5 rows)

这将返回一个结果集，其中包含两个表中所有不重复的行；

两个select 子句中的列数和类型必须一致才行，这样两个结果集才能合并到一起。

2. union all

合并两个表的数据并保留重复行

如果我们希望保留两个表中的所有行，包括重复的行，那么可以使用 UNION ALL 运算符。

postgres=# select  customer_id from customers union all select customer_id from orders ;
 customer_id
-------------
           1
           2
           3
           4
           5
           1
           1
           2
           3
(9 rows)

这将返回一个结果集，其中包含两个表中所有的行，包括重复的行。
同时，如果想要对结果进行排序；

可以使用 ORDER BY 子句。例如：

postgres=# select  customer_id from customers union all select customer_id from orders order by customer_id asc;
 customer_id
-------------
           1
           1
           1
           2
           2
           3
           3
           4
           5
(9 rows)

当然也可以加where 等其它子句；

3. except

获得两个集合的差，也就是前者集合中包括，而不属于后者集合的行；

postgres=# select  customer_id from customers except select customer_id from orders ;
 customer_id
-------------
           5
           4
(2 rows)

也就是查询还没有产生订单的客户ID列表；

3. intersect

INTERSECT运算符用于找出两个SELECT语句结果集的交集。它的语法如下：

postgres=# select  customer_id from customers intersect select customer_id from orders order by customer_id asc;
 customer_id
-------------
           1
           2
           3
(3 rows)

这个类似于inner join，找到有订单的客户id列表；

总结

相同之处是，它们都是对结果集进行操作；

但是有明显的区别，join是将多表进行联接，产生结果集，然后再通过where等条件在联接后的结果集上再过滤；
而union并不限于表与表之间，而是对不同查询结果集，再进行集合操作，而且对于最终结果的列有要求，必须参与的集合列数量和类型要相同；

结尾

非常感谢大家的支持，在浏览的同时别忘了留下您宝贵的评论，如果觉得值得鼓励，请点赞，收藏，我会更加努力！

作者邮箱：[email protected]
如有错误或者疏漏欢迎指出，互相学习。

注：未经同意，不得转载！