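I stumbled upon a very interesting question on Stack Overflow about how to use jOOQ's MULTISET operator to nest a collection, and then filter the result by whether that nested collection contains a value.
The question is specific to jOOQ, but imagine you have a query that nests collections using JSON in PostgreSQL. Assume, as always, the Sakila database. Now, PostgreSQL doesn't support the SQL standard MULTISET operator, but we can use ARRAY, which works almost the same way: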
SELECT
  f.title,
  ARRAY(
    SELECT ROW(
      a.actor_id,
      a.first_name,
      a.last_name
    )
    FROM actor AS a
    JOIN film_actor AS fa USING (actor_id)
    WHERE fa.film_id = f.film_id
    ORDER BY a.actor_id
  )
FROM film AS f
ORDER BY f.title
This produces all films and their actors as follows (I've truncated the arrays for readability; you get the point):
title |array
---------------------------+--------------------------------------------------------------------------------------
ACADEMY DINOSAUR |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)"
ACE GOLDFINGER |{"(19,BOB,FAWCETT)","(85,MINNIE,ZELLWEGER)","(90,SEAN,GUINESS)","(160,CHRIS,DEPP)"}
ADAPTATION HOLES |{"(2,NICK,WAHLBERG)","(19,BOB,FAWCETT)","(24,CAMERON,STREEP)","(64,RAY,JOHANSSON)","(1
AFFAIR PREJUDICE |{"(41,JODIE,DEGENERES)","(81,SCARLETT,DAMON)","(88,KENNETH,PESCI)","(147,FAY,WINSLET)"
AFRICAN EGG |{"(51,GARY,PHOENIX)","(59,DUSTIN,TAUTOU)","(103,MATTHEW,LEIGH)","(181,MATTHEW,CARREY)"
AGENT TRUMAN |{"(21,KIRSTEN,PALTROW)","(23,SANDRA,KILMER)","(62,JAYNE,NEESON)","(108,WARREN,NOLTE)",
AIRPLANE SIERRA |{"(99,JIM,MOSTEL)","(133,RICHARD,PENN)","(162,OPRAH,KILMER)","(170,MENA,HOPPER)","(185
AIRPORT POLLOCK |{"(55,FAY,KILMER)","(96,GENE,WILLIS)","(110,SUSAN,DAVIS)","(138,LUCILLE,DEE)"}
ALABAMA DEVIL |{"(10,CHRISTIAN,GABLE)","(22,ELVIS,MARX)","(26,RIP,CRAWFORD)","(53,MENA,TEMPLE)","(68,
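Now, the question on Stack Overflow was how to filter this result by whether the ARRAY (or MULTISET) contains a specific value.
Filtering the ARRAY
We can't just add a WHERE clause to the query. Because of SQL's logical order of operations, the WHERE clause "happens before" the SELECT clause, so the ARRAY isn't yet available to WHERE. However, we can wrap everything in a derived table and then do this: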
SELECT *
FROM (
  SELECT
    f.title,
    ARRAY(
      SELECT ROW(
        a.actor_id,
        a.first_name,
        a.last_name
      )
      FROM actor AS a
      JOIN film_actor AS fa USING (actor_id)
      WHERE fa.film_id = f.film_id
      ORDER BY a.actor_id
    ) AS actors
  FROM film AS f
) AS f
WHERE actors @> ARRAY[(
  SELECT ROW(a.actor_id, a.first_name, a.last_name)
  FROM actor AS a
  WHERE a.actor_id = 1
)]
ORDER BY f.title
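Please excuse the clunky ARRAY @> ARRAY operator. I'm not aware of a better approach here, because it's hard to unnest a structurally typed RECORD[] array in PostgreSQL if we don't use a nominal type (CREATE TYPE ...). If you know a better way to filter, please let me know in the comments section. Here's a better version: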
SELECT *
FROM (
  SELECT
    f.title,
    ARRAY(
      SELECT ROW(
        a.actor_id,
        a.first_name,
        a.last_name
      )
      FROM actor AS a
      JOIN film_actor AS fa USING (actor_id)
      WHERE fa.film_id = f.film_id
      ORDER BY a.actor_id
    ) AS actors
  FROM film AS f
) AS f
WHERE EXISTS (
  SELECT 1
  FROM unnest(actors) AS t (a bigint, b text, c text)
  WHERE a = 1
)
ORDER BY f.title
Either way, this produces the desired result:
title |actors
---------------------+-------------------------------------------------------------------------------------------------
ACADEMY DINOSAUR |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)","(40,JOHNN
ANACONDA CONFESSIONS |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(22,ELVIS,MARX)","(150,JAYNE,NOLTE)","(164,HUMPHREY
ANGELS LIFE |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(7,GRACE,MOSTEL)","(47,JULIA,BARRYMORE)","(91,CHRIS
BULWORTH COMMANDMENTS|{"(1,PENELOPE,GUINESS)","(65,ANGELA,HUDSON)","(124,SCARLETT,BENING)","(173,ALAN,DREYFUSS)"}
CHEAPER CLYDE |{"(1,PENELOPE,GUINESS)","(20,LUCILLE,TRACY)"}
COLOR PHILADELPHIA |{"(1,PENELOPE,GUINESS)","(106,GROUCHO,DUNST)","(122,SALMA,NOLTE)","(129,DARYL,CRAWFORD)","(163,CH
ELEPHANT TROJAN |{"(1,PENELOPE,GUINESS)","(24,CAMERON,STREEP)","(37,VAL,BOLGER)","(107,GINA,DEGENERES)","(115,HARR
GLEAMING JAWBREAKER |{"(1,PENELOPE,GUINESS)","(66,MARY,TANDY)","(125,ALBERT,NOLTE)","(143,RIVER,DEAN)","(155,IAN,TANDY
Now, all the results are guaranteed to be films in which 'PENELOPE GUINESS' was an ACTOR. But is there a better solution?
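As a side note on the nominal type (CREATE TYPE ...) remark above, here is a rough sketch of what that variant could look like. It is only an illustration under assumptions: the type name actor_t is made up for this example, the field types are guesses at what the Sakila columns can be cast to, and you need the privilege to create types in your schema. With a named composite type, unnest exposes the fields by name, so no column definition list is needed:

```sql
-- Hypothetical nominal type; the name actor_t and its field types
-- are assumptions made for this sketch, not part of Sakila
CREATE TYPE actor_t AS (
  actor_id   bigint,
  first_name text,
  last_name  text
);

SELECT *
FROM (
  SELECT
    f.title,
    ARRAY(
      -- Cast each row to the named type, producing an actor_t[] array
      SELECT ROW(
        a.actor_id,
        a.first_name,
        a.last_name
      )::actor_t
      FROM actor AS a
      JOIN film_actor AS fa USING (actor_id)
      WHERE fa.film_id = f.film_id
      ORDER BY a.actor_id
    ) AS actors
  FROM film AS f
) AS f
WHERE EXISTS (
  -- unnest of a named composite array yields named columns
  SELECT 1
  FROM unnest(actors) AS t
  WHERE t.actor_id = 1
)
ORDER BY f.title;
```

Whether the extra type is worth it depends on how often you need to look inside the nested collection; for a one-off filter, the structural version is probably good enough.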
Using ARRAY_AGG instead
However, in native PostgreSQL, I think it would be better (in this case) to use ARRAY_AGG:
SELECT
  f.title,
  ARRAY_AGG(ROW(
    a.actor_id,
    a.first_name,
    a.last_name
  ) ORDER BY a.actor_id) AS actors
FROM film AS f
JOIN film_actor AS fa USING (film_id)
JOIN actor AS a USING (actor_id)
GROUP BY f.title
HAVING bool_or(true) FILTER (WHERE a.actor_id = 1)
ORDER BY f.title
This produces the exact same result:
title |actors
---------------------+------------------------------------------------------------------------------------------------
ACADEMY DINOSAUR |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)","(40,JOHN
ANACONDA CONFESSIONS |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(22,ELVIS,MARX)","(150,JAYNE,NOLTE)","(164,HUMPHRE
ANGELS LIFE |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(7,GRACE,MOSTEL)","(47,JULIA,BARRYMORE)","(91,CHRI
BULWORTH COMMANDMENTS|{"(1,PENELOPE,GUINESS)","(65,ANGELA,HUDSON)","(124,SCARLETT,BENING)","(173,ALAN,DREYFUSS)"}
CHEAPER CLYDE |{"(1,PENELOPE,GUINESS)","(20,LUCILLE,TRACY)"}
COLOR PHILADELPHIA |{"(1,PENELOPE,GUINESS)","(106,GROUCHO,DUNST)","(122,SALMA,NOLTE)","(129,DARYL,CRAWFORD)","(163,C
ELEPHANT TROJAN |{"(1,PENELOPE,GUINESS)","(24,CAMERON,STREEP)","(37,VAL,BOLGER)","(107,GINA,DEGENERES)","(115,HAR
GLEAMING JAWBREAKER |{"(1,PENELOPE,GUINESS)","(66,MARY,TANDY)","(125,ALBERT,NOLTE)","(143,RIVER,DEAN)","(155,IAN,TAND
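How does it work?
- We group by FILM and aggregate the contents of each film into a nested collection.
- We can now use HAVING to filter groups.
- BOOL_OR(TRUE) is TRUE as soon as the GROUP is non-empty.
- FILTER (WHERE a.actor_id = 1) is the filter criterion, which we place in the group.
So, the HAVING predicate is TRUE if there is at least one ACTOR_ID = 1, and NULL otherwise, which has the same effect as FALSE. If you're a purist, wrap the predicate in COALESCE(BOOL_OR(...), FALSE).
Clever or neat, or a bit of both?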
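To see the NULL versus FALSE behaviour in isolation, here is a small self-contained sketch that doesn't need Sakila. The inline VALUES table and its column names are made up for illustration; the actor IDs are taken from the first two films in the output further up:

```sql
SELECT
  film,
  -- NULL when no row of the group passes the FILTER
  bool_or(true) FILTER (WHERE actor_id = 1) AS raw_predicate,
  -- The "purist" variant always yields TRUE or FALSE
  COALESCE(
    bool_or(true) FILTER (WHERE actor_id = 1),
    false
  ) AS strict_predicate
FROM (
  VALUES
    ('ACADEMY DINOSAUR', 1),
    ('ACADEMY DINOSAUR', 10),
    ('ACE GOLDFINGER', 19),
    ('ACE GOLDFINGER', 85)
) AS t (film, actor_id)
GROUP BY film
ORDER BY film;

-- film             | raw_predicate | strict_predicate
-- ACADEMY DINOSAUR | true          | true
-- ACE GOLDFINGER   | NULL          | false
```

Since HAVING only keeps groups for which the predicate is TRUE, both variants filter out 'ACE GOLDFINGER' here. The difference only matters if you negate the predicate or project it.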
Doing this with jOOQ
Here's the jOOQ version, which works on any RDBMS that supports MULTISET_AGG (ARRAY_AGG emulation is still pending):
ctx.select(
        FILM_ACTOR.film().TITLE,
        multisetAgg(
            FILM_ACTOR.actor().ACTOR_ID,
            FILM_ACTOR.actor().FIRST_NAME,
            FILM_ACTOR.actor().LAST_NAME))
   .from(FILM_ACTOR)
   .groupBy(FILM_ACTOR.film().TITLE)
   .having(boolOr(trueCondition())
       .filterWhere(FILM_ACTOR.actor().ACTOR_ID.eq(1)))
   .orderBy(FILM_ACTOR.film().TITLE)
   .fetch();
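While the powerful MULTISET value constructor gets most of the fame among jOOQ users, let's not forget there's also the slightly less powerful, but occasionally really useful, MULTISET_AGG aggregate function, which can be used for aggregations or as a window function.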