I. Problem
Testing showed that a LIKE query with a trailing wildcard (%), i.e. a prefix match, does not use the index.
--Test table setup
postgres=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
name | text | | |
Indexes:
"test_id_index" btree (id)
"test_name_index" btree (name)
postgres=# select count(*) from test;
count
----------
10000000
(1 row)
--The plan does not use the index
postgres=# explain analyze select * from test where name like 'val:999999%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..107238.33 rows=1000 width=15) (actual time=523.748..524.772 rows=11 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test (cost=0.00..106138.33 rows=417 width=15) (actual time=363.223..517.478 rows=4 loops=3)
Filter: (name ~~ 'val:999999%'::text)
Rows Removed by Filter: 3333330
Planning Time: 0.054 ms
Execution Time: 524.786 ms
(8 rows)
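--Even with sequential scans disabled, the plan is still a Seq Scan. The 10000000000 startup cost is the penalty added to a disabled scan type, which means the planner found no usable index path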
postgres=# set enable_seqscan = off;
SET
postgres=# explain analyze select * from test where name like 'val:999999%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=10000000000.00..10000179055.00 rows=1000 width=15) (actual time=95.642..950.982 rows=11 loops=1)
Filter: (name ~~ 'val:999999%'::text)
Rows Removed by Filter: 9999989
Planning Time: 0.064 ms
Execution Time: 950.998 ms
(5 rows)
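The article does not show how the test table was populated. Below is a minimal reproduction sketch, assuming name = 'val:' || id for id 1 to 10,000,000. This data set is a guess, not the author's original script; it is consistent with the 11 rows returned above ('val:999999' plus 'val:9999990' through 'val:9999999').

```sql
-- Hypothetical data set (assumption): id 1..10,000,000 and name = 'val:' || id.
DROP TABLE IF EXISTS test;
CREATE TABLE test (id integer, name text);

INSERT INTO test
SELECT i, 'val:' || i
FROM generate_series(1, 10000000) AS i;

-- Same two indexes as in the \d output above (default B-tree operator class).
CREATE INDEX test_id_index ON test (id);
CREATE INDEX test_name_index ON test (name);

-- Refresh planner statistics before running EXPLAIN ANALYZE.
ANALYZE test;
```

Running the EXPLAIN ANALYZE query from above against this table in a database whose Collate is en_US.UTF-8 should show the same sequential scan.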
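II. Cause
PostgreSQL has a limitation here: with the default B-tree operator class, a LIKE pattern with a trailing wildcard (%) can be turned into an index range scan only when the index's collation is C (or POSIX). That collation is the database Collate unless the column or the index specifies another one. Under a locale such as en_US.UTF-8, strings are not sorted byte by byte, so the rows sharing a prefix are not guaranteed to occupy one contiguous, byte-ordered range of the index, and the planner cannot derive range conditions from the pattern.
--The example in section I ran in a database whose Collate is en_US.UTF-8; below is the same test in a database whose Collate is C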
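A small illustration of why byte order and locale order differ, assuming it is run in the postgres database shown below (default collation en_US.UTF-8). Under "C", 'B' (0x42) sorts before 'a' (0x61); under a glibc en_US.UTF-8 locale, letters are compared alphabetically first, so 'a' sorts before 'B'. A default-opclass B-tree follows the locale order, which is why the planner cannot treat a prefix as a simple byte range.

```sql
-- Compare the same pair of strings under two collations.
-- "default" resolves to the database collation (en_US.UTF-8 in the postgres database).
SELECT 'B' < 'a' COLLATE "C"       AS b_before_a_in_c,       -- byte order: true
       'B' < 'a' COLLATE "default" AS b_before_a_in_locale;  -- expected false under glibc en_US.UTF-8
```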
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | C | C |
(4 rows)
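The test database listed above with Collate C could be created roughly as follows; the article does not show this step, so treat it as an assumption. A database whose collation differs from template1 must be cloned from template0, otherwise CREATE DATABASE reports that the new collation is incompatible with the template.

```sql
-- Run from the postgres database. template1 uses en_US.UTF-8,
-- so a C-collation database has to be cloned from template0.
CREATE DATABASE test
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'C'
    LC_CTYPE 'C';
```

After connecting with \c test, the test table and its indexes need to be recreated and loaded there, since tables are not shared between databases.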
postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
name | text | | |
Indexes:
"test_id_index" btree (id)
"test_name_index" btree (name)
test=# select count(*) from test;
count
----------
10000000
(1 row)
--The plan uses the index
test=# explain analyze select * from test where name like 'val:999999%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using test_name_index on test (cost=0.43..8.46 rows=1000 width=15) (actual time=0.030..0.041 rows=11 loops=1)
Index Cond: ((name >= 'val:999999'::text) AND (name < 'val:99999:'::text))
Filter: (name ~~ 'val:999999%'::text)
Planning Time: 133.386 ms
Execution Time: 0.061 ms
(5 rows)
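The Index Cond above shows how the index is used: the planner rewrote LIKE 'val:999999%' into a range on the fixed prefix. The lower bound is the prefix itself; the upper bound is the prefix with its last character incremented, and ':' (0x3A) is the byte right after '9' (0x39), giving 'val:99999:'. The original LIKE is kept as a Filter and rechecked per row. The sketch below reproduces the bound by hand; it is a simplification that only holds for a single-byte last character like this ASCII prefix, and it assumes the C-collation test database.

```sql
-- Upper bound = prefix with its last character replaced by the next code point.
SELECT chr(ascii('9') + 1) AS next_char,                                -- ':'
       left('val:999999', -1)
         || chr(ascii(right('val:999999', 1)) + 1) AS upper_bound;      -- 'val:99999:'

-- Equivalent hand-written range query; in a C-collation database it should
-- return the same rows as LIKE 'val:999999%'.
EXPLAIN ANALYZE
SELECT * FROM test
WHERE name >= 'val:999999' AND name < 'val:99999:';
```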
III. Solutions
1) Specify the operator class matching the column type when creating the index
--Rebuild the index
postgres=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
name | text | | |
Indexes:
"test_id_index" btree (id)
"test_name_index" btree (name)
postgres=# drop index test_name_index;
DROP INDEX
postgres=# create index test_name_index on test(name varchar_pattern_ops);
CREATE INDEX
postgres=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
name | text | | |
Indexes:
"test_id_index" btree (id)
"test_name_index" btree (name varchar_pattern_ops)
--The plan uses the index
postgres=# explain analyze select * from test where name like 'val:999999%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using test_name_index on test (cost=0.43..8.46 rows=1000 width=15) (actual time=0.275..0.286 rows=11 loops=1)
Index Cond: ((name ~>=~ 'val:999999'::text) AND (name ~<~ 'val:99999:'::text))
Filter: (name ~~ 'val:999999%'::text)
Planning Time: 45.914 ms
Execution Time: 0.300 ms
(5 rows)
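The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. They differ from the default operator classes in that values are compared strictly character by character rather than according to the locale-specific collation rules. This makes them suitable for queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale. Queries with ordinary <, <=, >, or >= comparisons cannot use a pattern_ops index (equality comparisons can), so keep an index with the default operator class as well if such queries matter. Since name is a text column, text_pattern_ops is the matching class; the varchar_pattern_ops index above also works because that operator class is declared on type text. For details see: https://www.postgresql.org/docs/current/indexes-opclass.html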
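A sketch for checking which pattern operator classes exist and which input type each is declared for, followed by indexes that cover both kinds of query. The index names test_name_pattern_index and test_name_default_index are hypothetical.

```sql
-- List the B-tree pattern operator classes and their declared input types.
SELECT o.opcname, o.opcintype::regtype AS input_type
FROM pg_opclass o
JOIN pg_am a ON a.oid = o.opcmethod
WHERE a.amname = 'btree'
  AND o.opcname LIKE '%pattern_ops'
ORDER BY o.opcname;

-- name is text, so text_pattern_ops matches the column type (hypothetical index name).
CREATE INDEX IF NOT EXISTS test_name_pattern_index ON test (name text_pattern_ops);

-- Locale-aware <, <=, >, >= and ORDER BY cannot use a pattern_ops index,
-- so a default-opclass index is kept alongside it (hypothetical index name).
CREATE INDEX IF NOT EXISTS test_name_default_index ON test (name);
```

The trade-off is an extra index to maintain on writes; if the workload only does equality and prefix matching on name, the pattern_ops index alone may be enough.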
2) Specify a collation when creating the index
--Rebuild the index
postgres=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
name | text | | |
Indexes:
"test_id_index" btree (id)
"test_name_index" btree (name)
postgres=# drop index test_name_index;
DROP INDEX
postgres=# create index test_name_index on test(name collate "C");
CREATE INDEX
postgres=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
name | text | | |
Indexes:
"test_id_index" btree (id)
"test_name_index" btree (name COLLATE "C")
--The plan uses the index
postgres=# explain analyze select * from test where name like 'val:999999%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using test_name_index on test (cost=0.43..8.46 rows=1000 width=15) (actual time=0.062..0.071 rows=11 loops=1)
Index Cond: ((name >= 'val:999999'::text) AND (name < 'val:99999:'::text))
Filter: (name ~~ 'val:999999%'::text)
Planning Time: 0.073 ms
Execution Time: 0.085 ms
(5 rows)
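One consequence of the COLLATE "C" approach is worth checking: the index stores name in byte order. The LIKE prefix optimization accepts it even though the column's own collation is en_US.UTF-8, but for sorting, the planner only uses the index when the query requests the same collation. A sketch, assuming the COLLATE "C" index test_name_index created above:

```sql
-- Confirm how the index was defined (COLLATE "C" on name).
SELECT pg_get_indexdef('test_name_index'::regclass);

-- Requests byte order, matching the index: an index scan can supply the order.
EXPLAIN SELECT * FROM test ORDER BY name COLLATE "C" LIMIT 10;

-- Requests the column's en_US.UTF-8 order: this index cannot supply it.
EXPLAIN SELECT * FROM test ORDER BY name LIMIT 10;
```

If locale-ordered sorting on name matters, a second index with the default collation may still be needed.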
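IV. Extension
PostgreSQL also supports regular-expression matching. With the pattern_ops index from solution 1 in place (note the ~>=~ and ~<~ operators in the Index Cond), a left-anchored regular expression can use the index as well.
--The regular expression query uses the index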
postgres=# explain analyze select * from test where name ~ '^val:999999';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using test_name_index on test (cost=0.43..8.46 rows=1000 width=15) (actual time=0.012..0.027 rows=11 loops=1)
Index Cond: ((name ~>=~ 'val:999999'::text) AND (name ~<~ 'val:99999:'::text))
Filter: (name ~ '^val:999999'::text)
Planning Time: 0.217 ms
Execution Time: 0.038 ms
(5 rows)
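Only a left-anchored pattern yields a fixed prefix that the planner can turn into an index range; an unanchored regex or a leading wildcard gives it nothing to bound a B-tree scan with. A sketch contrasting the forms, assuming the test table and the pattern_ops index from solution 1:

```sql
-- Anchored: fixed prefix 'val:999999', so a B-tree range is possible.
EXPLAIN SELECT * FROM test WHERE name ~ '^val:999999';

-- Unanchored regex: the match may start anywhere, so no B-tree range.
EXPLAIN SELECT * FROM test WHERE name ~ 'val:999999';

-- Leading wildcard: no fixed prefix, so no B-tree range either.
EXPLAIN SELECT * FROM test WHERE name LIKE '%999999';

-- The anchored regex and the LIKE prefix query select the same rows.
SELECT count(*) FROM test WHERE name ~ '^val:999999';
SELECT count(*) FROM test WHERE name LIKE 'val:999999%';
```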