PostgreSQL indexing and optimization

Introduction to indexing
Index type
Create index
View index
Maintain index
Delete index

Introduction to indexing

An index can improve the query performance of a database. However, an index must itself be read and written, and it takes up
additional storage space, so understanding and using indexes properly is crucial for database optimization. This article introduces how to
use PostgreSQL indexes efficiently.

-- Create the table
CREATE TABLE test (
 id integer,
 name text
);
-- generate_series produces a series of values
INSERT INTO test
SELECT v,'val:'||v FROM generate_series(1, 10000000) v;

SELECT name FROM test WHERE id = 10000;

Without an index, the database has to scan the entire table to find the matching rows. You can use the EXPLAIN command to view
the execution plan, that is, the specific steps PostgreSQL takes to execute an SQL statement.
For details, see the execution plan reference in the PostgreSQL documentation.

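-- Parallel Seq Scan means a parallel sequential scan, which took a long time to execute;
-- the table holds a large amount of data but the query returns only one row, so this approach is clearly inefficient.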
explain analyze
SELECT name FROM test WHERE id = 10000;


-- If an index exists on the id column, matching rows can be found quickly through it. First create an index:
CREATE INDEX test_id_index ON test (id);

-- After creating the index, check the execution plan again
explain analyze
SELECT name FROM test WHERE id = 10000;

In the resulting plan, Index Scan means the query was answered through the index, and execution took 1.3 ms. This works like the keyword index at the back of a book:
readers can browse the index quickly and turn to the right page without reading the entire book to find the content they are interested in.

Indexes do not only optimize queries. UPDATE and DELETE statements that contain WHERE conditions can also
use indexes to improve performance, because the rows must be found before they can be modified.
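
As a rough sketch of this point, plain EXPLAIN (without ANALYZE, so nothing is actually modified) can be run against the test table and the test_id_index created above. The new name value is only an illustrative assumption.

```sql
-- EXPLAIN without ANALYZE only plans the statement; no rows are changed.
EXPLAIN
UPDATE test SET name = 'renamed' WHERE id = 10000;

EXPLAIN
DELETE FROM test WHERE id = 10000;
```

With the index on id in place, both plans would be expected to locate the row through an index scan rather than a sequential scan, although the planner's exact choice depends on table statistics.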

In addition, indexes can also be used to optimize join queries. Creating indexes based on fields in join conditions can improve the
performance of join queries. Indexes can even optimize grouping or sorting operations because the index itself is organized and stored sequentially.
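
A minimal sketch of indexing a join column follows. The customers and purchases tables and the index name are hypothetical and exist only for this illustration.

```sql
-- Hypothetical tables, used only for illustration.
CREATE TABLE customers (customer_id integer PRIMARY KEY, name text);
CREATE TABLE purchases (purchase_id integer PRIMARY KEY, customer_id integer, amount numeric);

-- Index the join column on the referencing side.
CREATE INDEX purchases_customer_id_index ON purchases (customer_id);

-- Inspect how the planner joins the two tables.
EXPLAIN
SELECT c.name, p.amount
FROM customers c
JOIN purchases p ON p.customer_id = c.customer_id
WHERE c.customer_id = 42;
```

On tables this small the planner may still choose a sequential scan; the benefit of the index shows up once purchases holds a meaningful amount of data and statistics are current.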

On the other hand, maintaining an index has a cost: every data modification must also update the index, which adds overhead to write operations. Therefore,
indexes should be created judiciously, generally only for frequently used fields. Just as with a book, it is impossible
to create an index entry for every word in it.

Index type

PostgreSQL supports several index types: B-tree, hash, GiST, SP-GiST, GIN, and BRIN indexes. Each
index type is based on a different storage structure and algorithm and is used to optimize different kinds of queries. By default, PostgreSQL creates
B-tree indexes because they are suitable for most queries.

B-tree index

B-tree is a self-balancing tree that stores data in sorted order and supports search, insertion, deletion and sequential access in logarithmic time (O(log N)).
The PostgreSQL optimizer considers B-tree indexes for the following comparison operators and equivalent constructs on index columns:
• <
• <=
• =
• >=
• >
• BETWEEN
• IN
• IS NULL
• IS NOT NULL
In addition, if the pattern used with the pattern-matching operators LIKE and ~ is a constant that does not begin with a wildcard, that is, it is anchored to
the beginning of the string, the optimizer can also use a B-tree index, for example:

col LIKE 'foo%'
col ~ '^foo'

For the case-insensitive ILIKE and ~* operators,
a B-tree index can also be used if the pattern begins with non-alphabetic characters, which are not affected by case conversion.
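
One caveat worth noting: in a database that does not use the C locale, a plain B-tree index generally cannot serve LIKE 'foo%' searches, and the PostgreSQL documentation describes creating the index with a pattern operator class instead. A minimal sketch on the test table from the introduction (the index name is an assumption):

```sql
-- text_pattern_ops compares values character by character,
-- which lets the index serve left-anchored patterns in non-C locales.
CREATE INDEX test_name_pattern_index ON test (name text_pattern_ops);

EXPLAIN
SELECT id FROM test WHERE name LIKE 'val:1000%';
```
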
B-tree indexes can also be used to optimize sorting operations, such as:

SELECT col1, col2
 FROM t
WHERE col1 BETWEEN 100 AND 200
ORDER BY col1;

The index on col1 can not only optimize query conditions, but also avoid additional sorting operations; because when accessing based on this index,
the results are returned in sorted order.

Hash index

A hash index can only be used for simple equality lookups (=), that is, when the indexed field is compared with the equals
operator. This is because the original ordering of the values is no longer preserved after hashing.
Creating a hash index requires the use of the HASH keyword:

-- The CREATE INDEX statement creates an index; the USING clause specifies the index type
CREATE INDEX index_name
ON table_name USING HASH (column_name);
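
Applied to the test table from the introduction, this might look like the sketch below. The index name and literal values are assumptions for illustration.

```sql
CREATE INDEX test_name_hash_index ON test USING HASH (name);

-- Equality lookup: a candidate for the hash index.
EXPLAIN SELECT id FROM test WHERE name = 'val:10000';

-- Range condition: the hash index cannot help here.
EXPLAIN SELECT id FROM test WHERE name > 'val:9';
```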

GiST index

GiST stands for Generalized Search Tree. GiST is not a single index type, but a
framework that supports different indexing strategies. Common uses of GiST indexes include indexing geometric data and full-text search. GiST indexes can also
be used to optimize "nearest neighbor" searches, for example:

-- This statement finds the 10 places closest to a given target location.
SELECT *
FROM places
ORDER BY location <-> point '(101,456)'
LIMIT 10;
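
For the query above to benefit from GiST, the location column needs a GiST index. The table definition below is a hypothetical sketch, since the article does not show how places is defined.

```sql
-- Hypothetical definition of the places table used in the query above.
CREATE TABLE places (
 id serial PRIMARY KEY,
 name text,
 location point
);

-- A GiST index on the point column supports ordering by the <-> distance operator.
CREATE INDEX places_location_index ON places USING GIST (location);
```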

SP-GiST index

SP-GiST stands for space-partitioned GiST, which is mainly used for indexing data such as GIS, multimedia, telephone routing, and IP routing.
Similar to GiST, SP-GiST also supports "nearest neighbor" search.
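
As an illustration of the IP routing case, the sketch below indexes an inet column with SP-GiST. The routes table is hypothetical, and the example assumes PostgreSQL 10 or later, where the SP-GiST inet_ops operator class is available.

```sql
-- Hypothetical routing table; the inet column holds network prefixes.
CREATE TABLE routes (
 route_id serial PRIMARY KEY,
 network inet,
 next_hop inet
);

CREATE INDEX routes_network_index ON routes USING SPGIST (network inet_ops);

-- Find routes whose network contains (or equals) a given address.
EXPLAIN
SELECT route_id, next_hop
FROM routes
WHERE network >>= inet '192.168.1.5';
```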

GIN index

GIN stands for generalized inverted indexes and is mainly used for
data containing multiple values in a single field, such as hstore, array, jsonb and range data types. An inverted index creates a separate
index entry for each element value, allowing you to efficiently query whether a specific element value exists. Search engines such as Google and Baidu use
inverted indexes.
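
A minimal sketch with an array column follows. The articles table and its contents are assumptions made for the example.

```sql
-- Hypothetical table in which one field holds several tag values.
CREATE TABLE articles (
 article_id serial PRIMARY KEY,
 tags text[]
);

CREATE INDEX articles_tags_index ON articles USING GIN (tags);

-- @> tests whether the array contains all the listed elements.
EXPLAIN
SELECT article_id FROM articles WHERE tags @> ARRAY['postgresql'];
```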

BRIN index

BRIN stands for block range index. It stores summary information about the values held in ranges of consecutive physical blocks.
A BRIN index is much smaller than a B-tree index and cheaper to maintain.
For very large tables where a B-tree index would be impractical without horizontal partitioning, BRIN can be considered.
BRIN is usually used for fields whose values correlate with the physical row order, such as the creation date of an orders table.
For more detail, see the introduction to index types in the official PostgreSQL documentation.
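
A sketch of the creation-date case follows. The order_log table is hypothetical, and the example assumes rows are appended roughly in created_at order.

```sql
-- Hypothetical append-only table: rows arrive roughly in created_at order.
CREATE TABLE order_log (
 order_id bigint,
 created_at timestamp
);

CREATE INDEX order_log_created_at_index ON order_log USING BRIN (created_at);

-- A range condition lets BRIN skip block ranges whose summary excludes the interval.
EXPLAIN
SELECT count(*) FROM order_log
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
```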

Create index

PostgreSQL uses the CREATE INDEX statement to create a new index:

CREATE INDEX index_name ON table_name
[USING method]
(column_name [ASC | DESC] [NULLS FIRST | NULLS LAST]);

• index_name is the name of the index, table_name is the name of the table;
• method indicates the type of index, such as btree, hash, gist, spgist, gin or brin. The default is btree;
• column_name is the field name, ASC means ascending order (the default), DESC means descending order;
• NULLS FIRST and NULLS LAST specify where null values sort in the index. The default is NULLS LAST
for an ascending index and NULLS FIRST for a descending index.
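
For instance, an index whose ordering matches a specific ORDER BY clause could be sketched as follows on the test table from the introduction (the index name is an assumption):

```sql
-- The index ordering matches queries that sort by id DESC NULLS LAST.
CREATE INDEX test_id_desc_index ON test USING btree (id DESC NULLS LAST);

EXPLAIN
SELECT id, name FROM test ORDER BY id DESC NULLS LAST LIMIT 10;
```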

unique index

When creating an index, you can use the UNIQUE keyword to specify a unique index:

CREATE UNIQUE INDEX index_name
ON table_name (column_name [ASC | DESC] [NULLS FIRST | NULLS LAST]);

Unique indexes can be used to implement unique constraints. PostgreSQL currently only supports unique indexes of the B-tree type. By default, multiple NULLs
are considered distinct values, so a uniquely indexed field can contain multiple null values.
For primary keys and unique constraints, PostgreSQL automatically creates a unique index to enforce uniqueness.
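
The NULL behavior can be seen in a small sketch. The accounts table and its values are hypothetical.

```sql
CREATE TABLE accounts (
 account_id integer,
 email text
);
CREATE UNIQUE INDEX accounts_email_index ON accounts (email);

INSERT INTO accounts VALUES (1, 'a@example.com');
INSERT INTO accounts VALUES (2, NULL);
INSERT INTO accounts VALUES (3, NULL); -- accepted: NULLs are treated as distinct

-- The next statement would fail with a unique violation if uncommented:
-- INSERT INTO accounts VALUES (4, 'a@example.com');
```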

multi-column index

CREATE [UNIQUE] INDEX index_name ON table_name
[USING method]
(column1 [ASC | DESC] [NULLS FIRST | NULLS LAST], ...);

For multi-column indexes, the fields most commonly used as query conditions should be placed on the left, and the less commonly used fields should be placed on the right.
For example, an index created based on (c1, c2, c3) can optimize the following query:

WHERE c1 = v1 and c2 = v2 and c3 = v3;
WHERE c1 = v1 and c2 = v2;
WHERE c1 = v1;

But the following queries cannot use this index efficiently, because none of them has a condition on the leading field c1:

WHERE c2 = v2;
WHERE c3 = v3;
WHERE c2 = v2 and c3 = v3;

For a multi-column unique index, the combination of field values cannot be repeated; however, if one of the fields is null, the values in the other fields may be duplicated.
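
To compare the two cases, one could run a sketch like the following. The table m is hypothetical; load representative data and run ANALYZE before reading the plans, since the planner's choices on an empty table are not meaningful.

```sql
CREATE TABLE m (c1 integer, c2 integer, c3 integer);
CREATE INDEX m_c1_c2_c3_index ON m (c1, c2, c3);

-- Constrains the leading column: a good fit for the index.
EXPLAIN SELECT * FROM m WHERE c1 = 1 AND c2 = 2;

-- No condition on c1: compare this plan with the one above.
EXPLAIN SELECT * FROM m WHERE c3 = 3;
```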

function index

A function index, also called an expression index, is an index created on the value of a function or expression.
The syntax for creating a function index in PostgreSQL is as follows:

CREATE [UNIQUE] INDEX index_name
ON table_name (expression);

expression is a field-based expression or function.
Suppose a query applies the upper function to the name field in its WHERE clause to perform a case-insensitive match.
Even if an ordinary index test_name_index exists on the name field, wrapping the column in a function prevents the optimizer from using that index. To
optimize this case-insensitive query, you can create a function index based on the name field:
drop index test_name_index;
create index test_name_index on test(upper(name));

Checking the execution plan of the statement again should now show the function index being used.
Note that the maintenance cost of function indexes is relatively high, because the function must be evaluated on every insert and update.
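
The original screenshots are not available, so as a stand-in, a query of the kind described might look like the following. The literal value is an assumption based on the test data generated in the introduction.

```sql
-- A case-insensitive lookup that wraps the column in upper().
-- With an index on upper(name), the planner can match this expression to the index.
EXPLAIN
SELECT id, name FROM test WHERE upper(name) = 'VAL:10000';
```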

partial index

A partial index is an index built over only a subset of the rows in a table. The rows to be
indexed are specified through a WHERE clause. For example, in an orders table most orders are in the completed state; if we only need to
query and track the uncompleted orders, we can create a partial index for them.

-- Create the table
create table orders(order_id int primary key, order_ts timestamp, finished boolean);

-- Create the partial index
create index orders_unfinished_index
on orders (order_id)
WHERE finished is not true;

This index only contains unfinished order IDs, which is much smaller than the index created directly based on the finished field. It can be used to
optimize queries for outstanding orders:

explain analyze
select order_id
from orders
where finished is not true;
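
A point worth illustrating is that the planner can only consider a partial index when the query's WHERE clause implies the index predicate. The two queries below are a sketch of that distinction; the order_id threshold is arbitrary.

```sql
-- Can use orders_unfinished_index: the condition matches the index predicate.
EXPLAIN
SELECT order_id FROM orders WHERE finished IS NOT TRUE AND order_id > 1000;

-- Cannot use the partial index: finished orders are not stored in it.
EXPLAIN
SELECT order_id FROM orders WHERE finished IS TRUE AND order_id > 1000;
```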


covering index

Indexes in PostgreSQL are all secondary indexes, which means that the index and the table data are stored separately. Therefore, looking up data through an index
requires accessing both the index and the table, and the table access is random I/O. To
address this performance problem, PostgreSQL supports Index-Only Scan, which obtains the required results from the index data alone,
with no need to access the table. For example:

-- Create the table
create table t (a int, b int, c int);
-- Create the unique index
create unique index idx_t_ab on t using btree (a, b) include (c);

The above statement creates a multi-column index based on fields a and b, and uses INCLUDE to store
the value of field c in the leaf node of the index. The following queries can take advantage of Index-Only Scan:

explain analyze
select a, b, c
from t
where a = 100 and b = 200;

The above query only returns the index fields (a, b) and the covered field (c), so the results can be returned by scanning the index alone.
B-tree indexes support Index-Only Scan. GiST and SP-GiST indexes support Index-Only Scan for certain operator classes.
Other index types do not support it.
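
One practical detail, drawn from how PostgreSQL implements this feature rather than from the text above: an index-only scan can skip the table only for pages marked all-visible in the visibility map, which VACUUM maintains. A sketch using the table t from above (the inserted rows are illustrative):

```sql
-- Illustrative data; (a, b) stays unique because both derive from v.
INSERT INTO t
SELECT v, v * 2, v * 3 FROM generate_series(1, 100000) v;

-- VACUUM updates the visibility map; ANALYZE refreshes planner statistics.
VACUUM ANALYZE t;

EXPLAIN ANALYZE
SELECT a, b, c FROM t WHERE a = 100 AND b = 200;
```

When an Index Only Scan is chosen, the Heap Fetches figure in the plan output indicates how often the table still had to be visited.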

View index

PostgreSQL provides the pg_indexes view, which can be used to view index information:

-- The view's columns are, in order: schema name, table name, index name, tablespace, and the index definition statement.
select * from pg_indexes where tablename = 'test';
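
Beyond the definitions in pg_indexes, usage counts and sizes can be useful when deciding which indexes are worth keeping. The following is a sketch that assumes the statistics view pg_stat_user_indexes; the column aliases are arbitrary.

```sql
-- How often each index on test has been scanned, and how large it is.
SELECT indexrelname AS index_name,
 idx_scan AS times_used,
 pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'test';
```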


Maintain index

PostgreSQL provides several methods for modifying and rebuilding indexes:

ALTER INDEX index_name RENAME TO new_name;
ALTER INDEX index_name SET TABLESPACE tablespace_name;
REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name;

The two ALTER INDEX statements rename an index and move it to another tablespace, respectively; REINDEX rebuilds
index data and works at different levels: a single index, or all indexes of a table, schema, database, or the system catalogs.

In addition, once an index has been created, the system automatically updates it whenever the data is modified. However, the
ANALYZE command should be run regularly (autovacuum normally does this automatically) to update the database statistics so that the optimizer can use the index sensibly.
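
Applied to the test table from the introduction, routine maintenance might look like the sketch below. The CONCURRENTLY variant assumes PostgreSQL 12 or later.

```sql
-- Rebuild a single index and report progress.
REINDEX (VERBOSE) INDEX test_id_index;

-- Rebuild every index on the table.
REINDEX TABLE test;

-- Rebuild without blocking writes to the table (PostgreSQL 12+).
REINDEX INDEX CONCURRENTLY test_id_index;

-- Refresh planner statistics for the table.
ANALYZE test;
```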

Delete index

DROP INDEX index_name [ CASCADE | RESTRICT ];

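CASCADE means that other objects depending on the index are dropped along with it; RESTRICT means the drop is refused if any objects depend on the index. The default is RESTRICT.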

-- The following statement drops the indexes on test:
drop index test_id_index, test_name_index;
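
Two variations of the statement may be useful in practice; they are shown here as a sketch rather than as part of the original walkthrough.

```sql
-- IF EXISTS turns a missing index into a notice instead of an error.
DROP INDEX IF EXISTS test_name_index;

-- CONCURRENTLY avoids blocking concurrent reads and writes on the table.
-- It accepts a single index, cannot be combined with CASCADE,
-- and cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS test_id_index;
```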
