Foreword:
The database is a fundamental and critical component of a project; it comes first in importance, and it is one place where we can afford very few problems.
It goes without saying that database design is the key first step. It covers the deployment and operation model of the database, the logical design of tables, sensible fields, sensible indexes, the necessary roles, and security, as well as functions, views, triggers, materialized views, and so on. In other words, which of the project's data needs to be stored in the database, and how that data should be stored, are the questions that must be answered during the database design phase.
After the design phase comes the operation phase. Before operating the database, we need to be clear about the state it should reach. Simply put, the database should achieve the "three highs": high availability, high performance, and high concurrency. High availability is relatively easy to achieve, generally by building a cluster (that is, HA), and it is also relatively easy to verify: once the cluster is up, you will know from the first master-slave failover whether it is truly highly available. High performance and high concurrency, however, can only be judged through repeated testing combined with how the system actually behaves in production. Without tests and the corresponding test reports, there is no way to confirm whether the database delivers high performance and high concurrency.
Therefore, database testing is a critical but often overlooked task. For PostgreSQL there are many tools that can help determine whether a database meets our expectations, such as pg_profile, pg_reset, pg_stat and other built-in or external plug-ins that collect metrics and monitor the database, but the reports these tools generate are verbose, slow to produce, and not particularly intuitive.
The pgbench tool solves a large part of these pain points. Its testing process is simple, direct, efficient, and easy to use, and crucially it needs no separate installation or deployment: it is a small tool that ships with the PostgreSQL database. The impression it gives is that pgbench is to PostgreSQL what the ab tool is to web testing, and it is just as convenient.
pgbench can be used to test the performance and concurrency capabilities of PostgreSQL. It simulates a simple bank-transfer scenario and can generate different workloads via its parameters. It supports multi-threaded concurrent testing and measures metrics such as transaction throughput, latency, and number of concurrent connections. pgbench is simple to use, but its functionality is limited: it can only perform basic load testing.
The following is a brief introduction to the use of pgbench.
One, Where is pgbench?
pgbench is a built-in command that is installed along with the database.
Note in particular that, like most other PostgreSQL client commands, it should be run as the postgres user, not as root.
[root@node1 ~]# whereis pgbench
pgbench: /usr/local/pgsql/bin/pgbench
Two, Introduction to the database used for testing
The operating system is CentOS 7, running on two VMware virtual machines, each with 4 GB of memory and 4 CPU cores.
The database is PostgreSQL 12.4, entirely in its default state, meaning nothing has been optimized: neither the database's runtime parameters nor the operating system's kernel parameters. The two nodes form a simple master-slave replication cluster.
Master database IP: 192.168.123.11
Slave database IP: 192.168.123.12
Three, Preparing the test data
The plan is to generate a large table with 20 million rows, then run query and write tests against it to obtain performance and concurrency metrics for the database. The following code creates the large table:
Random ID-string function:
create or replace function gen_id(
a date,
b date
)
returns text as $$
select lpad((random()*99)::int::text, 3, '0') ||
lpad((random()*99)::int::text, 3, '0') ||
lpad((random()*99)::int::text, 3, '0') ||
to_char(a + (random()*(b-a))::int, 'yyyymmdd') ||
lpad((random()*99)::int::text, 3, '0') ||
random()::int ||
(case when random()*10 >9 then 'xy' else (random()*9)::int::text end ) ;
$$ language sql strict;
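A quick sanity check of the generator (the output varies on every call since random() is volatile; each result is a digit string with an embedded yyyymmdd date drawn between the two arguments):

```sql
-- Each call produces a fresh random ID-like string
select gen_id('1949-01-01', '2023-10-16');
-- Generate a handful at once to eyeball the format
select gen_id('1949-01-01', '2023-10-16') from generate_series(1, 5);
```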
Create test table structure:
CREATE SEQUENCE test START 1;
create table if not exists testpg (
"id" int8 not null DEFAULT nextval('test'::regclass),
CONSTRAINT "user_vendorcode_pkey" PRIMARY KEY ("id"),
"suijishuzi" VARCHAR ( 255 ) COLLATE "pg_catalog"."default"
);
Insert 20 million rows (depending on machine performance, this takes roughly 5 to 10 minutes):
insert into testpg SELECT generate_series(1,20000000) as xm, gen_id('1949-01-01', '2023-10-16') as num;
Four, Viewing the test table
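A few queries of the kind you might use to check the table that was just populated:

```sql
-- Row count: should be 20,000,000 after the insert above
select count(*) from testpg;
-- Sample a few rows
select * from testpg limit 5;
-- On-disk size of the table, including its index
select pg_size_pretty(pg_total_relation_size('testpg'));
```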
Five, pgbench initialization
Note that before initializing, you need to create the pgbench database; how to create it will not be belabored here.
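For completeness, one way to create it (run as the postgres superuser; `createdb pgbench` from the shell works equally well):

```sql
-- In psql, connected as the postgres user:
CREATE DATABASE pgbench;
```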
pgbench -U postgres -i pgbench
After initialization, you will see four new tables in the pgbench database:
postgres=# \c pgbench
You are now connected to database "pgbench" as user "postgres".
pgbench=# \dt
List of relations
Schema | Name | Type | Owner
--------+------------------+-------+----------
public | pgbench_accounts | table | postgres
public | pgbench_branches | table | postgres
public | pgbench_history | table | postgres
public | pgbench_tellers | table | postgres
(4 rows)
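Per the pgbench documentation, these tables model a tiny bank, and their row counts are tied to the scale factor (-i -s, default 1): pgbench_branches holds scale rows, pgbench_tellers 10 × scale, pgbench_accounts 100000 × scale, and pgbench_history starts empty and is filled during test runs. This can be verified directly:

```sql
-- With the default scale factor of 1:
select count(*) from pgbench_accounts;  -- 100000 rows
select count(*) from pgbench_branches;  -- 1 row
select count(*) from pgbench_tellers;   -- 10 rows
select count(*) from pgbench_history;   -- 0 rows until a test is run
```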
Six, Several modes of pgbench
pgbench has two kinds of modes: built-in and external. The built-in modes test directly against the four tables pgbench just created and are generally used for benchmarking (benchmark testing here means baseline and correctness testing). The external mode runs custom SQL scripts and is generally used for stress and performance testing.
Built-in modes:
The built-in mode is further divided into three scripts. From their names you can roughly guess that the first is a simple test of overall performance, the second a simple test of write performance, and the third a simple test of read performance. All three run against the four tables that pgbench creates, using pgbench's own fixed logic.
[postgres@node1 ~]$ pgbench -b list
Available builtin scripts:
tpcb-like
simple-update
select-only
The first built-in script (tpcb-like):
pgbench -U postgres -T 10 -c 10 -h 192.168.123.11 -d pgbench > 1111.txt 2>&1
Excerpting part of the output, you can see that pgbench issues UPDATE, INSERT, and SELECT statements, all against the four tables above (the -d flag enables this per-client debug output). The process is not controllable by the user, so it is not a precisely targeted test.
client 5 executing script "<builtin: TPC-B (sort of)>"
client 5 executing \set aid
client 5 executing \set bid
client 5 executing \set tid
client 5 executing \set delta
client 5 sending BEGIN;
client 5 receiving
client 0 receiving
client 0 sending END;
client 0 receiving
client 5 receiving
client 5 sending UPDATE pgbench_accounts SET abalance = abalance + -1444 WHERE aid = 99838;
client 5 receiving
client 9 receiving
client 9 sending UPDATE pgbench_tellers SET tbalance = tbalance + -1294 WHERE tid = 6;
client 9 receiving
client 0 receiving
client 5 receiving
client 5 sending SELECT abalance FROM pgbench_accounts WHERE aid = 99838;
client 5 receiving
client 8 receiving
client 8 sending INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (1, 1, 78380, -2573, CURRENT_TIMESTAMP);
client 8 receiving
client 0 executing script "<builtin: TPC-B (sort of)>"
client 0 executing \set aid
client 0 executing \set bid
client 0 executing \set tid
client 0 executing \set delta
client 0 sending BEGIN;
client 0 receiving
client 0 receiving
client 0 sending UPDATE pgbench_accounts SET abalance = abalance + -2452 WHERE aid = 40167;
client 0 receiving
client 5 receiving
client 5 sending UPDATE pgbench_tellers SET tbalance = tbalance + -1444 WHERE tid = 10;
client 5 receiving
client 8 receiving
client 8 sending END;
client 8 receiving
client 5 receiving
client 5 sending UPDATE pgbench_branches SET bbalance = bbalance + -1444 WHERE bid = 1;
client 5 receiving
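The interleaved statements above all come from the built-in TPC-B-like script; per the pgbench documentation, each transaction looks roughly like this:

```sql
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
```

select-only runs only the \set aid line and the SELECT, while simple-update drops the pgbench_tellers and pgbench_branches updates; that is why they approximate read-only and write-oriented workloads respectively.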
The second built-in script (select-only):
pgbench -U postgres -b select-only -c 10 -h 192.168.123.11 -d pgbench > 1111.txt 2>&1
The third built-in script (simple-update):
pgbench -U postgres -b simple-update -c 10 -h 192.168.123.11 -d pgbench > 1111.txt 2>&1
External mode:
pgbench -M prepared -v -r -P 5 -f ./ro.sql -c 60 -j 60 -T 120 -D scale=10000 -D range=500000 -U postgres -h 192.168.123.222 -p 15433 test
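The contents of ./ro.sql are not shown here; as a purely hypothetical illustration, a read-only script of the kind this command expects, targeting the testpg table built earlier (the :range variable comes from the command line via -D range=500000), might look like:

```sql
-- ro.sql (hypothetical): pick a random id and read one row
\set id random(1, :range)
SELECT suijishuzi FROM testpg WHERE id = :id;
```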
Seven, pgbench parameters and report fields
Parameter description:
-r After the benchmark ends, report the average per-statement wait time (execution time from the client's perspective) for each command.
-j Number of worker threads in pgbench. Using more than one thread can be useful on multi-CPU machines. Clients are distributed as evenly as possible across available threads. Default is 1.
-c The number of simulated clients, that is, the number of concurrent database sessions. Default is 1.
-t Number of transactions to run per client. Default is 10.
-T runs the test for this many seconds instead of running a fixed number of transactions per client.
-D VARNAME=VALUE defines a variable for use by custom scripts, passing the value into the test script.
-v Vacuum the four standard tables before the test. Generally, to remove the influence of the previous test run, you should vacuum the pgbench database before testing.
Report Description:
transaction type: the test type used in this run
scaling factor: the data-volume scaling factor set by pgbench during initialization
query mode: the query protocol used: simple (the default), extended, or prepared
number of clients: the number of client connections specified
number of threads: the number of worker threads specified
number of transactions actually processed: the number of transactions completed by the end of the test
latency average: the average response time over the test
tps: the number of transactions executed per second
To be continued! ! !