Handling PostgreSQL CPU usage alarms
Background
In late February, the business database of one project triggered frequent CPU utilization alarms, with as many as 25 alarms in a single day on February 28. In particular, during the 15:35-16:35 window, utilization stayed flat at close to 100% for about 10 minutes. The CPU monitoring graph was as follows:
The system version is as follows:
PostgreSQL 10.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
Processing steps
tool:
Step 1
Use top to check current CPU usage and confirm that the CPU load behind the alarm comes from PostgreSQL.
Step 2
Check the sessions that are currently active:
select * from pg_stat_activity where state not in ('idle') and pid <> pg_backend_pid();
- Take the PID with the highest CPU usage from top and look it up in pg_stat_activity to see the exact statement it is running. auto_explain can also be used directly.
- This also gives a first indication of which business database is responsible for the load.
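The two checks above can be sketched as follows. This is a minimal example, not the exact statements used during the incident: the PID 12345 is a placeholder for whatever PID top reports, and the choice of columns is only a suggestion.

```sql
-- Look up the statement run by the backend that top shows as the heaviest CPU user.
-- 12345 is a placeholder PID, not a value from the incident.
SELECT pid,
       datname,
       usename,
       state,
       now() - query_start AS running_for,
       query
FROM pg_stat_activity
WHERE pid = 12345;

-- Count non-idle sessions per database to see which business database carries the load.
SELECT datname,
       count(*) AS active_sessions
FROM pg_stat_activity
WHERE state NOT IN ('idle')
  AND pid <> pg_backend_pid()
GROUP BY datname
ORDER BY active_sessions DESC;
```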
Step 3
Step 2 depends on real-time observation, so the incident may be missed for one reason or another, and it is then hard to capture the offending statement while it is running. For this reason we need the pg_stat_statements extension, which continuously collects execution statistics.
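If the extension is not yet available in the database, enabling it looks roughly like the sketch below. It assumes that 'pg_stat_statements' has already been added to shared_preload_libraries in postgresql.conf and that the server has been restarted, since the module can only be loaded at server start.

```sql
-- Assumption: postgresql.conf already lists pg_stat_statements in
-- shared_preload_libraries and the server was restarted afterwards.
SHOW shared_preload_libraries;

-- Create the extension in the database where the statistics view will be queried.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Confirm that the view is readable.
SELECT count(*) AS tracked_statements FROM pg_stat_statements;
```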
The following query lists the five statements with the largest share of total execution time:
SELECT
    round((100 * a.total_time / SUM(a.total_time) OVER ())::numeric, 2) AS percent,
    -- a.dbid,
    b.datname,
    round(a.total_time::numeric, 2) AS total,
    a.calls,
    round(a.mean_time::numeric, 2) AS mean,
    a.query
FROM pg_stat_statements a
INNER JOIN pg_stat_database b ON a.dbid = b.datid
ORDER BY a.total_time DESC
LIMIT 5;
Step 3 pinpoints the following problem statement:
SELECT
    k.lineid AS lineid,
    k.linename AS linename,
    k.linestatus AS linestatus,
    k.userid AS userid,
    k.account AS account,
    k.tn_saleareaid AS tn_saleareaid,
    k.tn_linetype AS tn_linetype,
    k.tn_weekday AS tn_weekday,
    k.tn_createtime AS tn_createtime,
    SUM(CASE WHEN strpos(kx.customertype, '终端') > 0 THEN 1 ELSE 0 END) AS storecount,
    SUM(CASE WHEN strpos(kx.customertype, '渠道') > 0 THEN 1 ELSE 0 END) AS channelcount,
    SUM(CASE WHEN kx.customertype != '' THEN 1 ELSE 0 END) AS customercount,
    CASE k.linestatus WHEN 1 THEN '启用' ELSE '停用' END AS linestatustext,
    k.tn_linetype,
    CASE
        WHEN k.tn_weekday = '0' THEN '星期天'
        WHEN k.tn_weekday = '1' THEN '星期一'
        WHEN k.tn_weekday = '2' THEN '星期二'
        WHEN k.tn_weekday = '3' THEN '星期三'
        WHEN k.tn_weekday = '4' THEN '星期四'
        WHEN k.tn_weekday = '5' THEN '星期五'
        WHEN k.tn_weekday = '6' THEN '星期六'
        ELSE '无'
    END AS tn_weekdaytext,
    k.tn_weekday
FROM kx_visit_line AS k
LEFT JOIN kx_visit_linecustomer AS kx
    ON kx.lineid = k.lineid
    AND kx.platstatus = 1
WHERE 1 = 1
    AND NOT EXISTS (
        SELECT id
        FROM (
            SELECT kx_kq_store.id, kx_kq_store.status
            FROM kx_kq_store AS kx_kq_store
            WHERE kx_kq_store.presentative LIKE '%' || '1215946224222474240' || '%'
                AND kx_kq_store.platstatus = 1
            UNION ALL
            SELECT ka_kq_channelcustomers.id, ka_kq_channelcustomers.status
            FROM ka_kq_channelcustomers AS ka_kq_channelcustomers
            WHERE ka_kq_channelcustomers.bizmanager LIKE '%' || '1215946224222474240' || '%'
                AND ka_kq_channelcustomers.platstatus = 1
        ) s
        WHERE s.id = kx.customerid
            AND s.status != 1
    )
    AND k.userid = '1215946224222474240'
    AND k.tn_linetype = '1'
    AND k.platstatus = 1
GROUP BY
    k.linename,
    k.lineid,
    k.tn_linetype
ORDER BY k.tn_createtime DESC
LIMIT 20 OFFSET 0;
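To see why a statement like this is expensive, its plan can be inspected with EXPLAIN. The sketch below isolates only one branch of the subquery (the LIKE filter on kx_kq_store) and is an illustration, not the plan captured during the incident. Note that the ANALYZE option actually executes the statement, so it is best run off-peak or on a copy of the data.

```sql
-- Inspect the plan of the wildcard LIKE filter taken from the problem statement.
-- Without a suitable index, a sequential scan on kx_kq_store is the expected outcome.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status
FROM kx_kq_store
WHERE presentative LIKE '%' || '1215946224222474240' || '%'
  AND platstatus = 1;
```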
The corresponding execution plan is as follows:
The problem is caused by missing indexes on the tables [kx_visit_line], [kx_kq_store], [ka_kq_channelcustomers] and [kx_visit_linecustomer]. The new indexes, created during a maintenance window, are as follows:
CREATE INDEX CONCURRENTLY IX_kx_kq_store_presentative ON kx_kq_store USING gin (presentative gin_trgm_ops);
CREATE INDEX CONCURRENTLY ix_kx_visit_line_ ON kx_visit_line (
userid,
tn_linetype,
platstatus
);
CREATE INDEX CONCURRENTLY ix_kx_visit_linecustomer_platstatus ON kx_visit_linecustomer (platstatus);
CREATE INDEX CONCURRENTLY ix_ka_kq_channelcustomers_presentative ON ka_kq_channelcustomers USING gin (bizmanager gin_trgm_ops);
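Two points are worth checking around the index statements above; the following is a hedged sketch rather than part of the original procedure. First, the gin_trgm_ops operator class comes from the pg_trgm extension, which has to exist in the database before the GIN indexes can be created. Second, a CREATE INDEX CONCURRENTLY that fails partway leaves an invalid index behind, so it is worth confirming that indisvalid is true. The index names are written in lower case because unquoted identifiers are folded to lower case.

```sql
-- Required before creating indexes that use gin_trgm_ops.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Verify that the concurrently built indexes are valid.
-- An index with indisvalid = false should be dropped and rebuilt.
SELECT c.relname AS index_name,
       i.indisvalid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname IN (
    'ix_kx_kq_store_presentative',
    'ix_kx_visit_line_',
    'ix_kx_visit_linecustomer_platstatus',
    'ix_ka_kq_channelcustomers_presentative'
);
```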
The execution result after optimization is as follows:
A review of system monitoring for the week after the adjustment shows the following performance:
As the graph shows, CPU usage has improved significantly, but there are still occasional short-lived spikes. These will continue to be followed up by analyzing the statistics in pg_stat_statements.
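One possible way to follow up on short spikes is to rank statements by their worst single execution rather than by total time. This is a sketch under the assumption that the PostgreSQL 10 column names apply (max_time, mean_time, stddev_time); from PostgreSQL 13 onward these columns are named max_exec_time, mean_exec_time and stddev_exec_time.

```sql
-- Statements whose slowest execution is far above their average are
-- candidates for causing transient CPU spikes.
SELECT d.datname,
       s.calls,
       round(s.mean_time::numeric, 2)   AS mean_ms,
       round(s.max_time::numeric, 2)    AS max_ms,
       round(s.stddev_time::numeric, 2) AS stddev_ms,
       s.query
FROM pg_stat_statements s
JOIN pg_database d ON d.oid = s.dbid
ORDER BY s.max_time DESC
LIMIT 10;
```

Calling pg_stat_statements_reset() before an observation window limits the statistics to that window, at the cost of discarding the history collected so far.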
Summary
Missing indexes can cause a CPU resource bottleneck.
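As a rough way to look for other tables that may be missing indexes, the cumulative statistics in pg_stat_user_tables can be checked for tables that are read mostly by sequential scans. This is a heuristic sketch only; a high seq_tup_read value is a hint to investigate, not proof that an index is missing.

```sql
-- Tables with the most rows read by sequential scans, alongside their index scan counts.
SELECT relname,
       seq_scan,
       seq_tup_read,
       idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;
```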