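Notes on Troubleshooting a Deadlock

1 Background

One day the application operations team reported that, at one point on the previous day, the transaction timeout rate had risen for about one minute and then returned to normal. They asked us to check from the database side whether a database problem had slowed the application down at that time.

2 Analysis

First, query ASH (or AWR) for the wait events sampled during that period:

ASH: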

select ash.sample_time, ash.event, count(*)
  from v$active_session_history ash
 where to_char(ash.sample_time, 'yyyy-mm-dd hh24:mi') between '&start_time' and
       '&end_time'
 group by ash.sample_time, ash.event
 order by 1;

AWR:

select awr.sample_time, awr.event, count(*)
  from dba_hist_active_sess_history awr
 where to_char(awr.sample_time, 'yyyy-mm-dd hh24:mi') between '&start_time' and
       '&end_time'
 group by awr.sample_time, awr.event
 order by 1;

[Image: query output]

The output shows a small amount of row lock contention at 13:44:27.

Narrow the time range to 13:44:00-13:45:00 and look at the details:

ASH:

 select ash.sample_time,
        ash.event,
        sql_id,
        ash.BLOCKING_SESSION,
        session_id,
        count(*)
   from v$active_session_history ash
  where to_char(ash.SAMPLE_TIME, 'yyyy-mm-dd hh24:mi:ss') between
        '&time_start' and '&time_end'
  group by ash.sample_time,
           ash.event,
           ash.sql_id,
           ash.BLOCKING_SESSION,
           ash.SESSION_ID
  order by 1;

AWR:

 select awr.sample_time,
        awr.event,
        sql_id,
        awr.BLOCKING_SESSION,
        session_id,
        count(*)
   from dba_hist_active_sess_history awr
  where to_char(awr.SAMPLE_TIME, 'yyyy-mm-dd hh24:mi:ss') between
        '&time_start' and '&time_end'
  group by awr.sample_time,
           awr.event,
           awr.sql_id,
           awr.BLOCKING_SESSION,
           awr.SESSION_ID
  order by 1;

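[Image: query output]

The output above shows that two SQL statements account for most of the row lock waits. Looking up their text shows that both are FOR UPDATE statements.

Row lock contention alone only tells us that these sessions were waiting to acquire a row lock, that is, they were waiters. To find out who was blocking them we need the blocking_session column:

[Image: query output]

The blocking_session column shows two blockers, the sessions with sid=8470 and sid=1719. To reconstruct the blocking chain, session_id and blocking_session have to be read together.

Since 8470 is the main blocker, we start with it. The rows with session_id=8470 show that 8470 itself has a blocker, session 1719. Looking at the rows with session_id=1719 in turn shows that 1719 is blocked by 8470.

So the blocking chain contains a cycle: 1719 <----> 8470, and 8470 also blocks sessions 4, 1710, 2650 and others. The preliminary conclusion is that the database hit a deadlock at that moment.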
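The chain-following step described above can be sketched in a few lines of Python. This is only an illustration, not part of the original investigation: it assumes the (session_id, blocking_session) pairs have already been read from the ASH output into a dict, and `find_blocking_cycle` is a hypothetical helper name. Only the edges named in the text are used.

```python
def find_blocking_cycle(blocked_by, start):
    """Follow waiter -> blocker links from `start`.

    Returns the sessions that form a cycle, or None if the chain ends
    at a session that is not itself blocked.
    """
    path = []
    seen = {}
    sid = start
    while sid is not None:
        if sid in seen:
            # Back at a session already on the path: this is a cycle.
            return path[seen[sid]:]
        seen[sid] = len(path)
        path.append(sid)
        sid = blocked_by.get(sid)
    return None


# waiter -> blocker, as read from session_id / blocking_session (assumed input)
blocked_by = {
    8470: 1719,
    1719: 8470,
    4: 8470,
    1710: 8470,
    2650: 8470,
}

print(find_blocking_cycle(blocked_by, start=4))
```

Starting from any of the waiting sessions, the walk ends in the same two-session loop, which is what distinguishes a deadlock from an ordinary blocking chain with a single root blocker.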

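A careful look at the alert log turned up a very short message:

[Image: alert log excerpt]

These two lines are easy to overlook in a normal log because they are only informational. Matching their timestamp against the time the timeout rate rose confirms that a deadlock did occur at 13:44. The details of the deadlock are written to the lmd0 trace (trc) file.

In an Oracle RAC database, the LMD (Lock Manager Daemon) process provides the global enqueue service: it handles the resource requests that reach each RAC instance from remote nodes, and so it also manages lock resources across the nodes. This is why, in a RAC environment, deadlock details go to the LMD trace file.

Following that hint, we analyze the lmd0 trace further: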

*** 2022-06-09 13:44:32.124

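--The dump starts with the local blocker. [0x58001f][0x8083a],[TX][ext 0x2,0x0] identifies the transaction (its XID) on node 1.
DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0x58001f][0x8083a],[TX][ext 0x2,0x0]

--Detailed resource information follows. The resname carries the transaction XID: [0x58001f] packs the undo segment number (high 16 bits) and the slot (low 16 bits), and [0x8083a] is the sequence number.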
----------resource 7000113353b8108----------------------
resname       : [0x58001f][0x8083a],[TX][ext 0x2,0x0]
hash mask     : x3
Local inst    : 1
dir_inst      : 1
master_inst   : 1
hv idx        : 56
hv last r.inc : 53
current inc   : 53
hv status     : 0
hv master     : 0
open options  : dd 
grant_bits    : KJUSERNL KJUSEREX 
grant mode    : KJUSERNL  KJUSERCR  KJUSERCW  KJUSERPR  KJUSERPW  KJUSEREX
count         : 1         0         0         0         0         1
val_state     : KJUSERVS_NOVALUE
valblk        : 0x000000000000004c0fffffffffffb1a8 .L
access_inst   : 1
vbreq_state   : 0
state         : x0
resp          : 7000113353b8108
On Scan_q?    : N
Total accesses: 3837
Imm.  accesses: 3258
Granted_locks : 1 
Cvting_locks  : 1 
value_block:  00 00 00 00 00 00 00 4c 0f ff ff ff ff ff b1 a8
--Granted queue: the lock that currently holds this TX resource
GRANTED_Q :
lp 7000113069f8ea8 gl KJUSEREX rp 7000113353b8108 [0x58001f][0x8083a],[TX][ext 0x2,0x0]
  master 1 gl owner 700011352af1648 possible pid 52822572 xid 1020-0209-00048163 bast 0 rseq 114 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > LOC_AST
  open opt KJUSERDEADLOCK  
--Convert queue: one lock is waiting to acquire this TX resource
CONVERT_Q: 
lp 7000113069f9078 gl KJUSERNL rl KJUSEREX rp 7000113353b8108 [0x58001f][0x8083a],[TX][ext 0x2,0x0]
  master 1 gl owner 700011343667d30 possible pid 56427232 xid 100E-00ED-00006E81 bast 0 rseq 114 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
  
--Details of lock 7000113069f8ea8 start here
----------enqueue 7000113069f8ea8------------------------
lock version     : 1163129
Owner inst       : 1
grant_level      : KJUSEREX
req_level        : KJUSEREX
bast_level       : KJUSERNL
notify_func      : 0
resp             : 7000113353b8108
procp            : 7000113246694e8
pid              : 56427232
proc version     : 21362
oprocp           : 0
opid             : 56427232
group lock owner : 700011352af1648
possible pid     : 52822572
xid              : 1020-0209-00048163
dd_time          : 0.0 secs
dd_count         : 0
timeout          : 0.0 secs
On_timer_q?      : N
On_dd_q?         : N
lock_state       : GRANTED
ast_flag         : 0x0
Open Options     : KJUSERDEADLOCK 
Convert options  : KJUSERNOQUEUE KJUSERNODEADLOCKWAIT 
History          : REF_RES > LOC_AST > CLOSE > FREE > REF_RES > LOC_AST
Msg_Seq          : 0x0
res_seq          : 114
valblk           : 0x0fffffffffffb4002424424000000000 .$$B@

--Session sid=1719 holds lock 0x7000113069f8ea8; its session details follow
user session for deadlock lock 0x7000113069f8ea8
  sid: 1719 ser: 24115 audsid: 529000011 user: 45/XXXXX
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 521 O/S info: user: grid, term: UNKNOWN, ospid: 52822572
    image: oracle@xxxxdb

--Where the session comes from
  client details:
    O/S info: user: xxxxxx, term: unknown, ospid: 1234
    machine: xxxxxx program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
--SQL the session is currently executing
  current SQL:
  select zhanghao, zhhuywzl, zhanghlx, zhanghmc, yewubima, zhnghsbm, zhanghxh, hesuandm, zhyngyjg, huobdaih, shqizhye, shqiyefx, yuexingz, zhanghye, yuefangx, neibzhlx, yegenxms, yejcbioz, dfsgjzxk, jfsgjzxk, kaihuriq, kaihuguy, kaihgyls, zuihjyrq, hexiaorq, hexiaogy, hexiaols, zhanghzt, farendma, weihguiy, weihjigo, weihriqi, weihshij, shijchuo, jiluztai from ktaa_nbfhzh where zhanghao=:1  and farendma=:2  for update
DUMP LOCAL BLOCKER: initiate state dump for DEADLOCK

--The process with pid 521 and ospid 52822572 holds TX resource 7000113353b8108
  possible owner[521.52822572] on resource TX-0058001F-0008083A

*** 2022-06-09 13:44:32.125
Submitting asynchronized dump request [1c]. summary=[ges process stack dump (kjdglblkrdm1)].

--Details of lock 7000113069f9078 from the convert queue start here
----------enqueue 7000113069f9078------------------------
lock version     : 1353551
Owner inst       : 1
grant_level      : KJUSERNL
req_level        : KJUSEREX
bast_level       : KJUSERNL
notify_func      : 0
resp             : 7000113353b8108
procp            : 7000113246694e8
pid              : 56427232
proc version     : 21362
oprocp           : 0
opid             : 56427232
group lock owner : 700011343667d30
possible pid     : 56427232
xid              : 100E-00ED-00006E81
dd_time          : 10.0 secs
dd_count         : 1
timeout          : 0.0 secs
On_timer_q?      : N
On_dd_q?         : Y
lock_state       : OPENING CONVERTING 
ast_flag         : 0x0
Open Options     : KJUSERDEADLOCK 
Convert options  : KJUSERGETVALUE 
History          : REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
Msg_Seq          : 0x0
res_seq          : 114
valblk           : 0x090000000000634807000113069f8ea8 .cH

--Session sid=8470 is waiting on lock 0x7000113069f9078; its session details follow
user session for deadlock lock 0x7000113069f9078
  sid: 8470 ser: 23045 audsid: 528999917 user: 45/XXXXX
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 237 O/S info: user: grid, term: UNKNOWN, ospid: 56427232
    image: oracle@xxxxxdb
    
--Where the session comes from
  client details:
    O/S info: user: xxxxxx, term: unknown, ospid: 1234
    machine: xxxxxx program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
    
--SQL the session is currently executing
  current SQL:
  select zhanghao, zhhuzwmc, kehuhaoo, guobdaim, huobdaih, chaohubz, cunqiiii, doqiriqi, qixifans, csqixirq, csdoqirq, yewudhao, pcljigoh, zhujigoh, kaihjigo, kaihriqi, kaihguiy, xiohjigo, xiohriqi, xiohguiy, lancreny, lancrymc, youxriqi, weiyxuho, zhhuyuee, shrizhye, yegxriqi, sccrriqi, scywriqi, scsfriqi, chapbhao, fzcpleix, suoshudx, zhufldm1, zhufldm2, zhufldm3, huansbiz, zuidlcye, zuixlcye, cunrkzhi, cunrkzff, cunrclsx, zhiqkzfs, zhiqkzff, zdzqkzfs, kehuzhao, zhcunfsh, beiyjine, kaihjine, cunkzlei, zhhuztai, yezztbbz, zhcphaoo, zhcpxuho, zhcpmuzh, zhzhleix, zhhuxzbz, xzhileix, xunhdkbz, zhbhgxbz, zhiqbhsx, gltouzbz, xtaizybz, dghushux, jinkzhbz, yuxutzbz, waihjgbz, waihhcbz, jieszhbz, qyuelxbz, yxxjzqbz, yxzzzqbz, yxxjcrbz, yxzzcrbz, xiedckbz, shfojdjx, sfdylxjh, lxzffans, shcifxri, xi
DUMP LOCAL BLOCKER: initiate state dump for DEADLOCK

--The process with ospid 56427232 (sid 8470) is waiting for TX resource 7000113353b8108; the trace lists it as a possible owner
  possible owner[521.56427232] on resource TX-0058001F-0008083A

*** 2022-06-09 13:44:32.125
Submitting asynchronized dump request [1c]. summary=[ges process stack dump (kjdglblkrdm1)].

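--The second local blocker is dumped next. [0xd90004][0x3154f],[TX][ext 0x2,0x0] identifies the transaction (its XID) on node 1.
DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0xd90004][0x3154f],[TX][ext 0x2,0x0]
--Detailed resource information follows. The resname carries the transaction XID: [0xd90004] packs the undo segment number (high 16 bits) and the slot (low 16 bits), and [0x3154f] is the sequence number.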
----------resource 700011315948fe8----------------------
resname       : [0xd90004][0x3154f],[TX][ext 0x2,0x0]
hash mask     : x3
Local inst    : 1
dir_inst      : 1
master_inst   : 1
hv idx        : 50
hv last r.inc : 53
current inc   : 53
hv status     : 0
hv master     : 0
open options  : dd 
grant_bits    : KJUSERNL KJUSEREX 
grant mode    : KJUSERNL  KJUSERCR  KJUSERCW  KJUSERPR  KJUSERPW  KJUSEREX
count         : 11         0         0         0         0         1
val_state     : KJUSERVS_NOVALUE
valblk        : 0x0fffffffffffb250000000010a403850 .P@8P
access_inst   : 1
vbreq_state   : 0
state         : x0
resp          : 700011315948fe8
On Scan_q?    : N
Total accesses: 3455
Imm.  accesses: 2954
Granted_locks : 1 
Cvting_locks  : 11 
value_block:  0f ff ff ff ff ff b2 50 00 00 00 01 0a 40 38 50
--Granted queue: the lock that currently holds this TX resource
GRANTED_Q :
lp 700011335cfc720 gl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011343667d30 possible pid 56427232 xid 100E-00ED-00006E81 bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > LOC_AST
  open opt KJUSERDEADLOCK  
--Convert queue: 11 locks are waiting to acquire this TX resource
CONVERT_Q: 
lp 700011335cfc8f0 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011352af1648 possible pid 52822572 xid 1020-0209-00048163 bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 700011305d59088 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011322c6d6e0 possible pid 56033838 xid 1004-004E-00000D72 bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 70001130584b520 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011313bc9a98 possible pid 51053142 xid 1007-007E-00000889 bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 700011346b3db08 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 7000113437039d0 possible pid 58131104 xid 1006-006F-00000D2F bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 700011354eac6e0 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 7000113227f6fa0 possible pid 51970724 xid 100C-00C0-00000171 bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 7000113367160c0 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011323705fe0 possible pid 48431642 xid 100E-00EF-000000FF bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 700011355132f18 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011342adae30 possible pid 31326930 xid 1008-0089-000009AF bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 7000113d712e5a8 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011302b722a0 possible pid 48759390 xid 100C-00CB-00000ED4 bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 700011325fcb810 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011342847130 possible pid 34996820 xid 101C-01C1-0004B691 bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 7000113d6ea7d70 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 7000113d29fbdf0 possible pid 45482500 xid 100C-00C4-000001C6 bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
lp 700011306772dc8 gl KJUSERNL rl KJUSEREX rp 700011315948fe8 [0xd90004][0x3154f],[TX][ext 0x2,0x0]
  master 1 gl owner 700011303373dc0 possible pid 39781094 xid 1012-0124-000003CE bast 0 rseq 131 mseq 0
  history REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE  
  

--Details of lock 700011335cfc720 from the granted queue start here
----------enqueue 700011335cfc720------------------------
lock version     : 1362841
Owner inst       : 1
grant_level      : KJUSEREX
req_level        : KJUSEREX
bast_level       : KJUSERNL
notify_func      : 0
resp             : 700011315948fe8
procp            : 7000113244c53a8
pid              : 52822572
proc version     : 9839
oprocp           : 0
opid             : 52822572
group lock owner : 700011343667d30
possible pid     : 56427232
xid              : 100E-00ED-00006E81
dd_time          : 0.0 secs
dd_count         : 0
timeout          : 0.0 secs
On_timer_q?      : N
On_dd_q?         : N
lock_state       : GRANTED
ast_flag         : 0x0
Open Options     : KJUSERDEADLOCK 
Convert options  : KJUSERNOQUEUE KJUSERNODEADLOCKWAIT 
History          : REF_RES > LOC_AST > CLOSE > FREE > REF_RES > LOC_AST
Msg_Seq          : 0x0
res_seq          : 131
valblk           : 0x0000000101e110c007000113069f9078 .x
--Session sid=8470 holds lock 0x700011335cfc720; its session details follow
user session for deadlock lock 0x700011335cfc720
  sid: 8470 ser: 23045 audsid: 528999917 user: 45/XXXXX
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 237 O/S info: user: grid, term: UNKNOWN, ospid: 56427232
    image: oracle@xxxxxdb
--Where the session comes from
 client details:
    O/S info: user: xxxxxx, term: unknown, ospid: 1234
    machine: xxxxx program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660

--SQL the session is currently executing
  current SQL:
  select zhanghao, zhhuzwmc, kehuhaoo, guobdaim, huobdaih, chaohubz, cunqiiii, doqiriqi, qixifans, csqixirq, csdoqirq, yewudhao, pcljigoh, zhujigoh, kaihjigo, kaihriqi, kaihguiy, xiohjigo, xiohriqi, xiohguiy, lancreny, lancrymc, youxriqi, weiyxuho, zhhuyuee, shrizhye, yegxriqi, sccrriqi, scywriqi, scsfriqi, chapbhao, fzcpleix, suoshudx, zhufldm1, zhufldm2, zhufldm3, huansbiz, zuidlcye, zuixlcye, cunrkzhi, cunrkzff, cunrclsx, zhiqkzfs, zhiqkzff, zdzqkzfs, kehuzhao, zhcunfsh, beiyjine, kaihjine, cunkzlei, zhhuztai, yezztbbz, zhcphaoo, zhcpxuho, zhcpmuzh, zhzhleix, zhhuxzbz, xzhileix, xunhdkbz, zhbhgxbz, zhiqbhsx, gltouzbz, xtaizybz, dghushux, jinkzhbz, yuxutzbz, waihjgbz, waihhcbz, jieszhbz, qyuelxbz, yxxjzqbz, yxzzzqbz, yxxjcrbz, yxzzcrbz, xiedckbz, shfojdjx, sfdylxjh, lxzffans, shcifxri, xi
DUMP LOCAL BLOCKER: initiate state dump for DEADLOCK
--The process with pid 237 and ospid 56427232 holds TX resource 700011315948fe8
  possible owner[237.56427232] on resource TX-00D90004-0003154F

*** 2022-06-09 13:44:32.127
Submitting asynchronized dump request [1c]. summary=[ges process stack dump (kjdglblkrdm1)].

--Details of lock 700011335cfc8f0 from the convert queue start here
----------enqueue 700011335cfc8f0------------------------
lock version     : 1248495
Owner inst       : 1
grant_level      : KJUSERNL
req_level        : KJUSEREX
bast_level       : KJUSERNL
notify_func      : 0
resp             : 700011315948fe8
procp            : 7000113244c53a8
pid              : 52822572
proc version     : 9839
oprocp           : 0
opid             : 52822572
group lock owner : 700011352af1648
possible pid     : 52822572
xid              : 1020-0209-00048163
dd_time          : 0.0 secs
dd_count         : 0
timeout          : 0.0 secs
On_timer_q?      : N
On_dd_q?         : Y
lock_state       : OPENING CONVERTING 
ast_flag         : 0x0
Open Options     : KJUSERDEADLOCK 
Convert options  : KJUSERGETVALUE 
History          : REF_RES > LOC_AST > CLOSE > FREE > REF_RES > GR2CVT
Msg_Seq          : 0x0
res_seq          : 131
valblk           : 0x09000000000063480700011335cfc720 .cH5 

--Session that is waiting on this lock
user session for deadlock lock 0x700011335cfc8f0
  sid: 1719 ser: 24115 audsid: 529000011 user: 45/XXXXX
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 521 O/S info: user: grid, term: UNKNOWN, ospid: 52822572
    image: oracle@xxxxxdb
--Client details for that session
  client details:
    O/S info: user: xxxxx, term: unknown, ospid: 1234
    machine: xxxxx program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
--SQL the session is currently executing
  current SQL:
  select zhanghao, zhhuywzl, zhanghlx, zhanghmc, yewubima, zhnghsbm, zhanghxh, hesuandm, zhyngyjg, huobdaih, shqizhye, shqiyefx, yuexingz, zhanghye, yuefangx, neibzhlx, yegenxms, yejcbioz, dfsgjzxk, jfsgjzxk, kaihuriq, kaihuguy, kaihgyls, zuihjyrq, hexiaorq, hexiaogy, hexiaols, zhanghzt, farendma, weihguiy, weihjigo, weihriqi, weihshij, shijchuo, jiluztai from ktaa_nbfhzh where zhanghao=:1  and farendma=:2  for update
DUMP LOCAL BLOCKER: initiate state dump for DEADLOCK
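--The process with pid 521 and ospid 52822572 (sid 1719) is waiting for TX resource 700011315948fe8; the trace lists it as a possible owner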
  possible owner[521.52822572] on resource TX-00D90004-0003154F

*** 2022-06-09 13:44:32.127
Submitting asynchronized dump request [1c]. summary=[ges process stack dump (kjdglblkrdm1)].
Global blockers dump end:-----------------------------------
--The complete wait-for relationships are dumped here, which summarizes the deadlock described above
Global Wait-For-Graph(WFG) at ddTS[0.2760] : 
BLOCKED 0x7000113069f9078 5 wq 2 cvtops x1 TX 0x58001f.0x8083a(ext 0x2,0x0)[100E-00ED-00006E81] inst 1     --8470/blocked
BLOCKER 0x7000113069f8ea8 5 wq 1 cvtops x28 TX 0x58001f.0x8083a(ext 0x2,0x0)[1020-0209-00048163] inst 1     --1719/blocker
BLOCKED 0x700011335cfc8f0 5 wq 2 cvtops x1 TX 0xd90004.0x3154f(ext 0x2,0x0)[1020-0209-00048163] inst 1     --1719/blocked
BLOCKER 0x700011335cfc720 5 wq 1 cvtops x28 TX 0xd90004.0x3154f(ext 0x2,0x0)[100E-00ED-00006E81] inst 1    --8470/blocker

*** 2022-06-09 13:44:33.096
* Cancel deadlock victim lockp 0x7000113069f9078 
kjddt2vb: valblk  [0.2763] > local ts [0.2762]
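
The two numbers in a TX resource name can be unpacked with simple bit arithmetic. The sketch below assumes the usual TX enqueue encoding (the first number holds the undo segment number in its high 16 bits and the slot in its low 16 bits, the second is the sequence number); `decode_tx_resource` is a hypothetical helper, not an Oracle API. The results could be compared with XIDUSN, XIDSLOT and XIDSQN in v$transaction while a transaction is still active.

```python
def decode_tx_resource(id1, id2):
    """Split a TX enqueue name (id1, id2) into (undo segment, slot, sequence).

    Assumed encoding: id1 = undo segment number << 16 | slot, id2 = sequence.
    """
    usn = id1 >> 16
    slot = id1 & 0xFFFF
    return usn, slot, id2


# The two resources named in the trace above.
for id1, id2 in [(0x58001F, 0x8083A), (0xD90004, 0x3154F)]:
    usn, slot, seq = decode_tx_resource(id1, id2)
    print(f"TX-{id1:08X}-{id2:08X}: usn={usn} slot={slot} seq={seq}")
```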

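The trace shows that session 8470 held the TX resource 0xd90004.0x3154f and was requesting 0x58001f.0x8083a, while session 1719 held 0x58001f.0x8083a and was requesting 0xd90004.0x3154f. Each session was waiting for a lock that the other held, which is a deadlock.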
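
The same conclusion can be read mechanically from the Global Wait-For-Graph lines. The following is a rough sketch, not a general LMD trace parser: the regular expression is written only against the four lines shown in this trace, it assumes one waiter and one holder per resource, and the xid-to-sid mapping is copied by hand from the "user session for deadlock lock" sections (100E-00ED-00006E81 is sid 8470, 1020-0209-00048163 is sid 1719).

```python
import re

WFG = """\
BLOCKED 0x7000113069f9078 5 wq 2 cvtops x1 TX 0x58001f.0x8083a(ext 0x2,0x0)[100E-00ED-00006E81] inst 1
BLOCKER 0x7000113069f8ea8 5 wq 1 cvtops x28 TX 0x58001f.0x8083a(ext 0x2,0x0)[1020-0209-00048163] inst 1
BLOCKED 0x700011335cfc8f0 5 wq 2 cvtops x1 TX 0xd90004.0x3154f(ext 0x2,0x0)[1020-0209-00048163] inst 1
BLOCKER 0x700011335cfc720 5 wq 1 cvtops x28 TX 0xd90004.0x3154f(ext 0x2,0x0)[100E-00ED-00006E81] inst 1
"""

# role, lock pointer, TX resource, xid, instance
WFG_LINE = re.compile(
    r"^(BLOCKED|BLOCKER)\s+(0x[0-9a-f]+)\s.*?"
    r"TX\s+(0x[0-9a-f]+\.0x[0-9a-f]+)\(.*?\)"
    r"\[([0-9A-F-]+)\]\s+inst\s+(\d+)"
)


def wait_edges(wfg_text):
    """Return {waiting_xid: (holding_xid, resource)} from WFG lines."""
    holders, waiters = {}, {}
    for line in wfg_text.splitlines():
        m = WFG_LINE.match(line.strip())
        if not m:
            continue
        role, _lockp, resource, xid, _inst = m.groups()
        target = holders if role == "BLOCKER" else waiters
        target[resource] = xid
    return {
        waiters[res]: (holders[res], res)
        for res in waiters
        if res in holders
    }


# xid -> sid, taken by hand from the session sections of the trace
sid_of = {"100E-00ED-00006E81": 8470, "1020-0209-00048163": 1719}

for waiter, (holder, res) in wait_edges(WFG).items():
    print(f"sid {sid_of[waiter]} waits for sid {sid_of[holder]} on TX {res}")
```

Each xid appears once as a waiter and once as a holder, so the two edges point at each other.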

Deadlocks can have many causes; see MOS document 1443482.1.

Notes:

Granted_Q section prints the details about locks currently held on the resource.

Convert_Q section prints the details about process waiting in the converting queue.

Notice that there is no PID or XID information printed in this lock structure, as that information is not available in the local node. You will need to refer to the remote node LMD trace file to identify that information.

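3 Conclusion

Based on the analysis and a comparison with the MOS document, we can confirm that around 13:44:27 sessions 1719 and 8470 each requested the TX resource the other held (0x58001f.0x8083a and 0xd90004.0x3154f) and deadlocked. Session 8470 then went on to block several further sessions, which caused a small amount of row lock contention.

The lmd0 trace file, compared against the MOS document, shows that this is a typical transaction deadlock. Deadlocks of this kind are usually caused by the order in which the application commits its work, combined with high concurrency, so the main way to resolve them is to optimize the application code.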

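[Image]

We can then use the sessions involved in the deadlock to identify the SQL statements that caused it, and hand them to the development team for review and optimization.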
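
One common application-side remedy for this kind of deadlock is to make every transaction take its row locks in the same order. The source does not say which change the developers made, so the following is only an illustrative sketch in a simplified two-transaction model; the table labels, keys and both helper functions are hypothetical.

```python
def opposite_order(order_a, order_b):
    """True if two transactions lock some pair of shared rows in opposite order.

    Simplified model: exclusive locks held until commit. An opposite-order
    pair is what makes a two-session deadlock possible.
    """
    pos_b = {res: i for i, res in enumerate(order_b)}
    shared = [res for res in order_a if res in pos_b]
    for i, first in enumerate(shared):
        for second in shared[i + 1:]:
            # order_a locks `first` before `second`; check what order_b does.
            if pos_b[first] > pos_b[second]:
                return True
    return False


def canonical_order(resources):
    """Sort lock targets so that every transaction uses one global order."""
    return sorted(resources)


# Hypothetical lock targets: (table label, key) for the two FOR UPDATE reads.
txn_a = [("internal_account", "A1"), ("customer_account", "C1")]
txn_b = [("customer_account", "C1"), ("internal_account", "A1")]

print(opposite_order(txn_a, txn_b))
print(opposite_order(canonical_order(txn_a), canonical_order(txn_b)))
```

With the original orders the check reports a possible deadlock; once both transactions use the same sorted order it no longer does.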

Reposted from blog.csdn.net/wx370092877/article/details/125395047