详解通过SQL进行分布式死锁的检测与消除
<div id="navCategory"><h5 class="catalogue">目录</h5><ul class="first_class_ul"><li>概述</li><li>
分布式死锁和单节点死锁的比较单节点死锁<ul class="second_class_ul"><li>
分布式死锁</li></ul></li><li>
分布式死锁的检测与消除<ul class="second_class_ul"><li>
收集各节点的锁信息</li><li>
构建等待关系</li><li>
等待关系判环</li><li>
消除死锁</li></ul></li><li>
总结<ul class="second_class_ul"></ul></li></ul></div><div id="navcategory">
<h5 class="catalogue">
目录</h5>
<ul class="first_class_ul">
<li>
概述</li>
<li>
分布式死锁和单节点死锁的比较单节点死锁
<ul class="second_class_ul">
<li>
分布式死锁</li>
</ul>
</li>
<li>
分布式死锁的检测与消除
<ul class="second_class_ul">
<li>
收集各节点的锁信息</li>
<li>
构建等待关系</li>
<li>
等待关系判环</li>
<li>
消除死锁</li>
</ul>
</li>
<li>
总结
<ul class="second_class_ul"></ul>
</li>
</ul>
</div>
<p class="maodian">
</p>
<p class="maodian"></p><h2>
概述</h2>
<p>
分布式数仓应用场景中,我们经常遇到数据库系统 hang 住的问题,所谓 hang 是指虽然数据库系统还在运行,但部分或全部业务无法正常执行。hang 问题的原因有很多,其中以分布式死锁最为常见,本次主要分享在碰到死锁时,如何快速地解决死锁问题。</p>
<p>
gaussdb(dws) 作为分布式数仓,通过锁机制来实行并发控制,因此也存在产生分布式死锁的可能。虽然分布式死锁无法避免,但幸运的是其提供了多种系统视图,能够保证在分布式死锁发生之后,快速地对死锁进行定位。</p>
<p>
本文主要介绍了在 gaussdb(dws) 中,如何通过 sql 语句,对分布式死锁进行检测和恢复。本文介绍的方法大致分为 4 步:</p>
<p>
1. 收集各节点的锁信息。</p>
<p>
2. 构建等待关系。</p>
<p>
3. 检测循环等待。</p>
<p>
4. 中止事务以消除死锁。</p>
<p>
本文介绍的方法使用简单,门槛低,可以确保在分布式死锁发生之后,快速解决问题,恢复业务。</p>
<p class="maodian">
</p>
<p class="maodian"></p><h2>
分布式死锁和单节点死锁的比较单节点死锁</h2>
<p>
单节点死锁是指,死锁中的所有锁等待信息来自同一个节点,例如:</p>
<div class="jb51code">
<div>
<div class="syntaxhighlightersql" id="highlighter_422453">
<div class="toolbar">
<span>?</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tbody><tr>
<td class="gutter">
<div class="line number1 index0 alt2">
1</div>
<div class="line number2 index1 alt1">
2</div>
<div class="line number3 index2 alt2">
3</div>
<div class="line number4 index3 alt1">
4</div>
<div class="line number5 index4 alt2">
5</div>
<div class="line number6 index5 alt1">
6</div>
<div class="line number7 index6 alt2">
7</div>
<div class="line number8 index7 alt1">
8</div>
<div class="line number9 index8 alt2">
9</div>
<div class="line number10 index9 alt1">
10</div>
<div class="line number11 index10 alt2">
11</div>
<div class="line number12 index11 alt1">
12</div>
<div class="line number13 index12 alt2">
13</div>
<div class="line number14 index13 alt1">
14</div>
<div class="line number15 index14 alt2">
15</div>
<div class="line number16 index15 alt1">
16</div>
<div class="line number17 index16 alt2">
17</div>
<div class="line number18 index17 alt1">
18</div>
<div class="line number19 index18 alt2">
19</div>
<div class="line number20 index19 alt1">
20</div>
</td>
<td class="code">
<div class="container">
<div class="line number1 index0 alt2">
<code class="sql comments">-- 事务 transaction1</code>
</div>
<div class="line number2 index1 alt1">
<code class="sql comments">-- 所在节点:cn1</code>
</div>
<div class="line number3 index2 alt2">
</div>
<div class="line number4 index3 alt1">
<code class="sql keyword">begin</code><code class="sql plain">;</code>
</div>
<div class="line number5 index4 alt2">
</div>
<div class="line number6 index5 alt1">
<code class="sql keyword">truncate</code> <code class="sql plain">t1;</code>
</div>
<div class="line number7 index6 alt2">
<code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn1) </code><code class="sql string">'select * from t2'</code><code class="sql plain">;</code>
</div>
<div class="line number8 index7 alt1">
</div>
<div class="line number9 index8 alt2">
<code class="sql keyword">commit</code><code class="sql plain">;</code>
</div>
<div class="line number10 index9 alt1">
</div>
<div class="line number11 index10 alt2">
</div>
<div class="line number12 index11 alt1">
<code class="sql comments">-- 事务 transaction2</code>
</div>
<div class="line number13 index12 alt2">
<code class="sql comments">-- 所在节点:cn1</code>
</div>
<div class="line number14 index13 alt1">
</div>
<div class="line number15 index14 alt2">
<code class="sql keyword">begin</code><code class="sql plain">;</code>
</div>
<div class="line number16 index15 alt1">
</div>
<div class="line number17 index16 alt2">
<code class="sql keyword">truncate</code> <code class="sql plain">t2;</code>
</div>
<div class="line number18 index17 alt1">
<code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn2) </code><code class="sql string">'select * from t1'</code><code class="sql plain">;</code>
</div>
<div class="line number19 index18 alt2">
</div>
<div class="line number20 index19 alt1">
<code class="sql keyword">commit</code><code class="sql plain">;</code>
</div>
</div>
</td>
</tr></tbody></table>
</div>
</div>
</div>
<p>
假设上述两个事务的执行顺序如下:</p>
<p>
1. truncate t1</p>
<p>
2. truncate t2</p>
<p>
3. execute direct on(dn1) 'select * from t2'</p>
<p>
4. execute direct on(dn2) 'select * from t1'</p>
<p>
该执行顺序会导致死锁的产生。由于事务 transaction1 和 transaction2 都在 cn1 上执行,死锁中的所有锁等待信息都在 cn1 上,因此该死锁为单节点死锁。</p>
<p>
<img title="详解通过SQL进行分布式死锁的检测与消除" alt="详解通过SQL进行分布式死锁的检测与消除" src="https://zhuji.jb51.net/uploads/img/202305/15fac69786ce50736ce264ac782a2392.jpg"></p>
<p>
gaussdb(dws) 支持自动处理单节点死锁。当某个节点上的多个事务陷入循环等待时,数据库系统会自动将其中一个事务中止,从而消除死锁。</p>
<p>
<img title="详解通过SQL进行分布式死锁的检测与消除" alt="详解通过SQL进行分布式死锁的检测与消除" src="https://zhuji.jb51.net/uploads/img/202305/acfb4a23050fc54394a4678d95a44abf.jpg"></p>
<p class="maodian">
</p>
<p class="maodian"></p><h3>
分布式死锁</h3>
<p>
分布式死锁是指,死锁中的锁等待信息来自不同节点。例如:</p>
<div class="jb51code">
<div>
<div class="syntaxhighlightersql" id="highlighter_353751">
<div class="toolbar">
<span>?</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tbody><tr>
<td class="gutter">
<div class="line number1 index0 alt2">
1</div>
<div class="line number2 index1 alt1">
2</div>
<div class="line number3 index2 alt2">
3</div>
<div class="line number4 index3 alt1">
4</div>
<div class="line number5 index4 alt2">
5</div>
<div class="line number6 index5 alt1">
6</div>
<div class="line number7 index6 alt2">
7</div>
<div class="line number8 index7 alt1">
8</div>
<div class="line number9 index8 alt2">
9</div>
<div class="line number10 index9 alt1">
10</div>
<div class="line number11 index10 alt2">
11</div>
<div class="line number12 index11 alt1">
12</div>
<div class="line number13 index12 alt2">
13</div>
<div class="line number14 index13 alt1">
14</div>
<div class="line number15 index14 alt2">
15</div>
<div class="line number16 index15 alt1">
16</div>
<div class="line number17 index16 alt2">
17</div>
<div class="line number18 index17 alt1">
18</div>
<div class="line number19 index18 alt2">
19</div>
<div class="line number20 index19 alt1">
20</div>
</td>
<td class="code">
<div class="container">
<div class="line number1 index0 alt2">
<code class="sql comments">-- 事务 transaction1</code>
</div>
<div class="line number2 index1 alt1">
<code class="sql comments">-- 所在节点:cn1</code>
</div>
<div class="line number3 index2 alt2">
</div>
<div class="line number4 index3 alt1">
<code class="sql keyword">begin</code><code class="sql plain">;</code>
</div>
<div class="line number5 index4 alt2">
</div>
<div class="line number6 index5 alt1">
<code class="sql keyword">truncate</code> <code class="sql plain">t1;</code>
</div>
<div class="line number7 index6 alt2">
<code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn1) </code><code class="sql string">'select * from t2'</code><code class="sql plain">;</code>
</div>
<div class="line number8 index7 alt1">
</div>
<div class="line number9 index8 alt2">
<code class="sql keyword">commit</code><code class="sql plain">;</code>
</div>
<div class="line number10 index9 alt1">
</div>
<div class="line number11 index10 alt2">
</div>
<div class="line number12 index11 alt1">
<code class="sql comments">-- 事务 transaction2</code>
</div>
<div class="line number13 index12 alt2">
<code class="sql comments">-- 所在节点:cn2</code>
</div>
<div class="line number14 index13 alt1">
</div>
<div class="line number15 index14 alt2">
<code class="sql keyword">begin</code><code class="sql plain">;</code>
</div>
<div class="line number16 index15 alt1">
</div>
<div class="line number17 index16 alt2">
<code class="sql keyword">truncate</code> <code class="sql plain">t2;</code>
</div>
<div class="line number18 index17 alt1">
<code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn2) </code><code class="sql string">'select * from t1'</code><code class="sql plain">;</code>
</div>
<div class="line number19 index18 alt2">
</div>
<div class="line number20 index19 alt1">
<code class="sql keyword">commit</code><code class="sql plain">;</code>
</div>
</div>
</td>
</tr></tbody></table>
</div>
</div>
</div>
<p>
本例与上一节中的例子相比,只有事务 transaction2 的所在节点从 cn1 改为了 cn2。</p>
<p>
假设两个事务的执行顺序和上一节中的执行顺序一致,还是会产生死锁,死锁中的锁等待信息如下:</p>
<p>
<img title="详解通过SQL进行分布式死锁的检测与消除" alt="详解通过SQL进行分布式死锁的检测与消除" src="https://zhuji.jb51.net/uploads/img/202305/00f8e76f6c586f4a4a8346d998eaae56.jpg"></p>
<p>
这就是一个典型的分布式死锁,单独看 cn1 或 cn2 上的锁等待信息,都看不出来有死锁,但将多个节点的锁等待信息放到一起看,就能找到有循环等待的现象。</p>
<p>
发生分布式死锁时,陷入死锁的事务全部都无法继续执行下去,只有其中一个事务锁等待超时,剩余事务才能继续执行。默认情况下,锁等待超时时间是 20 分钟。</p>
<p>
<img title="详解通过SQL进行分布式死锁的检测与消除" alt="详解通过SQL进行分布式死锁的检测与消除" src="https://zhuji.jb51.net/uploads/img/202305/017703e28548972abea0bbf4a83ab721.jpg"></p>
<p class="maodian">
</p>
<p class="maodian"></p><h2>
分布式死锁的检测与消除</h2>
<p>
当我们观察到数据库系统出现 hang 问题时,我们需要通过 sql 语句检测分布式死锁,如果发现确实存在分布式死锁,还需要对死锁进行消除。接下来以之前的分布式死锁为例,介绍分布式死锁的检测和消除的方法。</p>
<p class="maodian">
</p>
<p class="maodian"></p><h3>
收集各节点的锁信息</h3>
<p>
为了检测分布式死锁,首先需要获得各节点的锁信息。gaussdb(dws) 中可以通过 pg_locks 视图查询当前节点的锁信息,因此可以通过 execute direct 语句在所有节点查询 pg_locks 视图,并收集到当前节点中。</p>
<p>
注意此处有一个细节,pg_locks 视图中,很多信息是以 oid 类型给出的,例如一个锁加在一个表上,pg_locks 视图会给出表的 oid。由于同一个表在各节点中的 oid 不一定相同,因此不能通过 oid 来标识一个表。在收集锁信息时,需要先将表的 oid 转换成 schema 名加表名。其它 oid 信息例如分区 oid 等也同理,需要转化为对应的名字。</p>
<p>
执行附件中的示例代码 pgxc_locks.sql,就可以收集到各节点的锁信息:</p>
<div class="jb51code">
<div>
<div class="syntaxhighlightersql" id="highlighter_367259">
<div class="toolbar">
<span>?</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tbody><tr>
<td class="gutter">
<div class="line number1 index0 alt2">
1</div>
<div class="line number2 index1 alt1">
2</div>
<div class="line number3 index2 alt2">
3</div>
<div class="line number4 index3 alt1">
4</div>
<div class="line number5 index4 alt2">
5</div>
<div class="line number6 index5 alt1">
6</div>
<div class="line number7 index6 alt2">
7</div>
<div class="line number8 index7 alt1">
8</div>
<div class="line number9 index8 alt2">
9</div>
<div class="line number10 index9 alt1">
10</div>
<div class="line number11 index10 alt2">
11</div>
<div class="line number12 index11 alt1">
12</div>
<div class="line number13 index12 alt2">
13</div>
<div class="line number14 index13 alt1">
14</div>
<div class="line number15 index14 alt2">
15</div>
<div class="line number16 index15 alt1">
16</div>
<div class="line number17 index16 alt2">
17</div>
<div class="line number18 index17 alt1">
18</div>
<div class="line number19 index18 alt2">
19</div>
<div class="line number20 index19 alt1">
20</div>
<div class="line number21 index20 alt2">
21</div>
<div class="line number22 index21 alt1">
22</div>
<div class="line number23 index22 alt2">
23</div>
</td>
<td class="code">
<div class="container">
<div class="line number1 index0 alt2">
<code class="sql plain">locktype | nodename | datname | usename | nspname | relname | partname | page | tuple | virtualxid | transactionid | virtualtransaction | mode | granted | client_addr | application_name | pid | xact_start | query_start | state | query_id | query</code>
</div>
<div class="line number2 index1 alt1">
<code class="sql comments">---------------+--------------+----------+---------+---------+---------+----------+------+-------+------------+---------------+--------------------+---------------------+---------+-------------+------------------+-----------------+----------------------------+----------------------------+---------------------+-------------------+-----------------------------------------------------</code>
</div>
<div class="line number3 index2 alt2">
<code class="sql spaces"> </code><code class="sql plain">virtualxid | cn_5002 | postgres | tyx_1 | | | | | | 12/94 | | 12/94 | exclusivelock | t | | gsql | 140110481323776 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:19:37.715447 | active | 0 | </code><code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn_6003_6004) </code><code class="sql string">'select * from t1'</code><code class="sql plain">;</code>
</div>
<div class="line number4 index3 alt1">
<code class="sql spaces"> </code><code class="sql plain">virtualxid | cn_5002 | postgres | tyx_1 | | | | | | 9/298 | | 9/298 | exclusivelock | t | ::1/128 | cn_5001 | 140110672164608 | 2020-12-25 17:18:40.478704 | 2020-12-25 17:18:40.479682 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t1;</code>
</div>
<div class="line number5 index4 alt2">
<code class="sql spaces"> </code><code class="sql plain">virtualxid | cn_5002 | postgres | tyx_1 | | | | | | 6/161 | | 6/161 | exclusivelock | t | | wlmarbiter | 140110762325760 | 2020-12-25 17:20:18.613815 | 2020-12-25 16:53:35.027585 | active | 0 | wlm arbiter sync info </code><code class="sql keyword">by</code> <code class="sql plain">ccn </code><code class="sql color1">and</code> <code class="sql plain">cns</code>
</div>
<div class="line number6 index5 alt1">
<code class="sql spaces"> </code><code class="sql plain">virtualxid | cn_5002 | postgres | tyx_1 | | | | | | 5/162 | | 5/162 | exclusivelock | t | | workloadmonitor | 140110779119360 | 2020-12-25 17:20:27.16458 | 2020-12-25 16:53:35.027217 | active | 0 | wlm monitor </code><code class="sql keyword">update</code> <code class="sql color1">and</code> <code class="sql plain">verify </code><code class="sql keyword">local</code> <code class="sql plain">info</code>
</div>
<div class="line number7 index6 alt2">
<code class="sql spaces"> </code><code class="sql plain">virtualxid | cn_5002 | postgres | tyx_1 | | | | | | 3/325 | | 3/325 | exclusivelock | t | | workload | 140110846744320 | 2020-12-25 17:20:25.372654 | 2020-12-25 16:53:35.02741 | active | 72339069014641297 | wlm </code><code class="sql keyword">fetch</code> <code class="sql plain">collect info </code><code class="sql keyword">from</code> <code class="sql plain">data nodes</code>
</div>
<div class="line number8 index7 alt1">
<code class="sql spaces"> </code><code class="sql plain">advisory | cn_5002 | postgres | tyx_1 | | | | | | | | 12/94 | sharelock | t | | gsql | 140110481323776 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:19:37.715447 | active | 0 | </code><code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn_6003_6004) </code><code class="sql string">'select * from t1'</code><code class="sql plain">;</code>
</div>
<div class="line number9 index8 alt2">
<code class="sql spaces"> </code><code class="sql plain">relation | cn_5002 | postgres | tyx_1 | </code><code class="sql keyword">public</code> <code class="sql plain">| t1 | | | | | | 9/298 | accessexclusivelock | t | ::1/128 | cn_5001 | 140110672164608 | 2020-12-25 17:18:40.478704 | 2020-12-25 17:18:40.479682 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t1;</code>
</div>
<div class="line number10 index9 alt1">
<code class="sql spaces"> </code><code class="sql plain">relation | cn_5002 | postgres | tyx_1 | </code><code class="sql keyword">public</code> <code class="sql plain">| t1 | | | | | | 12/94 | accesssharelock | f | | gsql | 140110481323776 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:19:37.715447 | active | 0 | </code><code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn_6003_6004) </code><code class="sql string">'select * from t1'</code><code class="sql plain">;</code>
</div>
<div class="line number11 index10 alt2">
<code class="sql spaces"> </code><code class="sql plain">transactionid | cn_5002 | postgres | tyx_1 | | | | | | | 10269 | 12/94 | exclusivelock | t | | gsql | 140110481323776 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:19:37.715447 | active | 0 | </code><code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn_6003_6004) </code><code class="sql string">'select * from t1'</code><code class="sql plain">;</code>
</div>
<div class="line number12 index11 alt1">
<code class="sql spaces"> </code><code class="sql plain">transactionid | cn_5002 | postgres | tyx_1 | | | | | | | 10266 | 9/298 | exclusivelock | t | ::1/128 | cn_5001 | 140110672164608 | 2020-12-25 17:18:40.478704 | 2020-12-25 17:18:40.479682 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t1;</code>
</div>
<div class="line number13 index12 alt2">
<code class="sql spaces"> </code><code class="sql plain">relation | cn_5002 | postgres | tyx_1 | </code><code class="sql keyword">public</code> <code class="sql plain">| t2 | | | | | | 12/94 | accessexclusivelock | t | | gsql | 140110481323776 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:19:37.715447 | active | 0 | </code><code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn_6003_6004) </code><code class="sql string">'select * from t1'</code><code class="sql plain">;</code>
</div>
<div class="line number14 index13 alt1">
<code class="sql spaces"> </code><code class="sql plain">virtualxid | dn_6001_6002 | postgres | tyx_1 | | | | | | 17/433 | | 17/433 | exclusivelock | t | ::1/128 | cn_5001 | 140552375822080 | 2020-12-25 17:18:40.478704 | 2020-12-25 17:18:50.513948 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t1;</code>
</div>
<div class="line number15 index14 alt2">
<code class="sql spaces"> </code><code class="sql plain">virtualxid | dn_6001_6002 | postgres | tyx_1 | | | | | | 23/692 | | 23/692 | exclusivelock | t | ::1/128 | cn_5002 | 140552359040768 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:18:56.830053 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t2;</code>
</div>
<div class="line number16 index15 alt1">
<code class="sql spaces"> </code><code class="sql plain">virtualxid | dn_6001_6002 | postgres | tyx_1 | | | | | | 2/1607 | | 2/1607 | exclusivelock | t | | workload | 140552945264384 | | 2020-12-25 16:53:35.041283 | active | 0 | wlm </code><code class="sql keyword">fetch</code> <code class="sql plain">collect info </code><code class="sql keyword">from</code> <code class="sql plain">data nodes</code>
</div>
<div class="line number17 index16 alt2">
<code class="sql spaces"> </code><code class="sql plain">transactionid | dn_6001_6002 | postgres | tyx_1 | | | | | | | 10266 | 17/433 | exclusivelock | t | ::1/128 | cn_5001 | 140552375822080 | 2020-12-25 17:18:40.478704 | 2020-12-25 17:18:50.513948 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t1;</code>
</div>
<div class="line number18 index17 alt1">
<code class="sql spaces"> </code><code class="sql plain">relation | dn_6001_6002 | postgres | tyx_1 | | | | | | | | 23/692 | accessexclusivelock | t | ::1/128 | cn_5002 | 140552359040768 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:18:56.830053 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t2;</code>
</div>
<div class="line number19 index18 alt2">
<code class="sql spaces"> </code><code class="sql plain">relation | dn_6001_6002 | postgres | tyx_1 | | | | | | | | 17/433 | accessexclusivelock | t | ::1/128 | cn_5001 | 140552375822080 | 2020-12-25 17:18:40.478704 | 2020-12-25 17:18:50.513948 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t1;</code>
</div>
<div class="line number20 index19 alt1">
<code class="sql spaces"> </code><code class="sql plain">relation | dn_6001_6002 | postgres | tyx_1 | </code><code class="sql keyword">public</code> <code class="sql plain">| t2 | | | | | | 23/692 | sharelock | t | ::1/128 | cn_5002 | 140552359040768 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:18:56.830053 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t2;</code>
</div>
<div class="line number21 index20 alt2">
<code class="sql spaces"> </code><code class="sql plain">relation | dn_6001_6002 | postgres | tyx_1 | </code><code class="sql keyword">public</code> <code class="sql plain">| t2 | | | | | | 23/692 | accessexclusivelock | t | ::1/128 | cn_5002 | 140552359040768 | 2020-12-25 17:18:54.238933 | 2020-12-25 17:18:56.830053 | idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">| 0 | </code><code class="sql keyword">truncate</code> <code class="sql plain">t2;</code>
</div>
<div class="line number22 index21 alt1">
<code class="sql plain">省略若干行</code>
</div>
<div class="line number23 index22 alt2">
<code class="sql plain">(55 </code><code class="sql keyword">rows</code><code class="sql plain">)</code>
</div>
</div>
</td>
</tr></tbody></table>
</div>
</div>
</div>
<p class="maodian">
</p>
<p class="maodian"></p><h3>
构建等待关系</h3>
<p>
收集到各节点的锁信息之后,就可以开始构建等待关系了。</p>
<p>
事务 a 等待事务 b,需要满足 3 个条件:</p>
<p>
1. 两个事务加锁的资源相同(同一个表、同一个分区、同一个页面或同一个元组等)。特别注意,如果事务 a 对 dn1 的 t1 表的加锁,事务 b 对 dn2 的 t1 表的加锁,则我们认为它们加锁的资源不同,只有同一节点上的同一资源才被认为是相同的资源。</p>
<p>
2. 事务 b 已经持有锁,而事务 a 还未持有锁。</p>
<p>
3. 事务 a 和事务 b 申请的锁的级别互斥。</p>
<p>
通过对上一步收集到的锁信息进行处理,就可以构建出事务的等待关系。</p>
<p>
执行附件中的示例代码 pgxc_locks_wait.sql,就可以获得等待关系:</p>
<div class="jb51code">
<div>
<div class="syntaxhighlightersql" id="highlighter_485640">
<div class="toolbar">
<span>?</span>
</div>
<table border="0" cellpadding="0" cellspacing="0"><tbody><tr>
<td class="gutter">
<div class="line number1 index0 alt2">
1</div>
<div class="line number2 index1 alt1">
2</div>
<div class="line number3 index2 alt2">
3</div>
<div class="line number4 index3 alt1">
4</div>
<div class="line number5 index4 alt2">
5</div>
<div class="line number6 index5 alt1">
6</div>
<div class="line number7 index6 alt2">
7</div>
<div class="line number8 index7 alt1">
8</div>
<div class="line number9 index8 alt2">
9</div>
<div class="line number10 index9 alt1">
10</div>
<div class="line number11 index10 alt2">
11</div>
<div class="line number12 index11 alt1">
12</div>
<div class="line number13 index12 alt2">
13</div>
<div class="line number14 index13 alt1">
14</div>
<div class="line number15 index14 alt2">
15</div>
<div class="line number16 index15 alt1">
16</div>
<div class="line number17 index16 alt2">
17</div>
<div class="line number18 index17 alt1">
18</div>
<div class="line number19 index18 alt2">
19</div>
<div class="line number20 index19 alt1">
20</div>
<div class="line number21 index20 alt2">
21</div>
<div class="line number22 index21 alt1">
22</div>
<div class="line number23 index22 alt2">
23</div>
<div class="line number24 index23 alt1">
24</div>
<div class="line number25 index24 alt2">
25</div>
<div class="line number26 index25 alt1">
26</div>
<div class="line number27 index26 alt2">
27</div>
<div class="line number28 index27 alt1">
28</div>
<div class="line number29 index28 alt2">
29</div>
<div class="line number30 index29 alt1">
30</div>
<div class="line number31 index30 alt2">
31</div>
<div class="line number32 index31 alt1">
32</div>
<div class="line number33 index32 alt2">
33</div>
<div class="line number34 index33 alt1">
34</div>
<div class="line number35 index34 alt2">
35</div>
<div class="line number36 index35 alt1">
36</div>
<div class="line number37 index36 alt2">
37</div>
<div class="line number38 index37 alt1">
38</div>
<div class="line number39 index38 alt2">
39</div>
</td>
<td class="code">
<div class="container">
<div class="line number1 index0 alt2">
<code class="sql plain">locktype | nodename | datname | acquire_lock_pid | hold_lock_pid | acquire_lock_event | hold_lock_event</code>
</div>
<div class="line number2 index1 alt1">
<code class="sql comments">----------+----------+----------+------------------+-----------------+-------------------------------------------------------------------------+--------------------------------------------------------</code>
</div>
<div class="line number3 index2 alt2">
<code class="sql spaces"> </code><code class="sql plain">relation | cn_5001 | postgres | 140508814374656 | 140508792350464 | usename : tyx_1 +| usename : tyx_1 +</code>
</div>
<div class="line number4 index3 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | nspname : </code><code class="sql keyword">public</code> <code class="sql plain">+| nspname : </code><code class="sql keyword">public</code> <code class="sql plain">+</code>
</div>
<div class="line number5 index4 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | relname : t2 +| relname : t2 +</code>
</div>
<div class="line number6 index5 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | partname : +| partname : +</code>
</div>
<div class="line number7 index6 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | page : +| page : +</code>
</div>
<div class="line number8 index7 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | tuple : +| tuple : +</code>
</div>
<div class="line number9 index8 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | virtualxid : +| virtualxid : +</code>
</div>
<div class="line number10 index9 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | transactionid : +| transactionid : +</code>
</div>
<div class="line number11 index10 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | virtualtransaction: 11/13 +| virtualtransaction: 12/1323 +</code>
</div>
<div class="line number12 index11 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | mode : accesssharelock +| mode : accessexclusivelock +</code>
</div>
<div class="line number13 index12 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | client_addr : +| client_addr : ::1/128 +</code>
</div>
<div class="line number14 index13 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | application_name : gsql +| application_name : cn_5002 +</code>
</div>
<div class="line number15 index14 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | xact_start : 2020-12-25 17:18:40.478704 +| xact_start : 2020-12-25 17:18:54.238933 +</code>
</div>
<div class="line number16 index15 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | query_start : 2020-12-25 17:19:23.0923 +| query_start : 2020-12-25 17:18:54.239319 +</code>
</div>
<div class="line number17 index16 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | state : active +| state : idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">+</code>
</div>
<div class="line number18 index17 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | query_id : 0 +| query_id : 0 +</code>
</div>
<div class="line number19 index18 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | query : </code><code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn_6001_6002) </code><code class="sql string">'select * from t2'</code><code class="sql plain">;+| query : </code><code class="sql keyword">truncate</code> <code class="sql plain">t2; +</code>
</div>
<div class="line number20 index19 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | </code><code class="sql comments">------------------------------------------------------ | ------------------------------------------------------</code>
</div>
<div class="line number21 index20 alt2">
<code class="sql spaces"> </code><code class="sql plain">relation | cn_5002 | postgres | 140110481323776 | 140110672164608 | usename : tyx_1 +| usename : tyx_1 +</code>
</div>
<div class="line number22 index21 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | nspname : </code><code class="sql keyword">public</code> <code class="sql plain">+| nspname : </code><code class="sql keyword">public</code> <code class="sql plain">+</code>
</div>
<div class="line number23 index22 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | relname : t1 +| relname : t1 +</code>
</div>
<div class="line number24 index23 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | partname : +| partname : +</code>
</div>
<div class="line number25 index24 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | page : +| page : +</code>
</div>
<div class="line number26 index25 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | tuple : +| tuple : +</code>
</div>
<div class="line number27 index26 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | virtualxid : +| virtualxid : +</code>
</div>
<div class="line number28 index27 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | transactionid : +| transactionid : +</code>
</div>
<div class="line number29 index28 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | virtualtransaction: 12/94 +| virtualtransaction: 9/298 +</code>
</div>
<div class="line number30 index29 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | mode : accesssharelock +| mode : accessexclusivelock +</code>
</div>
<div class="line number31 index30 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | client_addr : +| client_addr : ::1/128 +</code>
</div>
<div class="line number32 index31 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | application_name : gsql +| application_name : cn_5001 +</code>
</div>
<div class="line number33 index32 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | xact_start : 2020-12-25 17:18:54.238933 +| xact_start : 2020-12-25 17:18:40.478704 +</code>
</div>
<div class="line number34 index33 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | query_start : 2020-12-25 17:19:37.715447 +| query_start : 2020-12-25 17:18:40.479682 +</code>
</div>
<div class="line number35 index34 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | state : active +| state : idle </code><code class="sql color1">in</code> <code class="sql keyword">transaction</code> <code class="sql plain">+</code>
</div>
<div class="line number36 index35 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | query_id : 0 +| query_id : 0 +</code>
</div>
<div class="line number37 index36 alt2">
<code class="sql spaces"> </code><code class="sql plain">| | | | | query : </code><code class="sql keyword">execute</code> <code class="sql plain">direct </code><code class="sql keyword">on</code><code class="sql plain">(dn_6003_6004) </code><code class="sql string">'select * from t1'</code><code class="sql plain">;+| query : </code><code class="sql keyword">truncate</code> <code class="sql plain">t1; +</code>
</div>
<div class="line number38 index37 alt1">
<code class="sql spaces"> </code><code class="sql plain">| | | | | </code><code class="sql comments">------------------------------------------------------ | ------------------------------------------------------</code>
</div>
<div class="line number39 index38 alt2">
<code class="sql plain">(2 </code><code class="sql keyword">rows</code><code class="sql plain">)</code>
</div>
</div>
</td>
</tr></tbody></table>
</div>
</div>
</div>
<p class="maodian">
</p>
<p class="maodian"></p><h3>
等待关系判环</h3>
<p>
构建出事务的等待关系之后,就可以通过检查等待关系是否成环,来判断当前是否有分布式死锁。</p>
<p>
一般情况下,等待关系不会太多,通过观察就可以判断出当前有无分布式死锁。通过观察上一节中构建的等待信息,可以很容易地判断出事务 transaction1 和 transaction2 发生了循环等待,即产生了死锁。</p>
<p class="maodian">
</p>
<p class="maodian"></p><h3>
消除死锁</h3>
<p>
上一步最终可能会找到等待关系中的一个或多个环,对于每个环,需要中止环中的一个事务,才能消除死锁。至于应该选择环中的哪个事务进行中止,需要我们从事务的重要性、已执行时间等多方面进行考虑,最终选择一个对业务影响最小的事务进行中止。</p>
<p class="maodian">
</p>
<p class="maodian"></p><h2>
总结</h2>
<p>
通过 sql 语句,我们可以很方便地处理分布式死锁。当我们在实际业务中遇到数据库系统 hang 住的问题时,可以借助本文提供的方法,检查 hang 问题是否是分布式死锁引起的,如果问题确实是由分布式死锁引起的,还可以通过中止某个陷入死锁的事务,来快速恢复业务。</p>
<p>
以上就是详解通过sql进行分布式死锁的检测与消除的详细内容,更多关于通过sql进行分布式死锁的检测与消除的资料请关注其它相关文章!</p>
<p>
原文链接:https://www.cnblogs.com/huaweiyun/p/14214901.html</p>
頁:
[1]