邱明芬 發表於 2025-6-24 10:28:00

DolphinScheduler 调度变慢?试试这些数据库性能优化策略

<h2 id="问题背景">问题背景</h2>
<p>DolphinScheduler 1.3.9版本</p>
<p>查询任务实例列表等接口时,有时会出现响应慢的情况,超过20秒才返回结果。</p>
<h2 id="问题诊断">问题诊断</h2>
<h3 id="1mysql开启慢日志">(1)mysql开启慢日志</h3>
<p>/etc/mysql.cnf添加如下配置参数</p>
<pre><code>slow_query_log = ON
slow_query_log_file = /data/log/mysql/mysql-slow.log
long_query_time = 2
</code></pre>
<h3 id="2慢日志分析">(2)慢日志分析</h3>
<p>日志中以下几类sql超时频率较高:</p>
<pre><code>select * from t_ds_process_instance
where 1=1 and state in ( 0 , 1 , 2 , 4 ) and process_definition_id=71
and (schedule_time &gt;= '2022-07-20 00:00:00' and schedule_time &lt;= '2022-07-20 23:59:59.999'
or start_time &gt;= '2022-07-20 00:00:00' and start_time &lt;= '2022-07-20 23:59:59.999')
order by start_time desc limit 1;
</code></pre>
<pre><code>select instance.*,process.name as process_instance_name from t_ds_task_instance instance
left join t_ds_process_definition define on instance.process_definition_id = define.id
left join t_ds_process_instance process on process.id=instance.process_instance_id
where define.project_id = 6 order by instance.start_time desc LIMIT 0,10;
</code></pre>
<p>主要涉及<code>t_ds_process_instance</code>和<code>t_ds_task_instance</code>两张表。</p>
<p>通过<code>explain analyze</code>命令分析,可以发现第一类sql使用<code>start_time_index</code>这个索引查询的过程中耗时较多,<code>start_time_index</code>区分度较差,需要有更好区分度的索引来提升查询速度;</p>
<pre><code>-&gt; Limit: 1 row(s) (cost=0.20 rows=0) (actual time=103.391..103.391 rows=1 loops=1)
    -&gt; Filter: ((t_ds_process_instance.process_definition_id = 18) and (((t_ds_process_instance.schedule_time &gt;= TIMESTAMP'2022-07-18 00:00:00') and (t_ds_process_instance.schedule_time &lt;= TIMESTAMP'2022-07-18 23:59:59.999')) or ((t_ds_process_instance.start_time &gt;= TIMESTAMP'2022-07-18 00:00:00') and (t_ds_process_instance.start_time &lt;= TIMESTAMP'2022-07-18 23:59:59.999')))) (cost=0.20 rows=0) (actual time=103.391..103.391 rows=1 loops=1)
      -&gt; Index scan on t_ds_process_instance using start_time_index (reverse) (cost=0.20 rows=2) (actual time=0.030..103.165 rows=1239 loops=1)
</code></pre>
<p>第二类sql主要耗时在left join阶段,需要更新索引字段</p>
<pre><code>-&gt; Limit: 10 row(s) (actual time=3601.141..3601.147 rows=10 loops=1)
    -&gt; Sort row IDs: `instance`.start_time DESC, limit input to 10 row(s) per chunk (actual time=3601.140..3601.145 rows=10 loops=1)
      -&gt; Table scan on &lt;temporary&gt; (cost=0.01..3774.21 rows=301738) (actual time=0.002..190.179 rows=722743 loops=1)
            -&gt; Temporary table (cost=469419.96..473194.16 rows=301738) (actual time=3332.896..3551.716 rows=722743 loops=1)
                -&gt; Nested loop left join (cost=439246.15 rows=301738) (actual time=0.051..1431.254 rows=722743 loops=1)
                  -&gt; Nested loop inner join (cost=107334.40 rows=301738) (actual time=0.045..1239.699 rows=722743 loops=1)
                        -&gt; Filter: (define.id is not null) (cost=12.40 rows=121) (actual time=0.025..0.162 rows=121 loops=1)
                            -&gt; Index lookup on define using process_definition_index (project_id=6) (cost=12.40 rows=121) (actual time=0.025..0.110 rows=121 loops=1)
                        -&gt; Index lookup on instance using task_instance_index (process_definition_id=define.id) (cost=639.65 rows=2494) (actual time=0.006..9.972 rows=5973 loops=121)
                  -&gt; Single-row index lookup on process using PRIMARY (id=`instance`.process_instance_id) (cost=1.00 rows=1) (actual time=0.000..0.000 rows=1 loops=722743)
</code></pre>
<h3 id="3问题定位">(3)问题定位</h3>
<ul>
<li>mysql配置未优化<br>
当前使用的mysql配置均为默认配置,对查询影响较大的配置项如:</li>
</ul>
<p><code>innodb_buffer_pool_size</code>(InnoDB使用一个缓冲池来保存索引和原始数据)。</p>
<p><code>innodb_thread_concurrency</code>(设置为 0,表示不限制并发数,这里推荐设置为0,更好去发挥CPU多核处理能力,提高并发量)</p>
<p><code>innodb_write_io_threads &amp; innodb_read_io_threads</code> (innodb使用后台线程处理数据页上的读写 I/O请求)<br>
需要调整以上参数,提高mysql性能。</p>
<ul>
<li>
<p>DolphinScheduler索引未优化<br>
当前慢查询报出的SQL的索引需要优化。</p>
</li>
<li>
<p>DolphinScheduler单表数据较大<br>
当前慢查询涉及的SQL表(t_ds_task_instance)较大,数据大小超过4G,数据条数超过200w,需要切分归档。</p>
</li>
</ul>
<h2 id="优化内容">优化内容</h2>
<h3 id="1大表归档">(1)大表归档</h3>
<ul>
<li>创建备份表</li>
</ul>
<p>根据<code>t_ds_process_instance DDL</code>建表语句复制表<code>t_ds_process_instance_bak</code>。</p>
<p>根据<code>t_ds_task_instance DDL</code>建表语句复制表 <code>t_ds_task_instance_bak</code>。</p>
<p>注:外键行删除</p>
<p>CONSTRAINT <code>foreign_key_instance_id</code> FOREIGN KEY (<code>process_instance_id</code>) REFERENCES <code>t_ds_process_instance</code> (<code>id</code>) ON DELETE CASCADE ON UPDATE RESTRICT</p>
<ul>
<li>将归档日期以前的数据从两张表中挪至备份表,如2022-01-01 00:00:00</li>
</ul>
<pre><code>INSERT into t_ds_process_instance_copy select * from t_ds_process_instance WHERE start_time &lt; '2022-01-01 00:00:00';
</code></pre>
<pre><code>INSERT into t_ds_task_instance_copy select * from t_ds_task_instance WHERE start_time &lt; '2022-01-01 00:00:00';
</code></pre>
<ul>
<li>比对是否备份成功</li>
</ul>
<pre><code>select count(*) from t_ds_process_instance_copy;

select count(*) from t_ds_process_instance where start_time &lt; '2022-01-01 00:00:00';

select count(*) from t_ds_task_instance_copy;

select count(*) from t_ds_task_instance where start_time &lt; '2022-01-01 00:00:00';
</code></pre>
<ul>
<li>删除历史数据</li>
</ul>
<pre><code>DELETE FROM t_ds_process_instance WHERE start_time &lt; '2022-01-01 00:00:00';

DELETE FROM t_ds_task_instance WHERE start_time &lt; '2022-01-01 00:00:00';
</code></pre>
<h3 id="2参数优化">(2)参数优化</h3>
<p><code>/etc/my.cnf mysql</code>配置文件中调整如下参数并重启mysql</p>
<pre><code>innodb_log_buffer_size= 64M

innodb_buffer_pool_size= 20G //(根据服务器内存适当调整)如果数据库独占一台机器则设置物理内存的70%

innodb_log_file_size= 1G

innodb_thread_concurrency = 0

join_buffer_size = 64M

sort_buffer_size = 64M

innodb_read_io_threads = 16 //(根据服务器CPU适当调整)

innodb_write_io_threads = 16 //(根据服务器CPU适当调整)
</code></pre>
<h3 id="3索引优化">(3)索引优化</h3>
<pre><code>create index state_index on t_ds_process_instance(state, process_definition_id);

create index start_time_process_definition_index on t_ds_task_instance(start_time, process_definition_id);

alter table t_ds_task_instance drop index task_instance_index;
</code></pre>
<p>转载自daozi126<br>
原文链接:https://blog.csdn.net/u012938208/article/details/147899920</p>
<blockquote>
<p>本文由 白鲸开源 提供发布支持!</p>
</blockquote><br><br>
来源:https://www.cnblogs.com/DolphinScheduler/p/18945772
頁: [1]
查看完整版本: DolphinScheduler 调度变慢?试试这些数据库性能优化策略