GreatSQL优化技巧：使用 FUNCTION 代替标量子查询

一装到底 發表於 2025-7-16 10:26:00

GreatSQL优化技巧：使用 FUNCTION 代替标量子查询

<h1 id="greatsql优化技巧使用-function-代替标量子查询">GreatSQL优化技巧：使用 FUNCTION 代替标量子查询</h1>
<h2 id="导语">导语</h2>
<p>本文案例涉及标量子查询，何为标量子查询呢？一般来说，介于 SELECT 与 FROM 之间的子查询就叫标量子查询，返回单行单列结果，可做为最外层 SELECT 结果集的一列。</p>
<p>举个例子：</p>
<pre><code class="language-SQL">SELECT e.employee_name,
(SELECT dept_name FROM hr_dept d WHERE d.dept_id=e.dept_id) AS deptname
FROM hr_employee e;
</code></pre>
<p><code>SELECT dept_name FROM hr_dept d WHERE d.dept_id=e.dept_id</code>这部分即为标量子查询。</p>
<p>标量子查询的特点，主查询返回多少行，子查询就会被执行多少次，这是天然的嵌套查询，标量子查询的执行效率对SQL整体的效率影响很大。</p>
<p>因此如果主查询返回的结果集大时，不推荐使用标量子查询，会考虑将标量子查询改为外连接，优化器就有更多的可选择空间，可以考虑使用HASH JOIN, 而不使用NEST LOOP。</p>
<p>但也有一些案例不适用做外连接改写，本文案例就是不适合做外连接改写的例子，感兴趣的可接着往下看。</p>
<h2 id="sql-案例">SQL 案例</h2>
<p>SQL案例语句：(实际业务场景太复杂，截取片段来说明本文主题)</p>
<pre><code class="language-SQL">SELECT (CASE
      WHEN aa.alert_type = '8' AND aa.alarm_geotype = '全市' THEN
      (concat('',
               (SELECT COUNT(1)
                  FROM t2 ca
               WHERE ca.accepttime >= concat(aa.alert_date, ' 00:00:00')
                  AND ca.accepttime <= concat(aa.alert_date, ' 23:59:59')
                  AND instr(ca.key_word, aa.key_word) > 0)))
      WHEN aa.alert_type = '8' AND aa.alarm_geotype = '区县' THEN
      (concat('',
               (SELECT COUNT(1)
                  FROM t2 ca
               WHERE ca.accepttime >= concat(aa.alert_date, ' 00:00:00')
                  AND ca.accepttime <= concat(aa.alert_date, ' 23:59:59')
                  AND instr(ca.key_word, aa.key_word) > 0
                  AND instr(ca.city_, aa.city_) > 0)))
   END) alert_value
FROM t1 aa
WHERE aa.ALERT_DATE >= '2025-05-15'
AND aa.ALERT_DATE <= '2025-06-15'
</code></pre>
<p>为想做实验验证的小伙伴提供了建表语句与测试数据</p>
<pre><code class="language-SQL">CREATE TABLE t1(alert_type VARCHAR(10),
alarm_geotype VARCHAR(20),
alert_date VARCHAR(10),
key_word VARCHAR(100),
city_ VARCHAR(100),
KEY idx_alertdate(alert_date)
);

CREATE TABLE t2(accepttime VARCHAR(50),
key_word VARCHAR(500),
city_ VARCHAR(100),
KEY idx_accepttime(accepttime)
);

DELIMITER //
CREATE OR REPLACE PROCEDURE p1() IS
BEGIN
FOR i IN 1..2000 LOOP
INSERT INTO t1 VALUES('8','全市',TO_CHAR(SYSDATE-RAND()*300,'yyyy-mm-dd'),'a',NULL);
END LOOP;
FOR i IN 1..2000 LOOP
INSERT INTO t1 VALUES('8','区县',TO_CHAR(SYSDATE-RAND()*300,'yyyy-mm-dd'),'a','b');
END LOOP;
FOR i IN 1..100000 LOOP
INSERT INTO t2 VALUES(TO_CHAR(SYSDATE-RAND()*31,'yyyy-mm-dd hh24:mi:ss'),'a','b');
END LOOP;
END;
//
DELIMITER ;

CALL P1;
</code></pre>
<h3 id="语句分析">语句分析</h3>
<p>主查询对t1表按 alert_date 查询一个月的数据，数据量为400条左右，SELECT部分含两个标量子查询，子查询的关联条件为多个非等值关联，查询项为聚合汇总项count。这样的情况下，不好改写成外连接，好在主查询返回的行数不多，如果子查询效率高的话，这个语句就没有性能问题。</p>
<h3 id="执行计划分析">执行计划分析</h3>
<p>本SQL执行计划如下：</p>
<pre><code class="language-SQL">*************************** 1. row ***************************
EXPLAIN: -> Index range scan on aa using idx_alertdate over ('2025-05-15' <= alert_date <= '2025-06-15'), with index condition: ((aa.alert_date >= '2025-05-15') and (aa.alert_date <= '2025-06-15'))(cost=178.01 rows=395) (actual time=0.045..19.089 rows=395 loops=1)
-> Select #2 (subquery in projection; dependent)
-> Aggregate: count(1)(cost=2310.76 rows=1) (actual time=206.479..206.480 rows=1 loops=188)
   -> Filter: ((ca.accepttime >= concat(aa.alert_date,' 00:00:00')) and (ca.accepttime <= concat(aa.alert_date,' 23:59:59')) and (instr(ca.key_word,aa.key_word) > 0))(cost=1199.51 rows=11113) (actual time=2.724..206.195 rows=3141 loops=188)
         -> Index range scan on ca (re-planned for each iteration)(cost=1199.51 rows=100033) (actual time=0.033..163.140 rows=100000 loops=188)
-> Select #3 (subquery in projection; dependent)
-> Aggregate: count(1)(cost=2310.76 rows=1) (actual time=228.676..228.677 rows=1 loops=207)
   -> Filter: ((ca.accepttime >= concat(aa.alert_date,' 00:00:00')) and (ca.accepttime <= concat(aa.alert_date,' 23:59:59')) and (instr(ca.key_word,aa.key_word) > 0) and (instr(ca.city_,aa.city_) > 0))(cost=1199.51 rows=11113) (actual time=0.078..228.418 rows=3088 loops=207)
         -> Index range scan on ca (re-planned for each iteration)(cost=1199.51 rows=100033) (actual time=0.032..162.289 rows=100000 loops=207)

1 row in set, 7 warnings (1 min 26.19 sec)
</code></pre>
<p>SQL总体耗时86s，两个标量子查询，执行计划显示其扫描方式为： <code>Index range scan on ca (re-planned for each iteration)</code>，优化器在SQL执行过程中重新规划执行计划，实际每次扫描行数为100000，为全表的行数，<strong>这表示并没有用到索引来定位，每次都是全表扫描，单次耗时超过160ms</strong>，很显然，这里不符合期待，子查询效率低，执行多次导致SQL整体性能差。</p>
<p>子查询为什么会全表扫描呢，从建表语句上可以看到<code>accepttime</code> 列上是有索引的，根据语义可以推断出，子查询只需要查询一天的数据，如果能用上索引，SQL整体性能一下子就能提升上来。推测<strong>优化器处理非等值关联的标量子查询时，可能算法上存在一定缺陷</strong>。</p>
<p>那既然明确了SQL怎样执行效率会高，现在就是想办法让SQL按照自己指定的执行计划来执行。</p>
<h3 id="优化方案">优化方案</h3>
<p>那么怎样才能让子查询用到索引呢，在这里我想到了借助 FUNCTION 来实现，因为 FUNCTION 内部可以当作一个独立的SQL来执行，相当于对原SQL进行了拆分。</p>
<p>下面创建两个简单的 FUNCTION，来完成两个标量子查询的功能。</p>
<pre><code class="language-SQL">DELIMITER //
CREATE OR REPLACE FUNCTION getcntbyall(v_date varchar(20),v_keyword varchar(50))
RETURN INT
IS
cnt INT:=0;
BEGIN
SELECT COUNT(1) INTO cnt
FROM t2 ca
WHERE ca.ACCEPTTIME >=
   concat(v_date, ' 00:00:00')
AND ca.ACCEPTTIME <=
   concat(v_date, ' 23:59:59')
AND instr(ca.key_word, v_keyword) > 0;
RETURN cnt;
END;
//
DELIMITER ;

DELIMITER //
CREATE OR REPLACE FUNCTION getcntbycity(v_date varchar(20),v_keyword varchar(50),v_city varchar(50) )
RETURN INT
IS
cnt INT:=0;
BEGIN
SELECT COUNT(1) INTO cnt
FROM t2 ca
WHERE ca.ACCEPTTIME >=
   concat(v_date, ' 00:00:00')
AND ca.ACCEPTTIME <=
   concat(v_date, ' 23:59:59')
AND instr(ca.key_word, v_keyword) > 0
AND instr(ca.city_, v_city) > 0;
RETURN cnt;
END;
//
DELIMITER ;
</code></pre>
<p>语句改写如下：</p>
<pre><code class="language-SQL">SELECT (CASE
      WHEN aa.alert_type = '8' AND aa.alarm_geotype = '全市' THEN
      (concat('',
               getcntbyall(aa.alert_date,aa.key_word)))
      WHEN aa.alert_type = '8' AND aa.alarm_geotype = '区县' THEN
      (concat('',
               getcntbycity(aa.alert_date,aa.key_word,aa.city_)))
   END) alert_value
FROM t1 aa
WHERE aa.ALERT_DATE >= '2025-05-15'
AND aa.ALERT_DATE <= '2025-06-15'
</code></pre>
<p>执行计划如下：</p>
<pre><code class="language-SQL">*************************** 1. row ***************************
EXPLAIN: -> Index range scan on aa using idx_alertdate over ('2025-05-15' <= alert_date <= '2025-06-15'), with index condition: ((aa.alert_date >= '2025-05-15') and (aa.alert_date <= '2025-06-15'))(cost=178.01 rows=395) (actual time=0.300..10.349 rows=395 loops=1)

1 row in set (9.53 sec)
</code></pre>
<p>FUNCTION 的执行计划并不会在主体计划中显示，但是从总体耗时来看，使用了FUNCTION 总体耗时9.5s，比原SQL性能(86s)提升了9倍。为什么使用FUNCTION 能提升SQL的执行效率，<strong>这是因为FUNCTION 内部SQL执行时使用了索引</strong>。下面为 FUNCATION 内部语句的执行计划。</p>
<pre><code class="language-SQL">greatsql> explain analyze
-> SELECT count(1)
->FROM t2 ca
-> WHERE ca.ACCEPTTIME >=
->    concat('2025-05-30', ' 00:00:00')
-> AND ca.ACCEPTTIME <=
->    concat('2025-05-30', ' 23:59:59')
-> AND instr(ca.key_word, 'a') > 0\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(1)(cost=1768.51 rows=1) (actual time=15.010..15.010 rows=1 loops=1)
-> Filter: (instr(ca.key_word,'a') > 0)(cost=1447.01 rows=3215) (actual time=0.069..14.607 rows=3215 loops=1)
   -> Index range scan on ca using idx_accepttime over ('2025-05-30 00:00:00' <= accepttime <= '2025-05-30 23:59:59'), with index condition: ((ca.accepttime >= <cache>(concat('2025-05-30',' 00:00:00'))) and (ca.accepttime <= <cache>(concat('2025-05-30',' 23:59:59'))))(cost=1447.01 rows=3215) (actual time=0.059..12.758 rows=3215 loops=1)

1 row in set (0.02 sec)
</code></pre>
<p>原标量子查询，单次执行耗时约160ms，而FUNCTION内部单次执行耗时15ms，因此SQL整体性能得到很大的提升。</p>
<h2 id="优化总结">优化总结</h2>
<p>优化器给出的执行计划不符合预期时，可以有目的的改写SQL，使其按照既定的高效计划来执行，此处标量子查询使用了聚合函数，且非等值条件关联，无法改为外连接，选择使用FUNCTION来独立部分SQL，调整了执行计划，起到了很好的调优效果。</p>
<p>所谓SQL改写，并不是随便的来尝试，首先改写人要知道执行计划怎样走才能高效，其次就是为了要实现想要的执行计划，去调整SQL的写法来达到自己的目的。</p>
<hr>
<p>Enjoy GreatSQL 😃</p>
<h2 id="关于-greatsql">关于 GreatSQL</h2>
<p>GreatSQL是适用于金融级应用的国内自主开源数据库，具备高性能、高可靠、高易用性、高安全等多个核心特性，可以作为MySQL或Percona Server的可选替换，用于线上生产环境，且完全免费并兼容MySQL或Percona Server。</p>
<p>相关链接： GreatSQL社区 Gitee GitHub Bilibili</p>
<h2 id="greatsql社区">GreatSQL社区：</h2>
<blockquote>
<p>社区博客有奖征稿详情：https://greatsql.cn/thread-100-1-1.html</p>
</blockquote>
<p><img src="https://img2024.cnblogs.com/other/2630741/202507/2630741-20250716102655395-567589704.png"></p>
<h2 id="技术交流群">技术交流群：</h2>
<blockquote>
<p>微信：扫码添加<code>GreatSQL社区助手</code>微信好友，发送验证信息<code>加群</code>。</p>
</blockquote>
<p><img src="https://img2024.cnblogs.com/other/2630741/202507/2630741-20250716102655665-444690619.png"></p><br><br>
来源：https://www.cnblogs.com/greatsql/p/18986989

頁: [1]

圆梦公社's Archiver

GreatSQL优化技巧：使用 FUNCTION 代替标量子查询