PostgreSQL 六大索引的方法小结

嘎嘣豆儿 發表於 2025-11-12 10:31:58

<div id="navCategory"><h5 class="catalogue">目录</h5><ul class="first_class_ul"><li><a href="#_label0">一、概览速览表（先有直觉）</a></li><li><a href="#_label1">二、B-tree（默认 & 通吃）</a></li><li><a href="#_label2">三、Hash（只做“等值”）</a></li><li><a href="#_label3">四、GIN（倒排：jsonb/数组/全文/trigram）</a></li><li><a href="#_label4">五、GiST（“通用”搜索树：空间/范围/KNN/排斥约束）</a></li><li><a href="#_label5">六、SP-GiST（空间分割/前缀）</a></li><li><a href="#_label6">七、BRIN（Block Range Index：大表“顺序相关”神器）</a></li><li><a href="#_label7">八、选型决策 10 条军规</a></li><li><a href="#_label8">九、常见坑与对症下药</a></li><li><a href="#_label9">十、实战模板：一张订单表怎么配索引</a></li><li><a href="#_label10">十一、性能与维护清单</a></li><li><a href="#_label11">十二、快速对照：你在查什么，就选什么</a></li></ul></div><p class="maodian"><a name="_label0"></a></p><h2>一、概览速览表（先有直觉）</h2>
<table><thead><tr><th>索引类型</th><th>典型用途</th><th>支持唯一</th><th>适配查询</th><th>优缺点一眼看</th></tr></thead><tbody><tr><td>B-tree</td><td>等值、范围、排序、唯一约束</td><td>✅</td><td>=, <, >, BETWEEN, ORDER BY</td><td>默认首选，通用；维护成本中等</td></tr><tr><td>Hash</td><td>纯等值匹配</td><td>❌（唯一约束由 B-tree 实现）</td><td>=</td><td>仅等值；自 PG10 起 WAL 持久化；使用面窄</td></tr><tr><td>GIN</td><td>倒排类：jsonb、数组、全文、trigram</td><td>❌</td><td>@>, ?, ?&, @@(全文)，LIKE/ILIKE(trigram)</td><td>读快写慢，适合查询多、更新少</td></tr><tr><td>GiST</td><td>距离/范围/空间/相似度、KNN</td><td>❌</td><td>范围、相交、<-> KNN</td><td>通用“框架”，支持多种数据类型（几何、range、inet…）</td></tr><tr><td>SP-GiST</td><td>前缀/空间分割（trie/k-d/四叉树）</td><td>❌</td><td>前缀、某些 KNN</td><td>适合强分割数据（前缀搜索、坐标）</td></tr><tr><td>BRIN</td><td>超大表顺序相关列（时间/自增ID）</td><td>❌</td><td>大范围扫描的快速剪枝</td><td>体积极小，建立/维护极轻；精度低需回检</td></tr></tbody></table>
<blockquote><p>记忆法：“B 通吃、H 等值、GIN 倒排、GiST 空间、SPG 前缀、BRIN 顺序”</p></blockquote>
<p class="maodian"><a name="_label1"></a></p><h2>二、B-tree（默认 & 通吃）</h2>
<p><strong>特性</strong></p>
<ul><li>PostgreSQL 的 <code>UNIQUE</code>/主键约束本质上都是 B-tree 索引。</li><li>适配等值/范围/排序/聚合的常见访问路径与 <code>ORDER BY … LIMIT</code>。</li><li>支持<strong>覆盖索引</strong>（<code>INCLUDE</code>，用于只读回表字段；主要用于 B-tree）。</li></ul>
<p><strong>常用语法</strong></p>
<div class="jb51code"><pre class="brush:sql;">-- 单列 / 多列
CREATE INDEX idx_user_email ON users(email);
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

-- 覆盖索引（减少回表）
CREATE INDEX idx_orders_status_created_inc
ON orders(status, created_at)
INCLUDE (amount);

-- 表达式/部分索引
CREATE INDEX idx_lower_email ON users (lower(email));
CREATE INDEX idx_paid_recent ON orders (created_at)
WHERE status = 'PAID';
</pre></div>
<p><strong>实战要点</strong></p>
<ul><li>多列 B-tree 的<strong>左前缀原则</strong>：查询条件要尽量命中前导列。</li><li>大量 <code>LIKE 'abc%'</code> 的<strong>前缀匹配</strong>通常也可命中 B-tree；但包含式 <code>%abc%</code> 需 <code>pg_trgm</code>。</li><li>注意 <strong>NULL</strong>：唯一约束允许多行 NULL。</li></ul>
<p class="maodian"><a name="_label2"></a></p><h2>三、Hash（只做“等值”）</h2>
<p><strong>场景</strong>：极端等值查找且键宽/比较代价特殊时，可能略小/略快。<br /><strong>限制</strong>：不支持唯一约束与范围/排序；面窄，<strong>大多数等值场景仍用 B-tree</strong>。</p>
<div class="jb51code"><pre class="brush:sql;">CREATE INDEX idx_users_hash_email ON users USING hash(email);
</pre></div>
<blockquote><p>备注：自 <strong>PG10 起 Hash 索引 WAL 持久化</strong>，可崩溃恢复，但优势有限。</p></blockquote>
<p class="maodian"><a name="_label3"></a></p><h2>四、GIN（倒排：jsonb/数组/全文/trigram）</h2>
<p><strong>场景</strong></p>
<ul><li><code>jsonb</code> 包含/键值查询：<code>@></code>, <code>?</code>, <code>?&</code>；</li><li>数组元素包含/交集；</li><li>全文检索 <code>to_tsvector(...) @@ to_tsquery(...)</code>；</li><li>模糊查询：<code>pg_trgm</code> 的 trigram + <code>LIKE/ILIKE '%abc%'</code>。</li></ul>
<p><strong>语法与 opclass</strong></p>
<div class="jb51code"><pre class="brush:sql;">-- jsonb：两种常用 opclass
CREATE INDEX idx_doc_gin ON docs USING gin(data jsonb_ops);    -- 全功能，体积偏大
CREATE INDEX idx_doc_path ON docs USING gin(data jsonb_path_ops);-- 对 @> 优化更好

-- 数组包含
CREATE INDEX idx_tags_gin ON posts USING gin(tags);

-- 全文
CREATE INDEX idx_posts_fts ON posts USING gin(to_tsvector('simple', title || ' ' || body));

-- trigram 模糊（需扩展）
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_user_name_trgm ON users USING gin(name gin_trgm_ops);
</pre></div>
<p><strong>调优</strong></p>
<div class="jb51code"><pre class="brush:sql;">-- 批量重放/更新多时：构建或维护参数
CREATE INDEX ... USING gin(...) WITH (fastupdate = on, gin_pending_list_limit = '512MB');
</pre></div>
<ul><li><strong>优点</strong>：读极快（特别是包含/全文/模糊）；</li><li><strong>缺点</strong>：写慢、体积大；更适合“读多写少”或批量导入。</li></ul>
<p class="maodian"><a name="_label4"></a></p><h2>五、GiST（“通用”搜索树：空间/范围/KNN/排斥约束）</h2>
<p><strong>场景</strong></p>
<ul><li>几何/地理（PostGIS）、<code>range</code>、<code>inet/cidr</code>、相交/包含等；</li><li>KNN 最近邻查询：<code>ORDER BY <-></code>；</li><li><strong>排斥约束（Exclusion Constraint）</strong>：避免时间段/空间重叠（常配合 <code>btree_gist</code>）。</li></ul>
<p><strong>示例</strong></p>
<div class="jb51code"><pre class="brush:sql;">-- 范围不重叠的预约（时间区间）
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE booking(
room_id int,
during tstzrange,
EXCLUDE USING gist (room_id WITH =, during WITH &&)-- 禁止同房间时间相交
);

-- KNN：最近地点
CREATE INDEX idx_poi_gist ON poi USING gist (geom);
SELECT * FROM poi ORDER BY geom <-> ST_SetSRID(ST_MakePoint(121.5,31.2), 4326) LIMIT 20;
</pre></div>
<p><strong>要点</strong></p>
<ul><li>GiST 是“框架”，性能取决于具体 <strong>operator class</strong>（几何、range 等）。</li><li>构建大索引可：<code>WITH (buffering = on)</code>；范围型查询常见“回检”。</li></ul>
<p class="maodian"><a name="_label5"></a></p><h2>六、SP-GiST（空间分割/前缀）</h2>
<p><strong>场景</strong></p>
<ul><li>前缀类数据（如手机号/URL/域名前缀）基于 trie；</li><li>坐标点的 k-d/四叉树等强分割结构。</li></ul>
<div class="jb51code"><pre class="brush:sql;">-- 前缀查询
CREATE INDEX idx_phone_prefix ON users USING spgist (phone_number);
SELECT * FROM users WHERE phone_number LIKE '1389%';
</pre></div>
<p><strong>特点</strong>：对<strong>分布不均匀</strong>且可<strong>递归分割</strong>的数据更友好；更新/插入性能通常优于 GiST 的某些场景。</p>
<p class="maodian"><a name="_label6"></a></p><h2>七、BRIN（Block Range Index：大表“顺序相关”神器）</h2>
<p><strong>场景</strong></p>
<ul><li>超大表（千万/亿级），<code>created_at</code>、自增 <code>id</code> 与物理顺序<strong>相关性高</strong>；</li><li>大范围查询或分段扫描，BRIN 能以极小代价快速缩小扫描页面。</li></ul>
<div class="jb51code"><pre class="brush:sql;">-- 典型配置：pages_per_range 控制摘要粒度（越小越精细）
CREATE INDEX idx_orders_brin_created
ON orders USING brin (created_at)
WITH (pages_per_range = 128, autosummarize = on);
</pre></div>
<p><strong>要点</strong></p>
<ul><li>BRIN 只存储每个范围的 min/max 等摘要，<strong>需要回表回检</strong>；</li><li>体积/维护成本极低，适合“<strong>追加写</strong> + <strong>时间窗口查询</strong>”；</li><li>相关性弱（数据经常乱序写入）时效果下降，可 <code>CLUSTER</code>/重写表优化物理顺序。</li></ul>
<p class="maodian"><a name="_label7"></a></p><h2>八、选型决策 10 条军规</h2>
<ol><li><strong>能用 B-tree 先用 B-tree</strong>：等值/范围/排序/唯一都稳。</li><li><code>jsonb/数组/全文/模糊</code> → <strong>GIN</strong>；其中 <code>LIKE '%abc%'</code> 强烈建议 <strong>trigram + GIN</strong>。</li><li>距离/空间/范围相交/KNN → <strong>GiST</strong>（PostGIS、range、inet 等）。</li><li>前缀或可分割空间结构 → <strong>SP-GiST</strong>。</li><li>超大追加型时间/ID查询 → <strong>BRIN</strong>。</li><li>纯等值且确有收益证据 → <strong>Hash</strong>；否则 B-tree。</li><li>高频过滤 + 低选择度 → <strong>部分索引</strong>（<code>WHERE ...</code>）胜过大而全。</li><li>只读回表字段较多 → B-tree <strong>INCLUDE</strong> 做覆盖扫描。</li><li>表达式要<strong>索引同款表达式</strong>（如 <code>lower(email)</code>）；否则无法命中。</li><li>多列顺序要<strong>按查询使用频次/选择度从左到右</strong>排列；避免“全吃不着”的复合索引。</li></ol>
<p class="maodian"><a name="_label8"></a></p><h2>九、常见坑与对症下药</h2>
<ul><li><strong>模糊查询没走索引</strong>：<code>LIKE '%abc%'</code> 需 <code>pg_trgm</code> + <code>GIN/GiST</code>；<code>LIKE 'abc%'</code> 可走 B-tree。</li><li><strong>大小写不敏感</strong>：<code>lower(col)</code> 表达式索引 + 查询同写法；或用 <code>citext</code> 类型。</li><li><strong>多列索引未命中</strong>：条件没用到<strong>前导列</strong>；或使用了不等价的表达式/函数。</li><li><strong>jsonb 慢</strong>：选对 opclass：<code>jsonb_path_ops</code> 对 <code>@></code> 更紧凑；更新频繁则谨慎使用 GIN。</li><li><strong>索引暴胀</strong>：定期 <code>VACUUM</code>，必要时 <code>REINDEX</code>；控制 <code>fillfactor</code>。</li><li><strong>计划不稳定</strong>：检查统计信息与相关性（<code>ANALYZE</code>、<code>default_statistics_target</code>）；利用 <code>EXPLAIN (ANALYZE, BUFFERS)</code> 诊断。</li></ul>
<p class="maodian"><a name="_label9"></a></p><h2>十、实战模板：一张订单表怎么配索引</h2>
<div class="jb51code"><pre class="brush:sql;">CREATE TABLE orders(
id       bigserial PRIMARY KEY,
user_id    bigint NOT NULL,
status    text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
amount    numeric(12,2) NOT NULL,
items    jsonb,          -- 订单明细（jsonb）
tags       text[]          -- 标签
);

-- 1) 用户最近订单（分页/排序）
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

-- 2) 状态 + 时间窗口统计（覆盖金额）
CREATE INDEX idx_orders_status_created_inc
ON orders(status, created_at)
INCLUDE (amount);

-- 3) jsonb 包含查找（items 内含某 SKU）
CREATE INDEX idx_orders_items_path ON orders USING gin(items jsonb_path_ops);

-- 4) 标签包含任一/全部
CREATE INDEX idx_orders_tags_gin ON orders USING gin(tags);

-- 5) 超大表时间过滤的剪枝
CREATE INDEX idx_orders_brin_created
ON orders USING brin (created_at) WITH (pages_per_range=128, autosummarize=on);
</pre></div>
<p class="maodian"><a name="_label10"></a></p><h2>十一、性能与维护清单</h2>
<ul><li><strong>分析与观测</strong>：<code>EXPLAIN (ANALYZE, BUFFERS)</code>, <code>pg_stat_statements</code>, <code>pg_stat_all_indexes</code>。</li><li><strong>维护</strong>：<code>VACUUM (ANALYZE)</code>、高写入期适当增大 <code>maintenance_work_mem</code>；大索引可并行创建：<code>CREATE INDEX CONCURRENTLY</code>（无锁长事务，但更慢）。</li><li><strong>参数提示</strong>：顺序 I/O 多可调低 <code>random_page_cost</code>；SSD 环境可适当下调以提高索引倾向。</li><li><strong>物理顺序</strong>：时间序列表可偶尔 <code>CLUSTER</code> 或重写，提高 BRIN/JIT 效果与热点局部性。</li></ul>
<p class="maodian"><a name="_label11"></a></p><h2>十二、快速对照：你在查什么，就选什么</h2>
<ul><li><code>WHERE a = ?</code> / <code>ORDER BY a</code> / <code>BETWEEN</code> → <strong>B-tree</strong></li><li><code>WHERE col @> '{"k":"v"}'::jsonb</code> / <code>tags @> '{x}'</code> → <strong>GIN(jsonb/数组)</strong></li><li><code>title @@ to_tsquery('...')</code> → <strong>GIN(全文)</strong></li><li><code>name ILIKE '%abc%'</code> → <strong>GIN + pg_trgm</strong></li><li><code>geom <-> point</code> 最近点 → <strong>GiST + KNN</strong></li><li><code>tsrange && ?</code> 不重叠预约 → <strong>GiST + 排斥约束</strong></li><li><code>phone LIKE '1389%'</code> → <strong>SP-GiST（或 B-tree 也可）</strong></li><li><code>WHERE created_at BETWEEN ...</code>（亿级表） → <strong>BRIN</strong></li></ul>

頁: [1]

圆梦公社's Archiver

PostgreSQL 六大索引的方法小结