您好,资深工程师,
我有一个性能不佳的查询,并认为添加索引将是一个快速简便的解决方案,会对性能产生积极的影响。
看来我错了,所以我在这里请教您。
该表实际上非常简单,包含约10,000个条目,看起来像这样:
CREATE TABLE phrase
(
phrase_id bigint NOT NULL,
phrase text COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT phrase_pkey PRIMARY KEY (phrase_id),
CONSTRAINT phrase_phrase_key UNIQUE (phrase)
)
删除/创建索引后,在运行任何解释/查询之前,我在表上运行了VACUUM ANALYZE。
运行以下没有索引的查询大约需要82毫秒,这是几次尝试的大致平均值。
EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase (cost=0.00..295.72 rows=100 width=18) (actual time=0.049..5.730 rows=2 loops=1)"
" Filter: (upper(phrase) = ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
" Rows Removed by Filter: 10046"
"Planning Time: 0.135 ms"
"Execution Time: 5.745 ms"
可能有一些短语需要拉回去,所以我在查看ILIKE时如下:
EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) ~~* ANY('{*PROTEIN SHAKE,*APPLE PIE}');;
"Seq Scan on phrase (cost=0.00..295.72 rows=2 width=18) (actual time=0.113..19.492 rows=2 loops=1)"
" Filter: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
" Rows Removed by Filter: 10046"
"Planning Time: 0.081 ms"
"Execution Time: 19.510 ms"
创建像这样的btree索引后:
CREATE INDEX phrase_phrase_idx ON phrase USING BTREE (upper(phrase) text_pattern_ops ASC NULLS LAST)
等式查询给出以下解释;运行此查询大约需要85毫秒。
EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Index Only Scan using phrase_phrase_btree_idx on phrase (cost=0.29..8.61 rows=2 width=18) (actual time=0.031..0.034 rows=2 loops=1)"
" Index Cond: ((upper(phrase)) = ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
" Heap Fetches: 0"
"Planning Time: 0.100 ms"
"Execution Time: 0.112 ms"
ILIKE查询给出以下解释;运行此查询大约需要95毫秒。
EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) ~~* ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase (cost=0.00..295.72 rows=2 width=18) (actual time=0.116..19.043 rows=2 loops=1)"
" Filter: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
" Rows Removed by Filter: 10046"
"Planning Time: 0.247 ms"
"Execution Time: 19.060 ms"
我认为可能值得尝试将值保留在索引中,以节省前往表的时间:
CREATE INDEX phrase_phrase_idx ON phrase USING BTREE (upper(phrase) text_pattern_ops ASC NULLS LAST) INCLUDE(phrase_id, phrase);
运行以下查询(而非说明)大约需要85毫秒。
EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase (cost=0.00..295.72 rows=100 width=18) (actual time=0.046..5.734 rows=2 loops=1)"
" Filter: (upper(phrase) = ANY ('{"*)"","*APPLE PIE"}'::text[]))"
" Rows Removed by Filter: 10046"
"Planning Time: 0.139 ms"
"Execution Time: 5.748 ms"
这个ILIKE查询大约花费了95毫秒。
EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) ~~* ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase (cost=0.00..295.72 rows=2 width=18) (actual time=0.114..19.340 rows=2 loops=1)"
" Filter: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
" Rows Removed by Filter: 10046"
"Planning Time: 0.094 ms"
"Execution Time: 19.357 ms"
我以为我会尝试GIN索引:
CREATE INDEX phrase_phrase_gin_idx ON phrase USING GIN (upper(phrase) gin_trgm_ops)
相等查询大约需要85毫秒。
EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase (cost=0.00..295.72 rows=2 width=18) (actual time=0.056..5.764 rows=2 loops=1)"
" Filter: (upper(phrase) = ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
" Rows Removed by Filter: 10046"
"Planning Time: 0.059 ms"
"Execution Time: 5.775 ms"
ILIKE查询也花费了大约85ms。
EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) ~~* ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Bitmap Heap Scan on phrase (cost=148.02..155.34 rows=2 width=18) (actual time=0.264..0.275 rows=2 loops=1)"
" Recheck Cond: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
" Rows Removed by Index Recheck: 2"
" Heap Blocks: exact=3"
" -> Bitmap Index Scan on phrase_phrase_gin_idx (cost=0.00..148.02 rows=2 width=0) (actual time=0.247..0.247 rows=4 loops=1)"
" Index Cond: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"Planning Time: 0.281 ms"
"Execution Time: 0.314 ms"
我不确定自己在做什么错,但是我索引的方式看不出太多好处。
请有人足够友善地指出我在哪里/为什么要脾气暴躁。
我希望能够对此查询做出重大改变,但是我缺少了一些东西。
致以最诚挚的问候,保罗。
虽然Trigram索引是一个功能强大的工具,但它们更新缓慢且会变得很大,因此,如果可以使用B树索引,请务必使用它。
您必须测试Trigram索引是否可以为IN
查询中的较长列表提供性能优势,但是我怀疑这样做是否可以。
我会坚持原来的
WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}')
和B树索引。无论您使用哪个索引,使用ILIKE ANY
代替代替= ANY
都不会带来任何好处。
在某个时候,如果列表很长,PostgreSQL将切换到顺序扫描,但是假设要扫描的表部分足够大以至于索引扫描不会再快了,那么也许是正确的。
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句