索引文本字段上的PostgreSQL查询性能

蒸了

您好,资深工程师,

我有一个性能不佳的查询,并认为添加索引将是一个快速简便的解决方案,会对性能产生积极的影响。

看来我错了,所以我在这里请教您。

该表实际上非常简单,包含约10,000个条目,看起来像这样:

CREATE TABLE phrase
(
    phrase_id bigint NOT NULL,
    phrase text COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT phrase_pkey PRIMARY KEY (phrase_id),
    CONSTRAINT phrase_phrase_key UNIQUE (phrase)
)

删除/创建索引后,在运行任何解释/查询之前,我在表上运行了VACUUM ANALYZE。

运行以下没有索引的查询大约需要82毫秒,这是几次尝试的大致平均值。

EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase  (cost=0.00..295.72 rows=100 width=18) (actual time=0.049..5.730 rows=2 loops=1)"
"  Filter: (upper(phrase) = ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"  Rows Removed by Filter: 10046"
"Planning Time: 0.135 ms"
"Execution Time: 5.745 ms"

可能有一些短语需要拉回去,所以我在查看ILIKE时如下:

EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) ~~* ANY('{*PROTEIN SHAKE,*APPLE PIE}');;
"Seq Scan on phrase  (cost=0.00..295.72 rows=2 width=18) (actual time=0.113..19.492 rows=2 loops=1)"
"  Filter: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"  Rows Removed by Filter: 10046"
"Planning Time: 0.081 ms"
"Execution Time: 19.510 ms"

创建像这样的btree索引后:

CREATE INDEX phrase_phrase_idx ON phrase USING BTREE (upper(phrase) text_pattern_ops ASC NULLS LAST)

等式查询给出以下解释;运行此查询大约需要85毫秒。

EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Index Only Scan using phrase_phrase_btree_idx on phrase  (cost=0.29..8.61 rows=2 width=18) (actual time=0.031..0.034 rows=2 loops=1)"
"  Index Cond: ((upper(phrase)) = ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"  Heap Fetches: 0"
"Planning Time: 0.100 ms"
"Execution Time: 0.112 ms"

ILIKE查询给出以下解释;运行此查询大约需要95毫秒。

EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) ~~* ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase  (cost=0.00..295.72 rows=2 width=18) (actual time=0.116..19.043 rows=2 loops=1)"
"  Filter: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"  Rows Removed by Filter: 10046"
"Planning Time: 0.247 ms"
"Execution Time: 19.060 ms"

我认为可能值得尝试将值保留在索引中,以节省前往表的时间:

CREATE INDEX phrase_phrase_idx ON phrase USING BTREE (upper(phrase) text_pattern_ops ASC NULLS LAST) INCLUDE(phrase_id, phrase);

运行以下查询(而非说明)大约需要85毫秒。

EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase  (cost=0.00..295.72 rows=100 width=18) (actual time=0.046..5.734 rows=2 loops=1)"
"  Filter: (upper(phrase) = ANY ('{"*)"","*APPLE PIE"}'::text[]))"
"  Rows Removed by Filter: 10046"
"Planning Time: 0.139 ms"
"Execution Time: 5.748 ms"

这个ILIKE查询大约花费了95毫秒。

EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) ~~* ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase  (cost=0.00..295.72 rows=2 width=18) (actual time=0.114..19.340 rows=2 loops=1)"
"  Filter: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"  Rows Removed by Filter: 10046"
"Planning Time: 0.094 ms"
"Execution Time: 19.357 ms"

我以为我会尝试GIN索引:

CREATE INDEX phrase_phrase_gin_idx ON phrase USING GIN (upper(phrase) gin_trgm_ops)

相等查询大约需要85毫秒。

EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Seq Scan on phrase  (cost=0.00..295.72 rows=2 width=18) (actual time=0.056..5.764 rows=2 loops=1)"
"  Filter: (upper(phrase) = ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"  Rows Removed by Filter: 10046"
"Planning Time: 0.059 ms"
"Execution Time: 5.775 ms"

ILIKE查询也花费了大约85ms。

EXPLAIN ANALYSE SELECT phrase FROM phrase WHERE upper(phrase) ~~* ANY('{*PROTEIN SHAKE,*APPLE PIE}');
"Bitmap Heap Scan on phrase  (cost=148.02..155.34 rows=2 width=18) (actual time=0.264..0.275 rows=2 loops=1)"
"  Recheck Cond: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"  Rows Removed by Index Recheck: 2"
"  Heap Blocks: exact=3"
"  ->  Bitmap Index Scan on phrase_phrase_gin_idx  (cost=0.00..148.02 rows=2 width=0) (actual time=0.247..0.247 rows=4 loops=1)"
"        Index Cond: (upper(phrase) ~~* ANY ('{"*PROTEIN SHAKE","*APPLE PIE"}'::text[]))"
"Planning Time: 0.281 ms"
"Execution Time: 0.314 ms"

我不确定自己在做什么错,但是我索引的方式看不出太多好处。

请有人足够友善地指出我在哪里/为什么要脾气暴躁。

我希望能够对此查询做出重大改变,但是我缺少了一些东西。

致以最诚挚的问候,保罗。

劳伦兹·阿尔伯

虽然Trigram索引是一个功能强大的工具,但它们更新缓慢且会变得很大,因此,如果可以使用B树索引,请务必使用它。

您必须测试Trigram索引是否可以为IN查询中的较长列表提供性能优势,但是我怀疑这样做是否可以。

我会坚持原来的

WHERE upper(phrase) = ANY('{*PROTEIN SHAKE,*APPLE PIE}')

和B树索引。无论您使用哪个索引,使用ILIKE ANY代替代替= ANY都不会带来任何好处。

在某个时候,如果列表很长,PostgreSQL将切换到顺序扫描,但是假设要扫描的表部分足够大以至于索引扫描不会再快了,那么也许是正确的。

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章