postgres两列排序性能低下

罗布斯

我有一个查询,执行多个联接。我尝试只获取结果中最新的每个关键字的位置。

这是查询:

SELECT DISTINCT ON (p.keyword_id)
  a.id        AS account_id,
  w.parent_id AS parent_id,
  w.name      AS name,
  p.position  AS position
FROM websites w
  JOIN accounts a ON w.account_id = a.id
  JOIN keywords k ON k.website_id = w.parent_id
  JOIN positions p ON p.website_id = w.parent_id
WHERE a.amount > 0 AND w.parent_id NOTNULL AND (round((a.amount / a.payment_renewal_period), 2) BETWEEN 1 AND 19)
ORDER BY p.keyword_id, p.created_at DESC;

该查询的费用计划如下:

   Unique  (cost=73673.65..76630.38 rows=264 width=40) (actual time=30777.117..49143.023 rows=259 loops=1)
       ->  Sort  (cost=73673.65..75152.02 rows=591347 width=40) (actual time=30777.116..47352.373 rows=10891486 loops=1)
             Sort Key: p.keyword_id, p.created_at DESC
             Sort Method: external merge  Disk: 512672kB
             ->  Merge Join  (cost=219.59..812.26 rows=591347 width=40) (actual time=3.487..3827.028 rows=10891486 loops=1)
                   Merge Cond: (w.parent_id = k.website_id)
                   ->  Nested Loop  (cost=128.46..597.73 rows=1268 width=44) (actual time=3.378..108.915 rows=61582 loops=1)
                         ->  Nested Loop  (cost=2.28..39.86 rows=1 width=28) (actual time=0.026..0.216 rows=7 loops=1)
                               ->  Index Scan using index_websites_on_parent_id on websites w  (cost=0.14..15.08 rows=4 width=28) (actual time=0.004..0.023 rows=7 loops=1)
                                     Index Cond: (parent_id IS NOT NULL)
                               ->  Bitmap Heap Scan on accounts a  (cost=2.15..6.18 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=7)
                                     Recheck Cond: (id = w.account_id)
                                     Filter: ((amount > '0'::numeric) AND (round((amount / (payment_renewal_period)::numeric), 2) >= '1'::numeric) AND (round((amount / (payment_renewal_period)::numeric), 2) <= '19'::numeric))
                                     Heap Blocks: exact=7
                                     ->  Bitmap Index Scan on accounts_pkey  (cost=0.00..2.15 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=7)
                                           Index Cond: (id = w.account_id)
                         ->  Bitmap Heap Scan on positions p  (cost=126.18..511.57 rows=4631 width=16) (actual time=0.994..8.226 rows=8797 loops=7)
                               Recheck Cond: (website_id = w.parent_id)
                               Heap Blocks: exact=1004
                               ->  Bitmap Index Scan on index_positions_on_5_columns  (cost=0.00..125.02 rows=4631 width=0) (actual time=0.965..0.965 rows=8797 loops=7)
                                     Index Cond: (website_id = w.parent_id)
                   ->  Sort  (cost=18.26..18.92 rows=264 width=4) (actual time=0.106..1013.966 rows=10891487 loops=1)
                         Sort Key: k.website_id
                         Sort Method: quicksort  Memory: 37kB
                         ->  Seq Scan on keywords k  (cost=0.00..7.64 rows=264 width=4) (actual time=0.005..0.039 rows=263 loops=1)
     Planning time: 1.081 ms
     Execution time: 49184.222 ms

问题是当我运行查询w.id而不是w.parent_id在联接位置时,总成本降低到

Unique  (cost=3621.07..3804.99 rows=264 width=40) (actual time=128.430..139.550 rows=259 loops=1)
   ->  Sort  (cost=3621.07..3713.03 rows=36784 width=40) (actual time=128.429..135.444 rows=40385 loops=1)
         Sort Key: p.keyword_id, p.created_at DESC
         Sort Method: external sort  Disk: 2000kB
         ->  Merge Join  (cost=128.73..831.59 rows=36784 width=40) (actual time=25.521..63.299 rows=40385 loops=1)
               Merge Cond: (k.website_id = w.id)
               ->  Index Only Scan using index_keywords_on_website_id_deleted_at on keywords k  (cost=0.27..24.23 rows=264 width=4) (actual time=0.137..0.274 rows=263 loops=1)
                     Heap Fetches: 156
               ->  Materialize  (cost=128.46..606.85 rows=1268 width=44) (actual time=3.772..49.587 rows=72242 loops=1)
                     ->  Nested Loop  (cost=128.46..603.68 rows=1268 width=44) (actual time=3.769..30.530 rows=61582 loops=1)
                           ->  Nested Loop  (cost=2.28..45.80 rows=1 width=32) (actual time=0.047..0.204 rows=7 loops=1)
                                 ->  Index Scan using websites_pkey on websites w  (cost=0.14..21.03 rows=4 width=32) (actual time=0.007..0.026 rows=7 loops=1)
                                       Filter: (parent_id IS NOT NULL)
                                       Rows Removed by Filter: 4
                                 ->  Bitmap Heap Scan on accounts a  (cost=2.15..6.18 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=7)
                                       Recheck Cond: (id = w.account_id)
                                       Filter: ((amount > '0'::numeric) AND (round((amount / (payment_renewal_period)::numeric), 2) >= '1'::numeric) AND (round((amount / (payment_renewal_period)::numeric), 2) <= '19'::numeric))
                                       Heap Blocks: exact=7
                                       ->  Bitmap Index Scan on accounts_pkey  (cost=0.00..2.15 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=7)
                                             Index Cond: (id = w.account_id)
                           ->  Bitmap Heap Scan on positions p  (cost=126.18..511.57 rows=4631 width=16) (actual time=0.930..2.341 rows=8797 loops=7)
                                 Recheck Cond: (website_id = w.parent_id)
                                 Heap Blocks: exact=1004
                                 ->  Bitmap Index Scan on index_positions_on_5_columns  (cost=0.00..125.02 rows=4631 width=0) (actual time=0.906..0.906 rows=8797 loops=7)
                                       Index Cond: (website_id = w.parent_id)
 Planning time: 1.124 ms
 Execution time: 157.167 ms

网站索引

Indexes:
    "websites_pkey" PRIMARY KEY, btree (id)
    "index_websites_on_account_id" btree (account_id)
    "index_websites_on_deleted_at" btree (deleted_at)
    "index_websites_on_domain_id" btree (domain_id)
    "index_websites_on_parent_id" btree (parent_id)
    "index_websites_on_processed_at" btree (processed_at)

职位指数

Indexes:
    "positions_pkey" PRIMARY KEY, btree (id)
    "index_positions_on_5_columns" UNIQUE, btree (website_id, keyword_id, created_at, engine_id, region_id)
    "overlap_index" btree (keyword_id, created_at)
劳伦兹·阿尔贝(Laurenz Albe)

第二个EXPLAIN输出显示的行数减少了200倍以上,因此排序速度要快得多也就不足为奇了。

您会注意到,在两种情况下(Sort Method: external merge Disk: ...kB排序都会溢出到磁盘上如果你能保持在通过提高内存的排序work_mem,这将是快。
但是第一种类型是如此之大,以至于您将无法将其放入内存中。

加快查询速度的想法:

  • 对索引(keyword_id, created_at)进行positions不确定是否有帮助。

  • 首先进行过滤,如下所示:

    SELECT
       a.id        AS account_id,
       w.parent_id AS parent_id,
       w.name      AS name,
       p.position  AS position
    FROM (SELECT DISTINCT ON (keyword_id)
             positions,
             website_id,
             keyword_id,
             created_at
          FROM positions
          ORDER BY keyword_id, created_at DESC) p
       JOIN ...
    WHERE ...
    ORDER BY p.keyword_id, p.created_at DESC;
    

备注:DISTINCT ON是有些奇怪,因为你不知道ORDER BY的值SELECT列表,这样的结果值没有明确定义。

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章