Improving the performance of an SQL query with a nested loop - PostgreSQL

达伯留特

I'm working with PostgreSQL and I have a weird problem with my SQL query. Depending on the date parameter used, my request does not perform the same operations.

Here is my working query:

SELECT DISTINCT app.id_application 
FROM stat sj
LEFT OUTER JOIN groupe gp ON gp.id_groupe = sj.id_groupe 
LEFT OUTER JOIN application app ON app.id_application = gp.id_application 
WHERE date_stat >= '2016/3/01' 
AND date_stat <= '2016/3/31' 
AND ( date_stat = date_gen-1 or (date_gen = '2016/04/01' AND date_stat = '2016/3/31')) 
AND app.id_application IS NOT NULL 

This query takes about 2 seconds (which is normal for me, since I have a lot of rows). When I run EXPLAIN ANALYZE on it, I get the following:

HashAggregate  (cost=375486.95..375493.62 rows=667 width=4) (actual time=2320.541..2320.656 rows=442 loops=1)
  ->  Hash Join  (cost=254.02..375478.99 rows=3186 width=4) (actual time=6.144..2271.984 rows=263274 loops=1)
        Hash Cond: (gp.id_application = app.id_application)
        ->  Hash Join  (cost=234.01..375415.17 rows=3186 width=4) (actual time=5.926..2200.671 rows=263274 loops=1)
              Hash Cond: (sj.id_groupe = gp.id_groupe)
              ->  Seq Scan on stat sj  (cost=0.00..375109.47 rows=3186 width=8) (actual time=3.196..2068.357 rows=263274 loops=1)
                    Filter: ((date_stat >= '2016-03-01'::date) AND (date_stat <= '2016-03-31'::date) AND ((date_stat = (date_gen - 1)) OR ((date_gen = '2016-04-01'::date) AND (date_stat = '2016-03-31'::date))))
                    Rows Removed by Filter: 7199514
              ->  Hash  (cost=133.45..133.45 rows=8045 width=12) (actual time=2.677..2.677 rows=8019 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 345kB
                    ->  Seq Scan on groupe gp  (cost=0.00..133.45 rows=8045 width=12) (actual time=0.007..1.284 rows=8019 loops=1)
        ->  Hash  (cost=11.67..11.67 rows=667 width=4) (actual time=0.206..0.206 rows=692 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 25kB
              ->  Seq Scan on application app  (cost=0.00..11.67 rows=667 width=4) (actual time=0.007..0.101 rows=692 loops=1)
                    Filter: (id_application IS NOT NULL)
Total runtime: 2320.855 ms

Now, when I try the same thing for the current month (we are the 6th of April, so I am trying to get all the application_id of April), with the same query:

SELECT DISTINCT app.id_application 
FROM stat sj
LEFT OUTER JOIN groupe gp ON gp.id_groupe = sj.id_groupe 
LEFT OUTER JOIN application app ON app.id_application = gp.id_application 
WHERE date_stat >= '2016/04/01' 
AND date_stat <= '2016/04/30' 
AND ( date_stat = date_gen-1 or ( date_gen = '2016/05/01' AND date_stat = '2016/04/30')) 
AND app.id_application IS NOT NULL 

This query now takes 120 seconds. So I also ran EXPLAIN ANALYZE on it, and now it does not use the same operations:

HashAggregate  (cost=375363.50..375363.51 rows=1 width=4) (actual time=186716.468..186716.532 rows=490 loops=1)
  ->  Nested Loop  (cost=0.00..375363.49 rows=1 width=4) (actual time=1.945..186619.404 rows=118990 loops=1)
        Join Filter: (gp.id_application = app.id_application)
        Rows Removed by Join Filter: 82222090
        ->  Nested Loop  (cost=0.00..375343.49 rows=1 width=4) (actual time=1.821..171458.237 rows=118990 loops=1)
              Join Filter: (sj.id_groupe = gp.id_groupe)
              Rows Removed by Join Filter: 954061820
              ->  Seq Scan on stat sj  (cost=0.00..375109.47 rows=1 width=8) (actual time=0.235..1964.423 rows=118990 loops=1)
                    Filter: ((date_stat >= '2016-04-01'::date) AND (date_stat <= '2016-04-30'::date) AND ((date_stat = (date_gen - 1)) OR ((date_gen = '2016-05-01'::date) AND (date_stat = '2016-04-30'::date))))
                    Rows Removed by Filter: 7343798
              ->  Seq Scan on groupe gp  (cost=0.00..133.45 rows=8045 width=12) (actual time=0.002..0.736 rows=8019 loops=118990)
        ->  Seq Scan on application app  (cost=0.00..11.67 rows=667 width=4) (actual time=0.003..0.073 rows=692 loops=118990)
              Filter: (id_application IS NOT NULL)
Total runtime: 186716.635 ms

So I decided to track down where the problem comes from by removing conditions from my query until the performance becomes acceptable again.

With only this parameter:

WHERE date_stat >= '2016/04/01'

it takes only 1.9 seconds (similar to the first working query), and it is also fine with 2 parameters:

WHERE date_stat >= '2016/04/01' 
AND app.id_application IS NOT NULL 

But when I try to add one of these lines, I get a Nested Loop in the explain plan:

AND date_stat <= '2016/04/30' 
AND ( date_stat = date_gen-1 or ( date_gen = '2016/05/01' AND date_stat = '2016/04/30')) 

Does anyone know where this could come from?

加百利的救世主

OK, it seems there is a problem with the optimizer's estimates. It thinks the April conditions will match only 1 row, so it chooses a NESTED LOOP, which is very inefficient for a large number of rows (118,990 in this case).
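A quick way to see whether the statistics on those tables are stale is to check when they were last analyzed. This is only a sketch; the table names stat, groupe and application are taken from the plans above:

-- How recently each table was analyzed and how many dead tuples it carries
SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('stat', 'groupe', 'application');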

  1. Run VACUUM ANALYZE on every table. This cleans up dead tuples and refreshes the statistics.
  2. Consider adding an index on the date column (filled in concretely in the sketch after this list):

     CREATE INDEX date_stat_idx ON <table with date_stat> USING btree (date_stat);
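Concretely, and given that the plans above show date_stat being filtered on the stat table, those two steps could look like this:

-- Refresh statistics and clean up dead tuples on the three tables from the query
VACUUM ANALYZE stat;
VACUUM ANALYZE groupe;
VACUUM ANALYZE application;

-- B-tree index on the date column used in the range condition
CREATE INDEX date_stat_idx ON stat USING btree (date_stat);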

Then re-run the query.
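For example, running the April query from the question under EXPLAIN ANALYZE again shows whether the planner has moved away from the Nested Loop; with realistic row estimates it should go back to the Hash Join plan seen for March:

EXPLAIN ANALYZE
SELECT DISTINCT app.id_application
FROM stat sj
LEFT OUTER JOIN groupe gp ON gp.id_groupe = sj.id_groupe
LEFT OUTER JOIN application app ON app.id_application = gp.id_application
WHERE date_stat >= '2016/04/01'
  AND date_stat <= '2016/04/30'
  AND (date_stat = date_gen - 1 OR (date_gen = '2016/05/01' AND date_stat = '2016/04/30'))
  AND app.id_application IS NOT NULL;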
