Adding OR clause to where condition is much slower than two individual queries

sihaya

There are 2 tables, main_table and other_table. A user is allowed access to main_table items if his user_id matches the main_table.user_id column or if there is an entry in other_table where his user_id is in the read_acl column (array using gin index). Both tables have about 2 million rows.

The query is pretty slow especially, it seems Postgresql is doing an index scan on all entries in the main_table. If I remove the or clause and rewrite it to 2 queries then the speed is much better. Can this be solved somehow?

Current query is this:

select *
    from main_table
    where user_id = 123 or exists (select * from other_table f 
                                   where main_table.main_table_id = f.main_table_id 
                                   and '{123}' && read_acl)
    order by main_table_id limit 10;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..170.97 rows=10 width=2590) (actual time=172.734..389.819 rows=10 loops=1)
   ->  Index Scan using main_table_pk on main_table  (cost=0.43..14158795.44 rows=830198 width=2590) (actual time=172.733..389.811 rows=10 loops=1)
         Filter: ((user_id = 123) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
         Rows Removed by Filter: 776709
         SubPlan 1
           ->  Index Scan using other_table_main_table_id_idx on other_table f  (cost=0.43..8.45 rows=1 width=0) (never executed)
                 Index Cond: (main_table.main_table_id = main_table_id)
                 Filter: ('{123}'::text[] && read_acl)
         SubPlan 2
           ->  Bitmap Heap Scan on other_table f_1  (cost=1678.58..2992.04 rows=333 width=8) (actual time=9.413..9.432 rows=12 loops=1)
                 Recheck Cond: ('{123}'::text[] && read_acl)
                 Heap Blocks: exact=12
                 ->  Bitmap Index Scan on other_table_read_acl  (cost=0.00..1678.50 rows=333 width=0) (actual time=9.401..9.401 rows=12 loops=1)
                       Index Cond: ('{123}'::text[] && read_acl)
 Planning Time: 0.395 ms
 Execution Time: 389.877 ms
(16 rows)

Rewritten query 1 first part of OR clause:

    select *
    from main_table
    where user_id = 123
    order by main_table_id limit 10;

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=482.27..482.29 rows=10 width=2590) (actual time=0.039..0.040 rows=8 loops=1)
   ->  Sort  (cost=482.27..482.58 rows=126 width=2590) (actual time=0.038..0.039 rows=8 loops=1)
         Sort Key: main_table_id
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on main_table  (cost=5.40..479.54 rows=126 width=2590) (actual time=0.020..0.031 rows=8 loops=1)
               Recheck Cond: (user_id = 123)
               Heap Blocks: exact=8
               ->  Bitmap Index Scan on test500  (cost=0.00..5.37 rows=126 width=0) (actual time=0.015..0.015 rows=8 loops=1)
                     Index Cond: (user_id = 123)
 Planning Time: 0.130 ms
 Execution Time: 0.066 ms
(11 rows)

Rewritten query 2:

    select *
    from main_table
    where exists (select * from other_table f 
                  where main_table.main_table_id = f.main_table_id 
                  and '{123}' && read_acl)
    order by main_table_id limit 10;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5771.59..5771.62 rows=10 width=2590) (actual time=8.083..8.086 rows=10 loops=1)
   ->  Sort  (cost=5771.59..5772.42 rows=333 width=2590) (actual time=8.082..8.083 rows=10 loops=1)
         Sort Key: main_table.main_table_id
         Sort Method: quicksort  Memory: 26kB
         ->  Nested Loop  (cost=2985.30..5764.40 rows=333 width=2590) (actual time=8.018..8.072 rows=12 loops=1)
               ->  HashAggregate  (cost=2984.88..2988.21 rows=333 width=8) (actual time=7.999..8.004 rows=12 loops=1)
                     Group Key: f.main_table_id
                     ->  Bitmap Heap Scan on other_table f  (cost=1670.58..2984.04 rows=333 width=8) (actual time=7.969..7.990 rows=12 loops=1)
                           Recheck Cond: ('{123}'::text[] && read_acl)
                           Heap Blocks: exact=12
                           ->  Bitmap Index Scan on other_table_read_acl  (cost=0.00..1670.50 rows=333 width=0) (actual time=7.957..7.958 rows=12 loops=1)
                                 Index Cond: ('{123}'::text[] && read_acl)
               ->  Index Scan using main_table_pk on main_table  (cost=0.43..8.34 rows=1 width=2590) (actual time=0.005..0.005 rows=1 loops=12)
                     Index Cond: (main_table_id = f.main_table_id)
 Planning Time: 0.431 ms
 Execution Time: 8.137 ms
(16 rows)
Laurenz Albe

That is as expected. OR in a WHERE condition often is a problem for the optimizer. Yes, you can usually rewrite the query to a UNION or UNION ALL of the partial queries you show in your question, but this is not something that the optimizer can do automatically.

Consider this example:

CREATE TABLE a (
   id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   x integer,
   p integer
);

CREATE TABLE b (
   id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   x integer,
   q integer
);

INSERT INTO a (x, p) VALUES
   (1, 1),
   (1, 1),
   (2, 1);

INSERT INTO b (x, q) VALUES
   (1, 3),
   (2, 3);

Now the query with OR gives you:

SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1 OR b.q = 3;

 x │ p │ q 
═══╪═══╪═══
 1 │ 1 │ 3
 1 │ 1 │ 3
 2 │ 1 │ 3
(3 rows)

Rewriting the query with UNION or UNION ALL produces a different result:

SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1
UNION
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE b.q = 3;

 x │ p │ q 
═══╪═══╪═══
 2 │ 1 │ 3
 1 │ 1 │ 3
(2 rows)

SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1
UNION ALL
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE b.q = 3;

 x │ p │ q 
═══╪═══╪═══
 1 │ 1 │ 3
 1 │ 1 │ 3
 2 │ 1 │ 3
 1 │ 1 │ 3
 1 │ 1 │ 3
 2 │ 1 │ 3
(6 rows)

The case in your question uses SELECT *, which will include the primary key, so in this case it would be safe to use UNION. But that is reasoning that goes beyond the capabilities of the optimizer.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Why SELECT COUNT(*) is much slower than SELECT with a WHERE clause in MySQL?

Why is WHERE clause with OR so much slower than UNION ALL?

Adding a where clause makes the query considerably slower

Adding dynamic condition in where clause

Linq: Adding the Where clause "or" condition

Very strange -- Adding a compound index makes queries much slower (MongoDB)

why adding first item into collection much slower than second?

two condition for where Clause in SQL

Joining 2 MySQL temporary tables is 50x slower than joining a temporary table with a normal table and adding a WHERE clause?

Postgresql why is INNER JOIN so much slower than WHERE

Query produces no records for a field when adding more than one condition to JOIN or WHERE clause

adding a condition on WHERE clause of a multi lined query

MSSQL - Adding condition on the where and on clause performance

Two SQL Server queries with different where clause

Pulling two separate queries based on a WHERE clause

Please confirm: SYSDATETIME() is slower than GETDATE() in WHERE clause

Sql View with WHERE clause runs slower than a raw query

neo4j Creating many relationships with individual statements much slower than a loop

SQL query by adding two columns in where clause?

Atleast two different condition in where clause in Oracle

Two Condition Where-clause SQL

Streaming is much slower than Downloading

WHERE clause on individual rows

Is it better to use where instead of adding conditions into join clause in SQL queries?

Two different queries on the same table with the same WHERE clause

How to combine two update queries having different set and where clause?

Difficult where clause, may require the use of two queries?

two select queries with different where clause on same table

Combining two SELECT queries from same table but different WHERE clause