How to optimize this sql query on + 2 million rows

hbdev

I have a Sql database with +2 million rows and growing fast. There aren't many columns, only code, price, date and stationID.

The aim is to get the latest price by code and stationID. The query works well, but takes more than 10s.

Is there a way to optimize the query?

 $statement = $this->pdo->prepare(

        'WITH cte AS 
        (

        SELECT stationID AS ind, code, CAST(price AS DOUBLE ) AS price, date

        ,ROW_NUMBER() OVER(
             PARTITION BY code, stationID
             ORDER BY date DESC
         ) AS latest

        FROM price
        )
        
        SELECT *
        FROM cte
        WHERE latest  = 1
        '
    );

    $statement->execute();
    $results = $statement->fetchAll(PDO::FETCH_GROUP | PDO::FETCH_ASSOC);

Edit: There is an index on the first column called 'id'. I don't know if this helps.

The database (InnoDB) look like this:

id primary - int
stationID - int
code - int
price - decimal(10,5)
date - datetime

Edit 2:

The result need to be grouped by stationID and each stationID need to show many line. One line for every code with latest date. Like this:

22456:
      code:  1
      price: 3
      date:  2023-06-21

      code:  2
      price: 2
      date:  2023-06-21

      code:  3
      price: 5
      date:  2023-06-21

22457:
      code:  1
      price: 10
      date:  2023-06-21

      code:  2
      price: 1
      date:  2023-06-21

      code:  3
      price: 33
      date:  2023-06-21

The json output should be like this:

{"1000001":[{"code":1,"price":1.661,"date":"2023-06-06 12:46:32","latest":1},{"code":2,"price":1.867,"date":"2023-06-06 12:46:32","latest":1},{"code":3,"price":1.05,"date":"2023-06-06 12:46:32","latest":1},{"code":5,"price":1.818,"date":"2023-06-06 12:46:32","latest":1},{"code":6,"price":1.879,"date":"2023-06-06 12:46:32","latest":1}],"1000002":[{"code":1,"price":1.65,"date":"2023-06-03 08:53:26","latest":1},{"code":2,"price":1.868,"date":"2023-06-03 08:53:26","latest":1},{"code":6,"price":1.889,"date":"2023-06-03 08:53:27","latest":1}],… 
user1191247

As long as you cannot have two rows with the same datetime for the same code, stationID pair, using the window function is a bit like using a sledgehammer to crack a nut.

select p.stationID, p.code, p.price, p.date
from (
    select code, stationID, max(date) as max_date
    from price
    group by code, stationID
) max
join price p
    on max.code = p.code
   and max.stationID = p.stationID
   and max.max_date = p.date;

It requires the following index:

alter table price add index (code, stationID, date desc);

This query should take less than 1 ms, as the derived table can be built from the index, and then it is reading just the required rows from the table.

Alternatively, if you know that every code, stationID pair will have received an updated price within a specific time period (1 hr, 1 day, 1 week), then you could significantly reduce the work required by the window function by adding a where clause to it:

with cte as 
(
    select stationID as ind, code, price, date, row_number() over(partition by code, stationID order by date desc) as latest
    from price
    where date >= now() - interval 1 week
)
select * from cte where latest  = 1;

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related