I'm running out of memory with my query on Google BigQuery. I have to calculate multiple window functions like running sums over multiple different time frames. My data mainly consists of an id (string), a value (number), a type ('in' or 'out', could be converted to bool is needed) and a timestamp.
I read that there is no way to increase memory per slot, so the only way to be able to execute the query is to cut it into smaller pieces that can be sent to different slots. A way to do this is to use GROUP BY
or OVER (PARTITION BY ...)
but I have no idea how I could rewrite my query to make use of it.
I have some calculations that need to use PARTITION BY
but for others, I want to calculate the total overall, for example:
Imagine a have a large table (> 1 billion rows) where I want to calculate a rolling sum over all values for different time frames, independent of id.
WITH data AS (
SELECT *
FROM UNNEST([
STRUCT
('A' as id,1 as value, 'out' as type, 1 as time),
('A', -1, 'in', 2),
('B', 2, 'out', 2),
('C', 1, 'out', 3),
('B', -1, 'in', 4),
('A', 2, 'out', 4),
('C', 5, 'out', 5),
('B', 3, 'out', 6),
('A', 1, 'out', 6),
('A', -4, 'in', 6),
('C', -3, 'in', 7)
])
)
SELECT
id
, value
, type
, time
, SUM(value) OVER (ORDER BY time RANGE UNBOUNDED PRECEDING) as total
, SUM(value) OVER (ORDER BY time RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) as total_last_day
, SUM(value) OVER (ORDER BY time RANGE BETWEEN 3 PRECEDING AND 2 PRECEDING) as total_prev_day
FROM data
How could I split this query to make use of PARTITION BY
or GROUP BY
in order to fit within the memory limits?
Try below approach - I think it has good chances to resolve your issue
SELECT *
FROM data
JOIN (
SELECT time
, SUM(time_value) OVER (ORDER BY time RANGE UNBOUNDED PRECEDING) as total
, SUM(time_value) OVER (ORDER BY time RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) as total_last_day
, SUM(time_value) OVER (ORDER BY time RANGE BETWEEN 3 PRECEDING AND 2 PRECEDING) as total_prev_day
FROM (
SELECT time, SUM(value) time_value
FROM data
GROUP BY time
)
)
USING (time)
if applied to sample data in your question - output is
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments