I have a Google BigQuery table that looks like this:
║ id ║ col_1 ║ col_2 ║ updated ║
║ 1 ║ first_data ║ null ║ 4/22 ║
║ 1 ║ null ║ old ║ 4/23 ║
║ 1 ║ null ║ correct ║ 4/24 ║
I would like to construct a query that combines these rows and "overwrites" null columns if there is a row with the same id with the column not null. Essentially the result should look like:
║ 1 ║ first_data ║ correct ║ 4/24 ║
If possible I would also like the result to represent a history:
║ 1 ║ first_data ║ old ║ 4/23 ║
║ 1 ║ first_data ║ correct ║ 4/24 ║
But that is secondary and not necessary.
Below is for BigQuery Standard SQL
#standardSQL
SELECT id,
IFNULL(col_1, FIRST_VALUE(col_1 IGNORE NULLS) OVER(win)) col_1,
IFNULL(col_2, FIRST_VALUE(col_2 IGNORE NULLS) OVER(win)) col_2,
updated
FROM `project.dataset.your_table`
WINDOW win AS (PARTITION BY id ORDER BY updated DESC
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
-- ORDER BY id, updated
You can test / play with it using dummy data as below
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 1 id, 'first_data' col_1, NULL col_2, '4/22' updated UNION ALL
SELECT 1, NULL, 'old', '4/23' UNION ALL
SELECT 1, NULL, 'correct', '4/24' UNION ALL
SELECT 1, 'next_data', NULL, '4/25' UNION ALL
SELECT 1, NULL, NULL, '4/26'
)
SELECT id,
IFNULL(col_1, FIRST_VALUE(col_1 IGNORE NULLS) OVER(win)) col_1,
IFNULL(col_2, FIRST_VALUE(col_2 IGNORE NULLS) OVER(win)) col_2,
updated
FROM `project.dataset.your_table`
WINDOW win AS (PARTITION BY id ORDER BY updated DESC
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
ORDER BY id, updated
with result
Row id col_1 col_2 updated
1 1 first_data null 4/22
2 1 first_data old 4/23
3 1 first_data correct 4/24
4 1 next_data correct 4/25
5 1 next_data correct 4/26
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments