Combine Rows in BigQuery Ignoring Nulls

Ben Reid Published at Dev

Ben Reid

I have a Google BigQuery table that looks like this:

║ id ║   col_1    ║  col_2  ║ updated ║

║  1 ║ first_data ║ null    ║ 4/22    ║

║  1 ║ null       ║ old     ║ 4/23    ║

║  1 ║ null       ║ correct ║ 4/24    ║

I would like to construct a query that combines these rows and "overwrites" null columns if there is a row with the same id with the column not null. Essentially the result should look like:

║  1 ║ first_data ║ correct ║ 4/24    ║

If possible I would also like the result to represent a history:

║  1 ║ first_data ║ old     ║ 4/23    ║

║  1 ║ first_data ║ correct ║ 4/24    ║

But that is secondary and not necessary.

Mikhail Berlyant

Below is for BigQuery Standard SQL

#standardSQL
SELECT id, 
  IFNULL(col_1, FIRST_VALUE(col_1 IGNORE NULLS) OVER(win)) col_1, 
  IFNULL(col_2, FIRST_VALUE(col_2 IGNORE NULLS) OVER(win)) col_2, 
  updated
FROM `project.dataset.your_table`
WINDOW win AS (PARTITION BY id ORDER BY updated DESC 
               ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
-- ORDER BY id, updated

You can test / play with it using dummy data as below

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 1 id, 'first_data' col_1, NULL col_2,  '4/22' updated UNION ALL
  SELECT 1,     NULL,             'old',        '4/23'         UNION ALL
  SELECT 1,     NULL,             'correct',    '4/24'         UNION ALL
  SELECT 1,    'next_data',       NULL,         '4/25'         UNION ALL
  SELECT 1,     NULL,             NULL,         '4/26'         
)
SELECT id, 
  IFNULL(col_1, FIRST_VALUE(col_1 IGNORE NULLS) OVER(win)) col_1, 
  IFNULL(col_2, FIRST_VALUE(col_2 IGNORE NULLS) OVER(win)) col_2, 
  updated
FROM `project.dataset.your_table`
WINDOW win AS (PARTITION BY id ORDER BY updated DESC 
               ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
ORDER BY id, updated

with result

Row id  col_1       col_2   updated  
1   1   first_data  null    4/22     
2   1   first_data  old     4/23     
3   1   first_data  correct 4/24     
4   1   next_data   correct 4/25     
5   1   next_data   correct 4/26

Collected from the Internet

Please contact [email protected] to delete if infringement.