NULL values behaviour in BigQuery Queries/Results

user6116741

Wanted to ask / discuss with you about null values behaviour in BigQuery.

I have noticed that filtering out real values in a NULLABLE column, will results filtering out both the value requested and NULL values.

Take this query for ex:

select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
--WHERE some_nullable_col != 3

All results return as expected,

And then:

select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
WHERE some_nullable_col != 3

will omit 2 columns. the value 3 and null.

I guess this happens because BigQuery won't index null values / won't scan null values on where clause for efficiency, but it also brings troubles:

Each time I filter on a nullable column, the filter will look like WHERE some_nullable_col != 3 OR some_nullable_col IS NULL

This is obviously less comfortable.

Just wanted to get an explanation / does BigQuery's roadmap offers a fix for this issue?

DoiT International

Yes, you are right that NULL is not matched to the comparator like some_nullable_col != 3. Let me explain the reason why.

Google are using a key value store as an underlying data storage for BigQuery. Unlike traditional relational database, data are fragmented by the row and fields and stored into many different locations. If the data is NULL, BigQuery consider the data does not exist, hence nothing is written to the data storage. As such, that filed will never be matched with any comparators except for "IS NULL". This is by design and Google don't have any plans to change the way it works at the moment.

The workaround for this is to set the special value for those fields. For example, if the type of that field is string, then you can use the null string "" instead of NULL. If the field type is non-negative integer, you can use "-1" as a special value. I understand that this is not really optimal, and it could be better to add "IS NULL" statement in your query in many situation. This is just to give you another option.

By the way, I tried the similar thing on my MySQL instance, and the way it behaves is the same as BigQuery. Namely, the query does not return NULL records with "=!" comparator.

For example,

mysql> select * from test1;
+------+------------+
| id   | num        |
+------+------------+
|    0 | aaa        |
|    1 | bbb        |
|    8 | sdfsdfgsdf |
|    9 | NULL       |
| NULL | sdfsdfsfsf |
+------+------------+
5 rows in set (0.19 sec)

and

mysql> select * from test1 where id != 8;
+------+------+
| id   | num  |
+------+------+
|    0 | aaa  |
|    1 | bbb  |
|    9 | NULL |
+------+------+
3 rows in set (0.18 sec)

So I think this is a standard behavior in SQL's world.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related