我正在将.json
数据摄取到 Google BigQuery 中,并且在摄取时, 的数据类型arrays
和objects
来自 的数据类型.json
都被转换为string
列。BigQuery 中的数据如下所示:
select 1 as id, '[]' as stringCol1, '[]' as stringCol2 union all
select 2 as id, null as stringCol1, null as stringCol2 union all
select 3 as id, "{'game': '22', 'year': 'sophomore'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32]" as stringCol2 union all
select 4 as id, "{'game': '17', 'year': 'freshman'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33]" as stringCol2 union all
select 5 as id, "{'game': '9', 'year': 'senior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17]" as stringCol2 union all
select 6 as id, "{'game': '234', 'year': 'junior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25]" as stringCol2
数据有点乱。
stringCol1
,缺失数据都有null
和'[]'
值。我想创建2列game
,并year
从该字符串化的对象。stringCol2
,这始终是一个包含 2 个对象的数组,具有相同的键(teamName
和teamAge
,在这种情况下)。然后需要将其转换为 4 列teamName1
, teamAge1
, teamName2
, teamAge2
。这篇类似的帖子解决了将基本字符串化数组转换为非字符串化数组的问题,但这里的这个例子有点复杂。特别是,其他帖子中的解决方案在这种情况下不起作用。
下面是 BigQuery 标准 SQL
#standardSQL
SELECT id,
JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
JSON_EXTRACT_SCALAR(t1, '$.teamName') AS teamName1,
JSON_EXTRACT_SCALAR(t1, '$.teamAge') AS teamAge1,
JSON_EXTRACT_SCALAR(t2, '$.teamName') AS teamName2,
JSON_EXTRACT_SCALAR(t2, '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(
JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(0)] AS t1,
JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(1)] AS t2
)])
如果适用于您问题中的样本数据
WITH `project.dataset.table` AS (
SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
SELECT 2 AS id, NULL AS stringCol1, NULL AS stringCol2 UNION ALL
SELECT 3 AS id, "{'game': '22', 'year': 'sophomore'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" AS stringCol2 UNION ALL
SELECT 4 AS id, "{'game': '17', 'year': 'freshman'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" AS stringCol2 UNION ALL
SELECT 5 AS id, "{'game': '9', 'year': 'senior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" AS stringCol2 UNION ALL
SELECT 6 AS id, "{'game': '234', 'year': 'junior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" AS stringCol2
)
输出是
Row id game year teamName1 teamAge1 teamName2 teamAge2
1 1 null null null null null null
2 2 null null null null null null
3 3 22 sophomore teamA 37 teamB 32
4 4 17 freshman teamA 32 teamB 33
5 5 9 senior teamC 31 teamD 17
6 6 234 junior teamC 42 teamD 25
例如,上面可以有很多变体来提高可读性
#standardSQL
SELECT id,
JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS teamAge1,
JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句