在 BigQuery 中,将字符串化的对象数组转换为非字符串化

卡诺维斯

我正在将.json数据摄取到 Google BigQuery 中,并且在摄取时, 的数据类型arraysobjects来自 的数据类型.json都被转换为string列。BigQuery 中的数据如下所示:

select 1 as id, '[]' as stringCol1, '[]' as stringCol2 union all
select 2 as id, null as stringCol1, null as stringCol2 union all
select 3 as id, "{'game': '22', 'year': 'sophomore'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32]" as stringCol2 union all
select 4 as id, "{'game': '17', 'year': 'freshman'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33]" as stringCol2 union all
select 5 as id, "{'game': '9', 'year': 'senior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17]" as stringCol2 union all
select 6 as id, "{'game': '234', 'year': 'junior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25]" as stringCol2

在此处输入图片说明

数据有点乱。

  • 在 中stringCol1缺失数据都有null'[]'值。我想创建2列game,并year从该字符串化的对象。
  • 对于stringCol2,这始终是一个包含 2 个对象的数组,具有相同的键(teamNameteamAge,在这种情况下)。然后需要将其转换为 4 列teamName1, teamAge1, teamName2, teamAge2

这篇类似的帖子解决了将基本字符串化数组转换为非字符串化数组的问题,但这里的这个例子有点复杂。特别是,其他帖子中的解决方案在这种情况下不起作用。

米哈伊尔·贝里安特

下面是 BigQuery 标准 SQL

#standardSQL
SELECT id,
  JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t1, '$.teamName') AS teamName1,
  JSON_EXTRACT_SCALAR(t1, '$.teamAge') AS teamAge1,
  JSON_EXTRACT_SCALAR(t2, '$.teamName') AS teamName2,
  JSON_EXTRACT_SCALAR(t2, '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(
  JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(0)] AS t1, 
  JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(1)] AS t2
)])   

如果适用于您问题中的样本数据

WITH `project.dataset.table` AS (
  SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
  SELECT 2 AS id, NULL AS stringCol1, NULL AS stringCol2 UNION ALL
  SELECT 3 AS id, "{'game': '22', 'year': 'sophomore'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" AS stringCol2 UNION ALL
  SELECT 4 AS id, "{'game': '17', 'year': 'freshman'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" AS stringCol2 UNION ALL
  SELECT 5 AS id, "{'game': '9', 'year': 'senior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" AS stringCol2 UNION ALL
  SELECT 6 AS id, "{'game': '234', 'year': 'junior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" AS stringCol2
) 

输出是

Row id  game    year        teamName1   teamAge1    teamName2   teamAge2     
1   1   null    null        null        null        null        null     
2   2   null    null        null        null        null        null     
3   3   22      sophomore   teamA       37          teamB       32   
4   4   17      freshman    teamA       32          teamB       33   
5   5   9       senior      teamC       31          teamD       17   
6   6   234     junior      teamC       42          teamD       25      

例如,上面可以有很多变体来提高可读性

#standardSQL
SELECT id,
  JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS teamAge1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章