My JSON looks like this:
{
"Key1":"Value1","Key2":"Value2","Key3":"Value3","List1":
[
{
"SubKey1":"SubValue1_1","SubKey2":"SubValue1_2","SubKey3":"SubValue1_3"
},
{
"SubKey1":"SubValue2_1","SubKey2":"SubValue2_2","SubKey3":"SubValue2_3"
},
{
"SubKey1":"SubValue3_1","SubKey2":"SubValue3_2","SubKey3":"SubValue3_3"
}
]
}
It currently loads into a single BigQuery table, but I want the data loaded into 2 separate tables instead.
Please advise what I should do.
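This is possible if you can use the bq command-line tool.
Assuming your JSON file (my_json_file.json) is in a GCS bucket (e.g. my_gcs_bucket) and the destination table is my_dataset.my_destination_table, you can run the following command: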
bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table "gs://my_gcs_bucket/my_json_file.json" ./schema.json
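One caveat worth checking before running the load: NEWLINE_DELIMITED_JSON expects each record to be a complete JSON object on a single line, so a pretty-printed file like the sample in the question may need to be compacted first. A minimal Python sketch, assuming the local file holds either one JSON object or an array of objects (the function name to_ndjson is illustrative and not part of bq):

```python
import json


def to_ndjson(text: str) -> str:
    """Return newline-delimited JSON: one compact object per line."""
    data = json.loads(text)
    # Treat a single top-level object as a one-record file.
    records = data if isinstance(data, list) else [data]
    # Compact separators keep each record on exactly one line.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"
```

The returned string can be written to my_json_file.json and uploaded to the bucket before running the bq load command.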
In schema.json you define the schema of the destination table. For example, either of the following two schemas will load the data as expected:
schema_1.json
[
{
"mode": "NULLABLE",
"name": "Key1",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "Key2",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "Key3",
"type": "STRING"
}
]
and schema_2.json
[
{
"mode": "NULLABLE",
"name": "Key1",
"type": "STRING"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "SubKey1",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "SubKey2",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "SubKey3",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "List1",
"type": "RECORD"
}
]
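Both schemas work against the same file because --ignore_unknown_values tells the load job to skip values that have no matching column instead of failing. The Python sketch below only imitates that selection locally for the two schemas above; it is an illustration, not how BigQuery implements loading, and the helper name project is made up for this example.

```python
SCHEMA_1 = [
    {"mode": "NULLABLE", "name": "Key1", "type": "STRING"},
    {"mode": "NULLABLE", "name": "Key2", "type": "STRING"},
    {"mode": "NULLABLE", "name": "Key3", "type": "STRING"},
]
SCHEMA_2 = [
    {"mode": "NULLABLE", "name": "Key1", "type": "STRING"},
    {
        "mode": "REPEATED",
        "name": "List1",
        "type": "RECORD",
        "fields": [
            {"mode": "NULLABLE", "name": f"SubKey{i}", "type": "STRING"}
            for i in (1, 2, 3)
        ],
    },
]


def project(record, schema):
    """Keep only the fields named in the schema, recursing into RECORD fields."""
    row = {}
    for field in schema:
        name = field["name"]
        if name not in record:
            continue  # field absent from the source object: leave it out
        value = record[name]
        if field["type"] == "RECORD":
            sub_schema = field["fields"]
            if field["mode"] == "REPEATED":
                value = [project(item, sub_schema) for item in value]
            else:
                value = project(value, sub_schema)
        row[name] = value
    return row
```

Calling project(record, SCHEMA_1) on the sample object keeps only Key1, Key2 and Key3, while project(record, SCHEMA_2) keeps Key1 together with the nested List1 entries.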
Then running
bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table_1 "gs://my_gcs_bucket/my_json_file.json" ./schema_1.json
bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table_2 "gs://my_gcs_bucket/my_json_file.json" ./schema_2.json
will load two different tables from the same JSON file.
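Note that with schema_2.json the second table keeps List1 as a repeated RECORD column, so each source object is still one row. If one row per List1 element is needed instead (an assumption about the desired layout), the file could be flattened before loading. A rough Python sketch; the function names flatten_list1 and to_child_ndjson are hypothetical:

```python
import json


def flatten_list1(record, parent_key="Key1", list_key="List1"):
    """Yield one flat row per nested element, tagged with the parent's key."""
    for item in record.get(list_key, []):
        row = {parent_key: record.get(parent_key)}
        row.update(item)
        yield row


def to_child_ndjson(lines):
    """Turn NDJSON lines of parent objects into NDJSON lines of child rows."""
    out = []
    for line in lines:
        if line.strip():  # skip blank lines
            for row in flatten_list1(json.loads(line)):
                out.append(json.dumps(row))
    return "\n".join(out) + "\n" if out else ""
```

The resulting file would then be loaded with the same kind of bq load command, using a flat schema that lists Key1, SubKey1, SubKey2 and SubKey3 as STRING columns.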