Load JSON data into multiple tables in BigQuery

asad posted in Dev

asad

My JSON looks like this:

{
    "Key1":"Value1","Key2":"Value2","Key3":"Value3","List1":
    [
        {
            "SubKey1":"SubValue1_1","SubKey2":"SubValue1_2","SubKey3":"SubValue1_3"
        },
        {
            "SubKey1":"SubValue2_1","SubKey2":"SubValue2_2","SubKey3":"SubValue2_3"
        },
        {
            "SubKey1":"SubValue3_1","SubKey2":"SubValue3_2","SubKey3":"SubValue3_3"
        }
    ]
}

It loads into a single BigQuery table, like this:

But I want my data loaded into 2 separate tables, for example:

and

Please advise on what I should do.

Georgiadis

This is possible if you can use the bq command-line tool.

Assuming your JSON file (my_json_file.json) is in a GCS bucket (e.g. my_gcs_bucket) and your destination table is my_dataset.my_destination_table, you can run the following command:

bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table "gs://my_gcs_bucket/my_json_file.json" ./schema.json
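The `--source_format=NEWLINE_DELIMITED_JSON` flag expects one complete JSON object per line, whereas the sample in the question is pretty-printed across many lines. If your file looks like that sample, a minimal Python sketch like the following could rewrite it as newline-delimited JSON before you upload it to GCS. The helper names `to_ndjson` and `convert_file` are hypothetical, chosen for this illustration.

```python
import json


def to_ndjson(records):
    """Serialize each record as one compact JSON object per line."""
    # json.dumps without indent never emits a raw newline (newlines inside
    # strings are escaped as \n), so each record stays on a single line.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


def convert_file(src_path, dst_path):
    """Read a pretty-printed JSON file and write it out as NDJSON."""
    with open(src_path, encoding="utf-8") as f:
        data = json.load(f)
    # Accept either a single top-level object or a top-level array of objects.
    records = data if isinstance(data, list) else [data]
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(to_ndjson(records))
```

For example, `convert_file("my_json_pretty.json", "my_json_file.json")` would produce the file used in these commands, assuming the hypothetical `my_json_pretty.json` holds the pretty-printed original.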

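In schema.json you define the schema you have chosen for the destination table. Because of --ignore_unknown_values, any JSON fields that are not listed in the schema are skipped instead of causing the load to fail. For example, the following two schemas will load the data as expected: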

schema_1.json

[
  {
    "mode": "NULLABLE",
    "name": "Key1",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "Key2",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "Key3",
    "type": "STRING"
  }
]

and schema_2.json

[
  {
    "mode": "NULLABLE",
    "name": "Key1",
    "type": "STRING"
  },
  {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "SubKey1",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "SubKey2",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "SubKey3",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "List1",
    "type": "RECORD"
  }
]
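To see why one file can feed two tables, here is a local Python illustration (not BigQuery's actual implementation) of what loading with `--ignore_unknown_values` amounts to for these two schemas: each record is projected onto the fields named in the schema, recursing into the `REPEATED` `RECORD` field `List1`, and everything else is dropped. The function `project` and the inline copies of the schemas are assumptions made for this sketch.

```python
# Local illustration only: mimics keeping the fields named in a BigQuery-style
# JSON schema and dropping everything else. This is not BigQuery's code.

def project(record, fields):
    """Keep only the keys of `record` that appear in the schema `fields`."""
    out = {}
    for field in fields:
        name = field["name"]
        if name not in record:
            continue  # absent fields are omitted here (BigQuery would store NULL)
        value = record[name]
        sub = field.get("fields")
        if field.get("type") in ("RECORD", "STRUCT") and sub:
            if field.get("mode") == "REPEATED":
                value = [project(item, sub) for item in value]
            else:
                value = project(value, sub)
        out[name] = value
    return out


# Inline equivalents of schema_1.json and schema_2.json.
SCHEMA_1 = [{"name": n, "type": "STRING", "mode": "NULLABLE"} for n in ("Key1", "Key2", "Key3")]
SCHEMA_2 = [
    {"name": "Key1", "type": "STRING", "mode": "NULLABLE"},
    {
        "name": "List1",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [{"name": f"SubKey{i}", "type": "STRING", "mode": "NULLABLE"} for i in (1, 2, 3)],
    },
]
```

Applied to one record of the question's JSON, `SCHEMA_1` keeps only `Key1`, `Key2` and `Key3`, while `SCHEMA_2` keeps `Key1` plus the nested `List1` rows, so `Key1` can serve as the join key between the two tables.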

Then run:

bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table_1 "gs://my_gcs_bucket/my_json_file.json" ./schema_1.json

bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_destination_table_2 "gs://my_gcs_bucket/my_json_file.json" ./schema_2.json

This loads two different tables from the same JSON file, each containing only the columns defined in its own schema.
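If you load the same file into several tables regularly, the two commands can be scripted. The sketch below assumes the `bq` CLI is installed and authenticated, and it reuses the placeholder bucket, dataset and table names from the answer; `bq_load_args` and `load_all` are hypothetical helpers. By default it only prints the commands (a dry run); passing `dry_run=False` runs each one with `subprocess.run(..., check=True)`, which stops at the first load that fails.

```python
import shlex
import subprocess

# Placeholder names from the answer; replace with your own.
SOURCE_URI = "gs://my_gcs_bucket/my_json_file.json"
TARGETS = (
    ("my_dataset.my_destination_table_1", "./schema_1.json"),
    ("my_dataset.my_destination_table_2", "./schema_2.json"),
)


def bq_load_args(table, uri, schema_path):
    """Build the same `bq load` invocation used in the answer."""
    return [
        "bq", "load",
        "--ignore_unknown_values",
        "--source_format=NEWLINE_DELIMITED_JSON",
        table, uri, schema_path,
    ]


def load_all(targets=TARGETS, uri=SOURCE_URI, dry_run=True):
    """Run one load job per (table, schema) pair over the same source file."""
    commands = [bq_load_args(table, uri, schema) for table, schema in targets]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # only show what would be executed
        else:
            subprocess.run(cmd, check=True)  # raises if bq exits non-zero
    return commands
```

Each table is a separate load job over the same GCS URI, so the source file is read once per destination table.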
