Pandas: flattening nested JSON

Justin

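This may be a duplicate question, but I'll give it a shot since I couldn't find anything.

I'm trying to flatten JSON with pandas, the normal way. The example from the documentation is the closest to what I'm trying to do: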

import pandas as pd

data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {'governor': 'Rick Scott'},
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {'governor': 'John Kasich'},
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]
result = pd.json_normalize(data, 'counties', ['state', 'shortname',
                                           ['info', 'governor']])
result
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

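However, this example shows a way to flatten the data inside counties alongside the columns state and shortname. Let's say I have n columns at the root of each JSON object (n columns like state or shortname in the example above). How do I include them all, so that counties gets flattened but everything else adjacent to it is kept?

First I tried things like this: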

#None to treat data as a list of records
#Result of counties is still nested, not working
result = pd.json_normalize(data, None, ['counties'])

or


result = pd.json_normalize(data, None, ['counties', 'name'])

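Then I thought of getting the columns with dataframe.columns and reusing them, since the meta argument of json_normalize can take an array of strings.

But I'm stuck, and columns seems to return the nested JSON attributes, which I don't want.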

# still nested
cols = pd.json_normalize(data).columns.to_list()
# Exclude 'counties' because it is already used as the record path
cols = [col for col in cols if col != 'counties']
# Remove nested (dotted) columns, if any
cols = [col for col in cols if "." not in col]

result = pd.json_normalize(data, 'counties', cols, errors="ignore")
# still nested

         name  population    state shortname  ...     other6     other7                                           counties info.governor
0        Dade       12345  Florida        FL  ...  dumb_data  dumb_data  [{'name': 'Dade', 'population': 12345}, {'name...           NaN
1     Broward       40000  Florida        FL  ...  dumb_data  dumb_data  [{'name': 'Dade', 'population': 12345}, {'name...           NaN
2  Palm Beach       60000  Florida        FL  ...  dumb_data  dumb_data  [{'name': 'Dade', 'population': 12345}, {'name...           NaN
3      Summit        1234     Ohio        OH  ...  dumb_data  dumb_data  [{'name': 'Summit', 'population': 1234}, {'nam...           NaN
4    Cuyahoga        1337     Ohio        OH  ...  dumb_data  dumb_data  [{'name': 'Summit', 'population': 1234}, {'nam...           NaN
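
As a possible way to complete the columns idea above, here is a minimal sketch. Instead of dropping the dotted names, it splits them back into key paths, which is the form meta expects for nested values. It assumes that no key contains a literal dot and that no root-level name collides with a field inside counties (otherwise json_normalize asks for a distinguishing prefix). The variable names top and meta are only illustrative.

```python
import pandas as pd

data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {'governor': 'Rick Scott'},
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {'governor': 'John Kasich'},
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]

# Flatten once without a record path just to discover the column names
top = pd.json_normalize(data)

# Turn 'info.governor' back into ['info', 'governor']; keep plain names as-is.
# Assumption: keys themselves never contain a dot.
meta = [col.split('.') if '.' in col else col
        for col in top.columns if col != 'counties']

# One row per county, with every root-level value repeated next to it
result = pd.json_normalize(data, record_path='counties', meta=meta)
print(result)
```

No column name is hard-coded except the record path itself, so the same lines should apply when there are many root-level columns.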

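I don't want to hard-code the column names, since they will change, and in this case I have 64 of them...

For better understanding, the real data I'm working on comes from the Woo REST API. I'm not using it here because it's really long, but basically I'm trying to flatten line_items, keeping only product_id from it, and of course all the other columns adjacent to line_items.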

Justin

OK guys, if you want to flatten a JSON and keep everything else, you should use pd.DataFrame.explode()

Here is my logic:

import pandas as pd

data = [
        {'state': 'Florida',
        'shortname': 'FL',
        'info': {'governor': 'Rick Scott'},
        'counties': [
                      {'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}
        ]
        },
        {'state': 'Ohio',
        'shortname': 'OH',
        'info': {'governor': 'John Kasich'},
        'counties': [{'name': 'Summit', 'population': 1234},
        {'name': 'Cuyahoga', 'population': 1337}]}
]
# No formatting, only converting to a DataFrame
result = pd.json_normalize(data)

# Explode the nested column we want
exploded = result.explode('counties')

# Keep only the name; this can be customized
exploded['countie_name'] = exploded['counties'].apply(lambda x: x['name'])

# Drop the used column since we took what interested us from it
exploded = exploded.drop(['counties'], axis=1)

print(exploded)
# Florida is duplicated, as wanted, with different county names
     state shortname info.governor countie_name
0  Florida        FL    Rick Scott         Dade
0  Florida        FL    Rick Scott      Broward
0  Florida        FL    Rick Scott   Palm Beach
1     Ohio        OH   John Kasich       Summit
1     Ohio        OH   John Kasich     Cuyahoga
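
If you want every key of the nested dicts as its own column rather than only name, a variation might look like the sketch below. It assumes each exploded cell is a dict (an empty counties list would explode to NaN and would need handling first), and the counties. prefix is just an illustrative choice.

```python
import pandas as pd

data = [
    {'state': 'Florida',
     'shortname': 'FL',
     'info': {'governor': 'Rick Scott'},
     'counties': [{'name': 'Dade', 'population': 12345},
                  {'name': 'Broward', 'population': 40000},
                  {'name': 'Palm Beach', 'population': 60000}]},
    {'state': 'Ohio',
     'shortname': 'OH',
     'info': {'governor': 'John Kasich'},
     'counties': [{'name': 'Summit', 'population': 1234},
                  {'name': 'Cuyahoga', 'population': 1337}]},
]

# One row per county; reset the index so it is unique and the join below lines up
flat = pd.json_normalize(data).explode('counties').reset_index(drop=True)

# Expand each county dict into columns, prefixed to avoid name clashes
counties = pd.json_normalize(flat['counties'].tolist()).add_prefix('counties.')

# Replace the dict column with the expanded columns
result = flat.drop(columns='counties').join(counties)
print(result)
```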

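Imagine you have the contents of a basket of products as nested JSON: to explode the contents of the basket while keeping the general basket attributes, you can do this.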
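
For the basket case, the same steps could be sketched as follows. The orders list is made-up sample data, not actual Woo REST API output; only the line_items and product_id names come from the question, and the other fields are hypothetical.

```python
import pandas as pd

# Hypothetical orders: general attributes plus a nested list of line items
orders = [
    {'id': 1, 'status': 'completed', 'billing': {'city': 'Paris'},
     'line_items': [{'product_id': 10, 'quantity': 2},
                    {'product_id': 11, 'quantity': 1}]},
    {'id': 2, 'status': 'pending', 'billing': {'city': 'Lyon'},
     'line_items': [{'product_id': 10, 'quantity': 5}]},
]

# Flatten the order-level attributes; line_items stays a list per row
df = pd.json_normalize(orders)

# One row per line item, order attributes repeated alongside
exploded = df.explode('line_items').reset_index(drop=True)

# Keep only product_id from each line item, then drop the dict column
exploded['product_id'] = exploded['line_items'].apply(lambda item: item['product_id'])
exploded = exploded.drop(columns='line_items')
print(exploded)
```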
