python pandas: convert a DataFrame to a list of dictionaries in a required shape

呼吸


I have a DataFrame in the following form:

ID      , EmailID    , First Name, Last Name, Gender, DOB
1       , [email protected]  , One First , One Last , M     , 11-13-1920
2       , [email protected]  , Two First , Two Last , M     , 11-13-1920
3       , [email protected]  , Thr First , Thr Last , M     , 11-13-1920
4       , [email protected]  , Fou First , Fou Last , M     , 11-13-1920
5       , [email protected]  , Fiv First , Fiv Last , M     , 11-13-1920
6       , [email protected]  , Six First , Six Last , M     , 11-13-1920

I want the following output:

[
   {"_id" : "[email protected]", "_souce" : {"ID": 1, "EmailID" : "[email protected]", "data" : "{'ID':'1', 'EmailID': '[email protected]', 'First Name' : 'One First', 'Last Name' : 'One First', 'Gender': 'M', 'DOB': '11-13-1920'}"}},
   {"_id" : "[email protected]", "_souce" : {"ID": 2, "EmailID" : "[email protected]", "data" : "{'ID':'2', 'EmailID': '[email protected]', 'First Name' : 'Two First', 'Last Name' : 'Two First', 'Gender': 'M', 'DOB': '11-13-1920'}"}},
   {"_id" : "[email protected]", "_souce" : {"ID": 3, "EmailID" : "[email protected]", "data" : "{'ID':'3', 'EmailID': '[email protected]', 'First Name' : 'The First', 'Last Name' : 'The First', 'Gender': 'M', 'DOB': '11-13-1920'}"}},
   {"_id" : "[email protected]", "_souce" : {"ID": 4, "EmailID" : "[email protected]", "data" : "{'ID':'4', 'EmailID': '[email protected]', 'First Name' : 'Fou First', 'Last Name' : 'Fou First', 'Gender': 'M', 'DOB': '11-13-1920'}"}},
   {"_id" : "[email protected]", "_souce" : {"ID": 5, "EmailID" : "[email protected]", "data" : "{'ID':'5', 'EmailID': '[email protected]', 'First Name' : 'Fiv First', 'Last Name' : 'Fiv First', 'Gender': 'M', 'DOB': '11-13-1920'}"}},
   {"_id" : "[email protected]", "_souce" : {"ID": 6, "EmailID" : "[email protected]", "data" : "{'ID':'6', 'EmailID': '[email protected]', 'First Name' : 'Six First', 'Last Name' : 'Six First', 'Gender': 'M', 'DOB': '11-13-1920'}"}}
]

How can I do this efficiently? Should I loop over the rows, or build another array with pandas?

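Each converted dictionary should have:

  1. _id: a combination of ID and EmailID
  2. _source: should contain the following:
    1. data: all of the row's information converted to a JSON string
    2. ID and EmailID in the same dictionary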
耶斯列尔

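Convert each row to a JSON string in a new column, build the _souce column from ID, EmailID and data, then add the _id column. Finally, select the columns in the expected order and convert them to a list of dictionaries with DataFrame.to_dict: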

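# Serialize each row to a JSON string in a new 'data' column
df['data'] = df.apply(lambda x: x.to_json(), axis=1)
# Nest ID, EmailID and the JSON string in one dictionary per row
df['_souce'] = df[['ID','EmailID','data']].apply(lambda x: x.to_dict(), axis=1)
# Build _id by joining ID and EmailID with a hyphen
df['_id'] = df['ID'].astype(str) + '-' + df['EmailID'].astype(str)
# Keep only the two target columns and emit one dictionary per row
d = df[['_id','_souce']].to_dict(orient='records')

print(d)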

[{
    '_id': '[email protected]',
    '_souce': {
        'ID': 1,
        'EmailID': '[email protected]',
        'data': '{"ID":1,"EmailID":"[email protected]","First Name":"One First","Last Name":"One Last","Gender":"M","DOB":"11-13-1920"}'
    }
}, {
    '_id': '[email protected]',
    '_souce': {
        'ID': 2,
        'EmailID': '[email protected]',
        'data': '{"ID":2,"EmailID":"[email protected]","First Name":"Two First","Last Name":"Two Last","Gender":"M","DOB":"11-13-1920"}'
    }
}, {
    '_id': '[email protected]',
    '_souce': {
        'ID': 3,
        'EmailID': '[email protected]',
        'data': '{"ID":3,"EmailID":"[email protected]","First Name":"Thr First","Last Name":"Thr Last","Gender":"M","DOB":"11-13-1920"}'
    }
}, {
    '_id': '[email protected]',
    '_souce': {
        'ID': 4,
        'EmailID': '[email protected]',
        'data': '{"ID":4,"EmailID":"[email protected]","First Name":"Fou First","Last Name":"Fou Last","Gender":"M","DOB":"11-13-1920"}'
    }
}, {
    '_id': '[email protected]',
    '_souce': {
        'ID': 5,
        'EmailID': '[email protected]',
        'data': '{"ID":5,"EmailID":"[email protected]","First Name":"Fiv First","Last Name":"Fiv Last","Gender":"M","DOB":"11-13-1920"}'
    }
}, {
    '_id': '[email protected]',
    '_souce': {
        'ID': 6,
        'EmailID': '[email protected]',
        'data': '{"ID":6,"EmailID":"[email protected]","First Name":"Six First","Last Name":"Six Last","Gender":"M","DOB":"11-13-1920"}'
    }
}]
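
Since the question asks about efficiency, here is a rough alternative sketch that avoids the two row-wise apply calls by serializing the whole frame once and building the dictionaries in a plain loop. It is not part of the answer above. The helper name to_documents and the example.com addresses are made up for illustration, and the key is spelled _souce only to match the expected output in the question. Whether this is actually faster depends on the data, so it is worth timing on a real frame.

```python
import json

import pandas as pd

# Hypothetical sample data; the addresses are placeholders, not the asker's values.
df = pd.DataFrame({
    "ID": [1, 2],
    "EmailID": ["one@example.com", "two@example.com"],
    "First Name": ["One First", "Two First"],
    "Last Name": ["One Last", "Two Last"],
    "Gender": ["M", "M"],
    "DOB": ["11-13-1920", "11-13-1920"],
})


def to_documents(frame):
    """Return one {'_id', '_souce'} dictionary per row (hypothetical helper)."""
    # Round-trip through to_json so every value is a plain Python type
    # (int, str, ...) that json.dumps can serialize again.
    records = json.loads(frame.to_json(orient="records"))
    documents = []
    for record in records:
        documents.append({
            # Same hyphen-joined id as in the answer above.
            "_id": f"{record['ID']}-{record['EmailID']}",
            "_souce": {
                "ID": record["ID"],
                "EmailID": record["EmailID"],
                # Compact separators keep the string free of extra spaces.
                "data": json.dumps(record, separators=(",", ":")),
            },
        })
    return documents


docs = to_documents(df)
print(docs[0]["_id"])
```

Unlike the answer's version, this leaves df unchanged, because no helper columns are added to it.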
