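Convert CSV to JSON with a custom format

Astro posted in Dev

Astro

I am trying to create a JSON file from a CSV using Pandas.

CSV file (this is only an excerpt; apologies for the long table, but I wanted to show the content clearly):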

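Month	Type	SubType	ItemName
December	ObjectTypeA	SubType A1	Item 1
December	ObjectTypeA	SubType A1	Item 2
December	ObjectTypeA	SubType A2	Item 3
December	ObjectTypeA	SubType A2	Item 4
December	ObjectTypeA	SubType A2	Item 5
December	ObjectTypeA	SubType A3	Item 6
December	ObjectTypeA	SubType A3	Item 7
December	ObjectTypeA	SubType A4	Item 8
December	ObjectTypeA	SubType A4	Item 9
December	ObjectTypeA	SubType A4	Item 10
December	ObjectTypeA	SubType A4	Item 11
December	ObjectTypeA	SubType A4	Item 12
December	ObjectTypeA	SubType A5	Item 13
December	ObjectTypeA	SubType A5	Item 14
December	ObjectTypeA	SubType A5	Item 15
December	ObjectTypeB	SubType B1	Item 16
December	ObjectTypeB	SubType B1	Item 17
December	ObjectTypeB	SubType B2	Item 18
December	ObjectTypeB	SubType B2	Item 19
December	ObjectTypeB	SubType B2	Item 20
December	ObjectTypeB	SubType B3	Item 21
December	ObjectTypeB	SubType B3	Item 22
March	ObjectTypeA	SubType A1	Item 23
March	ObjectTypeA	SubType A1	Item 24
March	ObjectTypeA	SubType A2	Item 25
March	ObjectTypeA	SubType A2	Item 26
March	ObjectTypeA	SubType A2	Item 27
March	ObjectTypeA	SubType A3	Item 28
March	ObjectTypeA	SubType A3	Item 29
March	ObjectTypeA	SubType A4	Item 30
March	ObjectTypeA	SubType A4	Item 31
March	ObjectTypeA	SubType A4	Item 32
March	ObjectTypeA	SubType A4	Item 33
March	ObjectTypeA	SubType A4	Item 34
March	ObjectTypeC	SubType C1	Item 35
March	ObjectTypeC	SubType C1	Item 36
March	ObjectTypeC	SubType C2	Item 37
March	ObjectTypeC	SubType C2	Item 38
March	ObjectTypeC	SubType C3	Item 39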

Desired output

allobjects: {
"December": {
    "Object Type A": {
        "Subtype A1": ["Item1","Item2"],
        "Subtype A2": ["Item3","Item4","Item5"],
        "Subtype A3": ["Item6","Item7"],
        "Subtype A4": ["Item8","Item9"],
        "Subtype A5": ["Item10","Item11","Item12"]
        },
                
    "Object Type B": {
        "Subtype B1": ["Item13","Item14"],
        "Subtype B2": ["Item16","Item15","Item17","Item18"],
        "Subtype B3": ["Item19","Item20"],
        "Subtype B4": ["Item21","Item22"],
        "Subtype B5": ["Item23","Item24","Item25"]
        },
    "Object Type C": {
        "Subtype C1": ["Item26", "Item27"],
        "Subtype C2": ["Item28", "Item29"],
        "Subtype C3": ["Item30", "Item31"]
        }},
"March": {
    "Object Type A": {
        "Subtype A1": ["Item32","Item33"],
        "Subtype A2": ["Item34","Item35"],
        "Subtype A3": ["Item36","Item37"],
        "Subtype A4": ["Item38","Item39","Item40"],
        "Subtype A5": ["Item41","Item42","Item44"]
        },
                
    "Object Type C": {
        "Subtype C1": ["Item45", "Item46"],
        "Subtype C2": ["Item47", "Item48"],
        "Subtype C3": ["Item49", "Item50"]
        }},
    },

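Current code

import json

import pandas as pd

# Read every column as a string; the column names follow the CSV header above.
df = pd.read_csv("Book4.csv", dtype={
            "Month" : str,
            "Type" : str,
            "SubType" : str,
            "ItemName": str,
        })


compiled = []

# One dict is appended per (month, type, subtype) group, with the three
# values as sibling keys; this is what produces the flat output shown below.
for (month, type, subtype), bag in df.groupby(["Month", "Type", "SubType"]):
    contents = bag.drop(["Month", "Type","SubType"], axis=1)
    allitems = [list(row) for i,row in contents.items()]
    compiled.append(dict([(month, {}),
                        (type, {}),
                        (subtype, allitems),
                         ]))
with open("Book4_pandas.json", 'w') as outfile:
    outfile.write(json.dumps(compiled, sort_keys=False, indent=2, separators=(',', ': ') ))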

Output of the current code

[
  {
    "December": {},
    "ObjectTypeA": {},
    "Subtype A1": [
       [ "Item1",
             "Item2"
           ]
    ]
  },
  {
    "December": {},
    "ObjectTypeA": {},
    "Subtype A2": [
       [ "Item3",
             "Item4",
         "Item5"
           ]
    ]
  },

... this goes on for December, and then:

  {
    "March": {},
    "ObjectTypeA": {},
    "Subtype A1": [
       [ "Item23",
             "Item24"
           ]
    ]
  },
  {
    "March": {},
    "ObjectTypeA": {},
    "Subtype A2": [
       [ "Item25",
             "Item26",
         "Item27"
           ]
    ]
  }
]

I appreciate that the JSON format is non-standard; however, I figured that writing a dict would be one "easy" approach. I believe there is an error in the way the for loop is structured?

Many thanks in advance!

jezrael

You can first aggregate the item names into a Series of lists, one list per (Month, Type, SubType) group, and then build the expected output with a nested dict comprehension:

s = df.groupby(["Month", "Type", "SubType"], sort=False)['ItemName'].agg(list)

compiled = {i: {j[1]: h[j].to_dict() 
                for j, h in g.groupby(level=[0,1], sort=False)}
                for i, g in s.groupby(level=0, sort=False)}

print (compiled)

{
    'December': {
        'ObjectTypeA': {
            'SubType A1': ['Item 1', 'Item 2'],
            'SubType A2': ['Item 3', 'Item 4', 'Item 5'],
            'SubType A3': ['Item 6', 'Item 7'],
            'SubType A4': ['Item 8', 'Item 9', 'Item 10', 'Item 11', 'Item 12'],
            'SubType A5': ['Item 13', 'Item 14', 'Item 15']
        },
        'ObjectTypeB': {
            'SubType B1': ['Item 16', 'Item 17'],
            'SubType B2': ['Item 18', 'Item 19', 'Item 20'],
            'SubType B3': ['Item 21', 'Item 22']
        }
    },
    'March': {
        'ObjectTypeA': {
            'SubType A1': ['Item 23', 'Item 24'],
            'SubType A2': ['Item 25', 'Item 26', 'Item 27'],
            'SubType A3': ['Item 28', 'Item 29'],
            'SubType A4': ['Item 30', 'Item 31', 'Item 32', 'Item 33', 'Item 34']
        },
        'ObjectTypeC': {
            'SubType C1': ['Item 35', 'Item 36'],
            'SubType C2': ['Item 37', 'Item 38'],
            'SubType C3': ['Item 39']
        }
    }
}
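
To make the two steps easier to follow in isolation, here is a minimal self-contained sketch. The five sample rows are made up for illustration and are not the full CSV from the question; the column names are assumed to match the header shown there. It builds the intermediate Series of lists first and then peels the index levels off one at a time, in the same way as the comprehension above.

```python
import pandas as pd

# Hand-made sample rows (an assumption for illustration, not the real file).
df = pd.DataFrame({
    "Month":    ["December", "December", "December", "March", "March"],
    "Type":     ["ObjectTypeA", "ObjectTypeA", "ObjectTypeB",
                 "ObjectTypeA", "ObjectTypeC"],
    "SubType":  ["SubType A1", "SubType A1", "SubType B1",
                 "SubType A1", "SubType C1"],
    "ItemName": ["Item 1", "Item 2", "Item 16", "Item 23", "Item 35"],
})

# Step 1: one list of item names per (Month, Type, SubType) combination.
# sort=False keeps the groups in order of first appearance.
s = df.groupby(["Month", "Type", "SubType"], sort=False)["ItemName"].agg(list)

# Step 2: group by month, then by (month, type); selecting with the
# (month, type) tuple leaves a Series indexed by SubType only.
compiled = {
    month: {
        key[1]: by_type[key].to_dict()
        for key, by_type in by_month.groupby(level=[0, 1], sort=False)
    }
    for month, by_month in s.groupby(level=0, sort=False)
}

print(compiled)
```

Printing `s` before the comprehension is a quick way to check that the grouping matches what you expect from your own file.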

with open("Book4_pandas.json", 'w') as outfile:
    outfile.write(json.dumps(compiled, sort_keys=False,
                             indent=2, separators=(',', ': ')))
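
If you would rather keep an explicit loop, as in the question, one possible alternative (not the approach used above) is to create each nesting level on demand with `dict.setdefault`. The sketch below assumes the rows arrive as 4-tuples in the order Month, Type, SubType, ItemName; the helper name `nest_rows` and the sample rows are hypothetical. With pandas, such tuples could come from `df.itertuples(index=False, name=None)`.

```python
import json


def nest_rows(rows):
    """Build {month: {type: {subtype: [items]}}} from 4-tuples."""
    nested = {}
    for month, type_, subtype, item in rows:
        # setdefault returns the existing inner dict/list, or inserts a
        # new empty one, so every level is created exactly once.
        (nested.setdefault(month, {})
               .setdefault(type_, {})
               .setdefault(subtype, [])
               .append(item))
    return nested


# Hand-made sample rows (an assumption for illustration).
rows = [
    ("December", "ObjectTypeA", "SubType A1", "Item 1"),
    ("December", "ObjectTypeA", "SubType A1", "Item 2"),
    ("December", "ObjectTypeB", "SubType B1", "Item 16"),
    ("March", "ObjectTypeA", "SubType A1", "Item 23"),
]

compiled = nest_rows(rows)
print(json.dumps(compiled, indent=2))
```

The difference from the loop in the question is that month, type and subtype become nested keys rather than sibling keys of a new dict appended for every group.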


Edited on 2021-09-09
