I am trying to save a large Excel file with multiple sheets to a JSON file using pandas. I need the result to have a structure like this:
{ 'Sheet1':
    [ 'column1': value,
      'column2': value,
      'column3': value,
      'column4': value ]
  'Sheet2':
    [ 'column1': value,
      'column2': value,
      'column3': value,
      'column4': {'json_key1': value,
                  'json_key2': value,}
    ]
}
I tried the following code to get this:
import pandas as pd
import json

EXCEL_FILE = 'example_data.xlsm'
JSON_FILE = 'json_data.json'

sheets = pd.ExcelFile(EXCEL_FILE).sheet_names
json_data = {}
for sheet in sheets:
    df = pd.read_excel(EXCEL_FILE, index_col=None, header=0, sheet_name=sheet, na_values='null')
    json_data[sheet] = json.loads(df.to_json(orient='records', force_ascii=False, date_format='iso'))

with open(JSON_FILE, 'w', encoding='utf-8') as json_file:
    json.dump(json_data, json_file, indent=2, ensure_ascii=False)
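Several columns in the Excel file contain JSON-like strings. [1]: https://i.stack.imgur.com/gvc0K.png
When I export to JSON with df.to_json(), these columns are saved as escaped strings, like this: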
{
"acts_31L": [
{
"ID": 219100060,
"ID_ETD": null,
"INDEX_NUM": "31-7635-191022195410",
"IT_SECTIONS": "{\"CTIME\":\"2019-10-22 21:26:41.680\",\"section\":{\"CTIME\":\"2019-10-22 21:26:41.680\",\"SERIE\":\"506\",\"SERIE_NAME\":\"ТЭП70\",\"SER_NUM\":\"00000542\",\"SEC_CODE\":\"0\",\"EL_COUNT\":0,\"FUEL_LIT\":0.0,\"FUEL_DENS\":0.8,\"FUEL_KG\":0.0,\"IS_NEED\":\"1\"}}",
"IT_INVENT": "{\"CTIME\":\"2019-10-22 21:26:41.680\",\"inv\":{\"CTIME\":\"2019-10-22 21:26:41.680\",\"INVENT_NAME\":\"Пенька\",\"UNIT\":\"шт.\",\"NORMA\":0,\"FACT\":0,\"INFO_TYPE\":\"0\"}}"
},
So how can I save these strings as JSON objects instead?
Before converting the DataFrame to JSON, use ast.literal_eval to convert the IT_SECTIONS and IT_INVENT columns from strings to dicts. Then you can convert it to JSON.
from ast import literal_eval

def to_dict(cell):
    # Parse only string cells; empty cells (NaN) are left unchanged
    return literal_eval(cell) if isinstance(cell, str) else cell

for sheet in sheets:
    df = pd.read_excel(EXCEL_FILE, index_col=None, header=0, sheet_name=sheet, na_values='null')
    # Turn the JSON-like strings into dicts so to_json writes them as nested objects
    df['IT_SECTIONS'] = df['IT_SECTIONS'].apply(to_dict)
    df['IT_INVENT'] = df['IT_INVENT'].apply(to_dict)
    json_data[sheet] = json.loads(df.to_json(orient='records', force_ascii=False, date_format='iso'))
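One caveat: ast.literal_eval parses Python literals, so it fails on cells containing JSON-only tokens such as true, false or null. If the cells really hold JSON, json.loads may be a safer choice. Below is a minimal sketch of that variant, not part of the original answer. The helper name parse_json_cell and the small in-memory DataFrame are assumptions made for illustration; the DataFrame stands in for one sheet read with pd.read_excel.

```python
import json

import pandas as pd


def parse_json_cell(cell):
    """Hypothetical helper: parse a string cell as JSON, pass anything else through."""
    if isinstance(cell, str):
        try:
            return json.loads(cell)
        except json.JSONDecodeError:
            return cell  # not valid JSON: keep the original string
    return cell  # None/NaN, numbers, etc. are left untouched


# Made-up sample data standing in for one sheet of the Excel file
df = pd.DataFrame({
    "ID": [219100060, 219100061],
    "IT_SECTIONS": ['{"SERIE": "506", "FUEL_DENS": 0.8, "IS_NEED": true}', None],
})

# Parse the JSON strings into dicts before exporting
df["IT_SECTIONS"] = df["IT_SECTIONS"].apply(parse_json_cell)

# Same export call as in the question; dict cells become nested objects
records = json.loads(df.to_json(orient="records", force_ascii=False, date_format="iso"))
print(json.dumps(records, indent=2, ensure_ascii=False))
```

In the loop over sheets, the same helper could be applied to IT_SECTIONS and IT_INVENT in place of literal_eval.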