XML导入到DataFrame Python

亚历山大·库什(AleksanderKuś)

我有看起来像这样的XML文件: 在此处输入图片说明

我正在尝试使用以下方法将其转换为Python中的DF:

import pandas as pd
import xml.etree.ElementTree as et

def parse_XML(xml_file, df_cols): 
"""Parse the input XML file and store the result in a pandas 
DataFrame with the given columns. 

The first element of df_cols is supposed to be the identifier 
variable, which is an attribute of each node element in the 
XML data; other features will be parsed from the text content 
of each sub-element. 
"""

xtree = et.parse(xml_file)
xroot = xtree.getroot()
rows = []

for node in xroot: 
    res = []
    res.append(node.attrib.get(df_cols[0]))
    for el in df_cols[1:]: 
        if node is not None and node.find(el) is not None:
            res.append(node.find(el).text)
        else: 
            res.append(None)
    rows.append({df_cols[i]: res[i] 
                 for i, _ in enumerate(df_cols)})

out_df = pd.DataFrame(rows, columns=df_cols)
    
return out_df

我正在使用这样的功能:

parse_XML(R'C:\Users\aleks\Desktop\zadania python+sql\import_xml.xml', ['Year', 'Row', 'New', 'Used', 'Total Sales New', 'Total Sales Used'])

但是结果而不是完整导入的XML文件是这样的: 在此处输入图片说明

如何使其工作,以便将完整的XML文件导入到DataFrame?谢谢。

完善

由于您的XML相对扁平,请考虑合并要传递到pandas.DataFrame构造函数中的元素和属性数据的字典具体来说,字典是通过标签的列表/字典理解以及row元素和attrib字典的文本解析来派生的

xtree = et.parse(xml_file)

# LIST OF DICTIONARIES
data = [{ **{r.tag:r.text for r in row.findall('*')}, **row.attrib
        } for row in xtree.findall('.//row/row')]

out_df = pd.DataFrame(data)

为了从密歇根州的OpenData进行展示,MVA车辆销售计数(类似于OP的图像数据):

import pandas as pd
import xml.etree.ElementTree as et
from urllib import request

url = "https://opendata.maryland.gov/api/views/un65-7ipd/rows.xml?accessType=DOWNLOAD"
rq = request.urlopen(url)
xtree = et.fromstring(rq.read())

# LIST OF DICTIONARIES
data = [{ **{r.tag:r.text for r in row.findall('*')}, **row.attrib
        } for row in xtree.findall('.//row/row')]

out_df = pd.DataFrame(data)

输出量

out_df

#     year month    new   used total_sales_new total_sales_used                 _id                                 _uuid _position                                           _address
#0    2002   JAN  31106  49927       755015820        386481929  row-a49t-ihm3.rpap  00000000-0000-0000-05D6-978893D301BF         0  https://opendata.maryland.gov/resource/un65-7i...
#1    2002   FEB  27520  50982       664454223        361353242  row-njtn_crc8_qawh  00000000-0000-0000-3211-E4A718B8A756         0  https://opendata.maryland.gov/resource/un65-7i...
#2    2002   MAR  34225  58794       805666244        419385387  row-tn8d-5xax-gcqz  00000000-0000-0000-9FC0-AB50DEE551EE         0  https://opendata.maryland.gov/resource/un65-7i...
#3    2002   APR  36452  59817       846368297        433061150  row-bjnm-jnr7.7rbw  00000000-0000-0000-FE23-E2E416AC13DB         0  https://opendata.maryland.gov/resource/un65-7i...
#4    2002   MAY  37359  60577       855005784        442569410  row-5wkh~fkp3.mzuj  00000000-0000-0000-CBC6-6E4FF3707E07         0  https://opendata.maryland.gov/resource/un65-7i...
#..    ...   ...    ...    ...             ...              ...                 ...                                   ...       ...                                                ...
#221  2020   JUN  23704  63395       821297823        699671992  row-m9ah.mpqr.7eww  00000000-0000-0000-3599-5C2EEABDA225         0  https://opendata.maryland.gov/resource/un65-7i...
#222  2020   JUL  22748  65652       784818757        608761285  row-79ht-rw2j~j4vy  00000000-0000-0000-4E88-6D5061A28AC5         0  https://opendata.maryland.gov/resource/un65-7i...
#223  2020   AUG  12640  48594       456451505        329492612  row-yh7w-48wu_vpnx  00000000-0000-0000-6ED5-CA4A564695EA         0  https://opendata.maryland.gov/resource/un65-7i...
#224  2020   SEP  11336  42148       395164402        274497385  row-e2ur-7zck~4rdd  00000000-0000-0000-C640-6EE6DAA0F07C         0  https://opendata.maryland.gov/resource/un65-7i...
#225  2020   OCT  13271  43827       477111877        328752668  row-yn9u~un9g_3heh  00000000-0000-0000-4EEE-250E14B81104         0  https://opendata.maryland.gov/resource/un65-7i...

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章