我正在尝试使用以下方法将其转换为Python中的DF:
import pandas as pd
import xml.etree.ElementTree as et
def parse_XML(xml_file, df_cols):
"""Parse the input XML file and store the result in a pandas
DataFrame with the given columns.
The first element of df_cols is supposed to be the identifier
variable, which is an attribute of each node element in the
XML data; other features will be parsed from the text content
of each sub-element.
"""
xtree = et.parse(xml_file)
xroot = xtree.getroot()
rows = []
for node in xroot:
res = []
res.append(node.attrib.get(df_cols[0]))
for el in df_cols[1:]:
if node is not None and node.find(el) is not None:
res.append(node.find(el).text)
else:
res.append(None)
rows.append({df_cols[i]: res[i]
for i, _ in enumerate(df_cols)})
out_df = pd.DataFrame(rows, columns=df_cols)
return out_df
我正在使用这样的功能:
parse_XML(R'C:\Users\aleks\Desktop\zadania python+sql\import_xml.xml', ['Year', 'Row', 'New', 'Used', 'Total Sales New', 'Total Sales Used'])
如何使其工作,以便将完整的XML文件导入到DataFrame?谢谢。
由于您的XML相对扁平,请考虑合并要传递到pandas.DataFrame
构造函数中的元素和属性数据的字典。具体来说,字典是通过标签的列表/字典理解以及row
元素和attrib
字典的文本解析来派生的。
xtree = et.parse(xml_file)
# LIST OF DICTIONARIES
data = [{ **{r.tag:r.text for r in row.findall('*')}, **row.attrib
} for row in xtree.findall('.//row/row')]
out_df = pd.DataFrame(data)
为了从密歇根州的OpenData进行展示,MVA车辆销售计数(类似于OP的图像数据):
import pandas as pd
import xml.etree.ElementTree as et
from urllib import request
url = "https://opendata.maryland.gov/api/views/un65-7ipd/rows.xml?accessType=DOWNLOAD"
rq = request.urlopen(url)
xtree = et.fromstring(rq.read())
# LIST OF DICTIONARIES
data = [{ **{r.tag:r.text for r in row.findall('*')}, **row.attrib
} for row in xtree.findall('.//row/row')]
out_df = pd.DataFrame(data)
输出量
out_df
# year month new used total_sales_new total_sales_used _id _uuid _position _address
#0 2002 JAN 31106 49927 755015820 386481929 row-a49t-ihm3.rpap 00000000-0000-0000-05D6-978893D301BF 0 https://opendata.maryland.gov/resource/un65-7i...
#1 2002 FEB 27520 50982 664454223 361353242 row-njtn_crc8_qawh 00000000-0000-0000-3211-E4A718B8A756 0 https://opendata.maryland.gov/resource/un65-7i...
#2 2002 MAR 34225 58794 805666244 419385387 row-tn8d-5xax-gcqz 00000000-0000-0000-9FC0-AB50DEE551EE 0 https://opendata.maryland.gov/resource/un65-7i...
#3 2002 APR 36452 59817 846368297 433061150 row-bjnm-jnr7.7rbw 00000000-0000-0000-FE23-E2E416AC13DB 0 https://opendata.maryland.gov/resource/un65-7i...
#4 2002 MAY 37359 60577 855005784 442569410 row-5wkh~fkp3.mzuj 00000000-0000-0000-CBC6-6E4FF3707E07 0 https://opendata.maryland.gov/resource/un65-7i...
#.. ... ... ... ... ... ... ... ... ... ...
#221 2020 JUN 23704 63395 821297823 699671992 row-m9ah.mpqr.7eww 00000000-0000-0000-3599-5C2EEABDA225 0 https://opendata.maryland.gov/resource/un65-7i...
#222 2020 JUL 22748 65652 784818757 608761285 row-79ht-rw2j~j4vy 00000000-0000-0000-4E88-6D5061A28AC5 0 https://opendata.maryland.gov/resource/un65-7i...
#223 2020 AUG 12640 48594 456451505 329492612 row-yh7w-48wu_vpnx 00000000-0000-0000-6ED5-CA4A564695EA 0 https://opendata.maryland.gov/resource/un65-7i...
#224 2020 SEP 11336 42148 395164402 274497385 row-e2ur-7zck~4rdd 00000000-0000-0000-C640-6EE6DAA0F07C 0 https://opendata.maryland.gov/resource/un65-7i...
#225 2020 OCT 13271 43827 477111877 328752668 row-yn9u~un9g_3heh 00000000-0000-0000-4EEE-250E14B81104 0 https://opendata.maryland.gov/resource/un65-7i...
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句