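I am trying to understand how to use Python to extract certain data from an XML file.
At the moment I pull information from an API and get an XML file back, but I want to extract specific information directly from the XML.
From what I have found, ElementTree seems to be the answer, but I find it hard to understand and I am really not sure it is the right way to build a solution.
Below is the code I use to get the XML data, followed by a shortened version of the XML it returns (only the important parts I need to extract from).
Thanks.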
import requests

# Import routes
routes = []


class routesClass:
    def __init__(self, name, url):  # ,start,end,offset,rwe,al):
        self.n = name
        self.u = url
        # self.s=start
        # self.e=end
        # self.o=offset
        # self.r=rwe
        # self.a=al


# Add example route
testRoute1 = routesClass("EasternFwy-Hoddle/Johnston", "https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.7999012967757,144.99318476311566:?routeType=shortest&key=SECRETKEY&computeTravelTimeFor=all")
routes.append(testRoute1)
# routes.append(testRoute2)
print(routes[0].u)
And here is the XML:
<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<leg>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
I recommend lxml. In my opinion, navigating an XML tree with it is easier than with ElementTree. Here is a demonstration of how to use the module.
Example
Taking your XML as the example, this is how I would parse it with lxml. Save the two listings below as example.xml and xmlparse.py.
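example.xml - The XML you provided is malformed. There is a stray <leg> tag between the two summary sections, and the snippet has no single root element. These two problems prevent it from parsing, so I removed the <leg> tag and grouped the two summary sections inside a <parent> tag. Here is the XML.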
<parent>
<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
</parent>
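The two problems with the original snippet can be reproduced in a few lines. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs; the `fragment` string is a shortened stand-in for the XML in the question, and the `parses` helper is a made-up name for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the snippet in the question:
# two top-level <summary> elements with an unclosed <leg> between them.
fragment = (
    "<summary><travelTimeInSeconds>764</travelTimeInSeconds></summary>"
    "<leg>"
    "<summary><travelTimeInSeconds>67</travelTimeInSeconds></summary>"
)


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Wrapping alone is not enough: the unclosed <leg> still breaks the document.
wrapped_only = "<parent>" + fragment + "</parent>"

# Dropping the stray <leg> and wrapping everything in one root element.
repaired = "<parent>" + fragment.replace("<leg>", "") + "</parent>"

print(parses(fragment))      # more than one top-level element
print(parses(wrapped_only))  # <leg> is never closed
print(parses(repaired))      # single root, all tags balanced
```

Only the last call reports a well-formed document, which is why the example.xml above both removes <leg> and adds the <parent> wrapper.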
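xmlparse.py - In this script I give you a loop that prints each key (elem.tag) and its value (text), plus a conditional that checks whether a particular key is present and its value is greater than 700. This is only meant to show you how to add a trigger inside the loop.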
from lxml import etree


def parseXML(xmlFile):
    """
    Parse the xml
    """
    with open(xmlFile) as fobj:
        xml = fobj.read()
    root = etree.fromstring(xml)
    # Iterate over the elements directly; getchildren() is deprecated in lxml.
    for appt in root:
        for elem in appt:
            if not elem.text:
                text = "None"
            else:
                text = elem.text
            # Do something with the XML based on its tag and value.
            if elem.tag == 'travelTimeInSeconds' and int(text) > 700:
                print('******** Do something with', elem.tag, ':', text)
            print(elem.tag + " => " + text)


if __name__ == "__main__":
    parseXML("example.xml")
Output - If you save the code as xmlparse.py and the updated XML I provided as example.xml, you will get the following output when you run the script:
lengthInMeters => 5144
******** Do something with travelTimeInSeconds : 764
travelTimeInSeconds => 764
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:54:58+11:00
noTrafficTravelTimeInSeconds => 478
historicTrafficTravelTimeInSeconds => 764
liveTrafficIncidentsTravelTimeInSeconds => 764
lengthInMeters => 806
travelTimeInSeconds => 67
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:43:21+11:00
noTrafficTravelTimeInSeconds => 59
historicTrafficTravelTimeInSeconds => 67
liveTrafficIncidentsTravelTimeInSeconds => 67
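If only a few specific values are needed rather than every element, they can be looked up by tag name instead of looping over all children. Here is a minimal sketch of that, assuming the repaired <parent> layout from example.xml. It uses the standard library's xml.etree.ElementTree so it runs without extra installs; lxml.etree elements offer the same findall and findtext methods. The `extract_summaries` name, the dictionary keys and the choice of fields are illustrative and not part of the code above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Shortened copy of example.xml (only three fields per summary kept).
XML = """\
<parent>
  <summary>
    <lengthInMeters>5144</lengthInMeters>
    <travelTimeInSeconds>764</travelTimeInSeconds>
    <departureTime>2017-12-28T14:42:14+11:00</departureTime>
  </summary>
  <summary>
    <lengthInMeters>806</lengthInMeters>
    <travelTimeInSeconds>67</travelTimeInSeconds>
    <departureTime>2017-12-28T14:42:14+11:00</departureTime>
  </summary>
</parent>
"""


def extract_summaries(xml_text):
    """Return one dict per <summary>, with the selected fields converted."""
    root = ET.fromstring(xml_text)
    results = []
    for summary in root.findall("summary"):
        results.append({
            # findtext returns the text of the first matching child element.
            "length_m": int(summary.findtext("lengthInMeters")),
            "travel_s": int(summary.findtext("travelTimeInSeconds")),
            "departure": datetime.fromisoformat(
                summary.findtext("departureTime")
            ),
        })
    return results


for row in extract_summaries(XML):
    print(row["length_m"], row["travel_s"], row["departure"].isoformat())
```

Converting to int and datetime up front makes later comparisons, such as the "greater than 700" check, work on numbers rather than strings. Note that findtext returns None when a tag is missing, so real API responses may need a guard before the conversion.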