I'm trying to get distinct values using an XPath query and the lxml module. My code seems to work, but I have two problems I can't solve.
XML
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmconvert 0.8.5">
<node id="429459476" lat="55.6091243" lon="37.7270414" version="2" timestamp="2012-02-20T18:13:50Z" changeset="10743203" uid="210173" user="osmmaker">
<tag k="amenity" v="library"/>
<tag k="name" v="Детская библиотека №101"/>
<tag k="opening_hours" v="Mo-Fr 12:00-18:00; Sa 12:00-17:00"/>
<tag k="phone" v="+7-495-3995297"/>
</node>
<node id="448176571" lat="55.6098905" lon="37.7317767" version="2" timestamp="2009-11-03T16:02:27Z" changeset="3025778" uid="75496" user="navigARTor">
<tag k="highway" v="bus_stop"/>
<tag k="name" v="Воронежская улица"/>
</node>
<node id="448176571" lat="55.6098905" lon="37.7317767" version="2" timestamp="2009-11-03T16:02:27Z" changeset="3025778" uid="75496" user="navigARTor">
<tag k="highway" v="bus_stop"/>
<tag k="name" v="Воронежская улица"/>
</node>
</osm>
Python code
from lxml import etree
tree = etree.parse('out.xml')
tags = tree.xpath('./node[tag[not(@k = preceding::tag/@k)]]')
with open('10.xml','w') as f:
    for tag in tags:
        f.write(etree.tostring(tag,pretty_print=True).decode())
XML after the XPath query
<node id="429459476" lat="55.6091243" lon="37.7270414" version="2" timestamp="2012-02-20T18:13:50Z" changeset="10743203" uid="210173" user="osmmaker">
<tag k="amenity" v="library"/>
<tag k="name" v="Детская библиотека №101"/>
<tag k="opening_hours" v="Mo-Fr 12:00-18:00; Sa 12:00-17:00"/>
<tag k="phone" v="+7-495-3995297"/>
</node>
<node id="448176571" lat="55.6098905" lon="37.7317767" version="2" timestamp="2009-11-03T16:02:27Z" changeset="3025778" uid="75496" user="navigARTor">
<tag k="highway" v="bus_stop"/>
<tag k="name" v="Воронежская улица"/>
</node>
Question 1
How do I get a complete XML document, for example:
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmconvert 0.8.5">
<node id="429459476" lat="55.6091243" lon="37.7270414" version="2" timestamp="2012-02-20T18:13:50Z" changeset="10743203" uid="210173" user="osmmaker">
<tag k="amenity" v="library"/>
<tag k="name" v="Детская библиотека №101"/>
<tag k="opening_hours" v="Mo-Fr 12:00-18:00; Sa 12:00-17:00"/>
<tag k="phone" v="+7-495-3995297"/>
</node>
<node id="448176571" lat="55.6098905" lon="37.7317767" version="2" timestamp="2009-11-03T16:02:27Z" changeset="3025778" uid="75496" user="navigARTor">
<tag k="highway" v="bus_stop"/>
<tag k="name" v="Воронежская улица"/>
</node>
</osm>
Question 2
And how do I get rid of this abracadabra:
v="Воронежская улица
PS Sorry for my poor English; I hope you can understand.
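Consider using XSLT, the sibling of XPath, to manipulate your source XML. XPath is well suited to selecting regions of a document, while XSLT is a special-purpose language for transforming documents. Specifically, what you need is the Muenchian method, in which you index the document by an element or attribute value (using xsl:key) and then group on that index to return distinct values. Here the key is node/@id.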
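The grouping idea behind the Muenchian method can be sketched in plain Python: build an index from each id to the nodes that carry it, then keep only the first node of each group. This is only an illustration of the concept, not the XSLT itself; the inline sample data and the helper name first_node_per_id are assumptions made for the example.

```python
from lxml import etree

# Trimmed-down stand-in for the question's input file (assumption).
SAMPLE = """<osm version="0.6" generator="osmconvert 0.8.5">
<node id="429459476"><tag k="amenity" v="library"/></node>
<node id="448176571"><tag k="name" v="Воронежская улица"/></node>
<node id="448176571"><tag k="name" v="Воронежская улица"/></node>
</osm>""".encode("utf-8")


def first_node_per_id(root):
    """Group <node> children by @id and return the first node of each group."""
    index = {}  # plays the role of xsl:key: id value -> nodes carrying it
    for node in root.findall("node"):
        index.setdefault(node.get("id"), []).append(node)
    # dicts keep insertion order (Python 3.7+), so document order is preserved
    return [group[0] for group in index.values()]


root = etree.fromstring(SAMPLE)
distinct_ids = [node.get("id") for node in first_node_per_id(root)]
print(distinct_ids)
```

In XSLT 1.0 the index is declared once with xsl:key and looked up with key(), so the stylesheet does not need to build it by hand.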
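Python's lxml module can process XSLT 1.0 scripts. Because such scripts are well-formed XML files, they can be parsed from a file or from an embedded string. Another reason to take this approach is that it preserves Unicode, which is the problem with your original output: etree.tostring() serializes to ASCII by default, so it renders the Cyrillic characters as numeric character references.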
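As a small illustration of the encoding point, the sketch below serializes one tag element three ways. The element is built inline for the example; the encoding argument of etree.tostring() is the only thing that changes between the calls.

```python
from lxml import etree

tag = etree.fromstring('<tag k="name" v="Воронежская улица"/>')

# Default: ASCII bytes, so non-ASCII characters become numeric character references.
ascii_bytes = etree.tostring(tag)

# Explicit UTF-8: bytes that contain the Cyrillic characters themselves.
utf8_bytes = etree.tostring(tag, encoding="UTF-8")

# encoding="unicode": returns a Python str instead of bytes.
text = etree.tostring(tag, encoding="unicode")

print(ascii_bytes)
print(utf8_bytes.decode("utf-8"))
print(text)
```

If you stay with the XPath approach, passing an explicit encoding to etree.tostring() in this way should be enough to keep the Cyrillic text readable in the written file.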
XSLT script (save as .xsl)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <!-- Index every node element by its id attribute -->
  <xsl:key name="nodekey" match="node" use="@id" />

  <xsl:template match="/osm">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Keep a node only if it is the first one in its id group -->
      <xsl:for-each select="node[count(. | key('nodekey', @id)[1]) = 1]">
        <xsl:copy>
          <xsl:copy-of select="@*|*"/>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:transform>
Python script
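from lxml import etree
# Parse the source document and the stylesheet (both are XML files)
xml = etree.parse('Input.xml')
xsl = etree.parse('XSLTScript.xsl')
# Compile the stylesheet and apply it to the source document
transform = etree.XSLT(xsl)
newdom = transform(xml)
# Write the serialized result as bytes, in the encoding set by xsl:output
with open('Output.xml', 'wb') as f:
    f.write(newdom)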
XML output
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="osmconvert 0.8.5">
<node id="429459476" lat="55.6091243" lon="37.7270414" version="2" timestamp="2012-02-20T18:13:50Z" changeset="10743203" uid="210173" user="osmmaker">
<tag k="amenity" v="library"/>
<tag k="name" v="Детская библиотека №101"/>
<tag k="opening_hours" v="Mo-Fr 12:00-18:00; Sa 12:00-17:00"/>
<tag k="phone" v="+7-495-3995297"/>
</node>
<node id="448176571" lat="55.6098905" lon="37.7317767" version="2" timestamp="2009-11-03T16:02:27Z" changeset="3025778" uid="75496" user="navigARTor">
<tag k="highway" v="bus_stop"/>
<tag k="name" v="Воронежская улица"/>
</node>
</osm>
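For a quick check without any files, the same kind of transform can be run from embedded strings, as a rough sketch. The inline sample data is an assumption that mirrors the shape of the question's input, and the stylesheet here copies each kept node whole with xsl:copy-of select=".".

```python
from lxml import etree

# Stylesheet embedded as a byte string; it keeps the first node of each id group.
XSLT_SOURCE = b"""<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="nodekey" match="node" use="@id"/>
  <xsl:template match="/osm">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="node[count(. | key('nodekey', @id)[1]) = 1]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:transform>"""

# Trimmed-down stand-in for the input file (assumption).
SAMPLE = """<osm version="0.6" generator="osmconvert 0.8.5">
<node id="429459476"><tag k="amenity" v="library"/></node>
<node id="448176571"><tag k="name" v="Воронежская улица"/></node>
<node id="448176571"><tag k="name" v="Воронежская улица"/></node>
</osm>""".encode("utf-8")

transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(SAMPLE))

# bytes() serializes the result according to xsl:output (UTF-8 here).
output = bytes(result)
print(output.decode("utf-8"))
```

Because the serialization follows xsl:output, the printed document starts with an XML declaration and keeps the Cyrillic text as-is.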