Extracting text from a nested XML string with a parser

豪恩斯

I have this string:

'<Section xml:space="preserve" HasTrailingParagraphBreakOnPaste="False" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"><Paragraph FontSize="11" FontFamily="Portable User Interface" Foreground="#FF000000" FontWeight="Normal" FontStyle="Normal" FontStretch="Normal" CharacterSpacing="0" Typography.AnnotationAlternates="0" Typography.EastAsianExpertForms="False" Typography.EastAsianLanguage="Normal" Typography.EastAsianWidths="Normal" Typography.StandardLigatures="True" Typography.ContextualLigatures="True" Typography.DiscretionaryLigatures="False" Typography.HistoricalLigatures="False" Typography.StandardSwashes="0" Typography.ContextualSwashes="0" Typography.ContextualAlternates="True" Typography.StylisticAlternates="0" Typography.StylisticSet1="False" Typography.StylisticSet2="False" Typography.StylisticSet3="False" Typography.StylisticSet4="False" Typography.StylisticSet5="False" Typography.StylisticSet6="False" Typography.StylisticSet7="False" Typography.StylisticSet8="False" Typography.StylisticSet9="False" Typography.StylisticSet10="False" Typography.StylisticSet11="False" Typography.StylisticSet12="False" Typography.StylisticSet13="False" Typography.StylisticSet14="False" Typography.StylisticSet15="False" Typography.StylisticSet16="False" Typography.StylisticSet17="False" Typography.StylisticSet18="False" Typography.StylisticSet19="False" Typography.StylisticSet20="False" Typography.Capitals="Normal" Typography.CapitalSpacing="False" Typography.Kerning="True" Typography.CaseSensitiveForms="False" Typography.HistoricalForms="False" Typography.Fraction="Normal" Typography.NumeralStyle="Normal" Typography.NumeralAlignment="Normal" Typography.SlashedZero="False" Typography.MathematicalGreek="False" Typography.Variants="Normal" TextOptions.TextHintingMode="Fixed" TextOptions.TextFormattingMode="Ideal" TextOptions.TextRenderingMode="Auto" TextAlignment="Left" LineHeight="0" LineStackingStrategy="MaxHeight"><Run>Panneau de polystyrène expansé (PSE) 
ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.</Run></Paragraph></Section>'

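My goal is to extract the text between <Run></Run>, that is: Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.

I did this with a regular expression, but it fails on some XML strings, so I tried xml.etree.ElementTree. I did not manage to access the text nested inside <Run></Run>.

How can I extract this text using an XML parser?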

尤里

Here is a simple way to get the data:

xmlstr = '<Section xml:space="preserve" HasTrailingParagraphBreakOnPaste="False" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"><Paragraph FontSize="11" FontFamily="Portable User Interface" Foreground="#FF000000" FontWeight="Normal" FontStyle="Normal" FontStretch="Normal" CharacterSpacing="0" Typography.AnnotationAlternates="0" Typography.EastAsianExpertForms="False" Typography.EastAsianLanguage="Normal" Typography.EastAsianWidths="Normal" Typography.StandardLigatures="True" Typography.ContextualLigatures="True" Typography.DiscretionaryLigatures="False" Typography.HistoricalLigatures="False" Typography.StandardSwashes="0" Typography.ContextualSwashes="0" Typography.ContextualAlternates="True" Typography.StylisticAlternates="0" Typography.StylisticSet1="False" Typography.StylisticSet2="False" Typography.StylisticSet3="False" Typography.StylisticSet4="False" Typography.StylisticSet5="False" Typography.StylisticSet6="False" Typography.StylisticSet7="False" Typography.StylisticSet8="False" Typography.StylisticSet9="False" Typography.StylisticSet10="False" Typography.StylisticSet11="False" Typography.StylisticSet12="False" Typography.StylisticSet13="False" Typography.StylisticSet14="False" Typography.StylisticSet15="False" Typography.StylisticSet16="False" Typography.StylisticSet17="False" Typography.StylisticSet18="False" Typography.StylisticSet19="False" Typography.StylisticSet20="False" Typography.Capitals="Normal" Typography.CapitalSpacing="False" Typography.Kerning="True" Typography.CaseSensitiveForms="False" Typography.HistoricalForms="False" Typography.Fraction="Normal" Typography.NumeralStyle="Normal" Typography.NumeralAlignment="Normal" Typography.SlashedZero="False" Typography.MathematicalGreek="False" Typography.Variants="Normal" TextOptions.TextHintingMode="Fixed" TextOptions.TextFormattingMode="Ideal" TextOptions.TextRenderingMode="Auto" TextAlignment="Left" LineHeight="0" LineStackingStrategy="MaxHeight"><Run>Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.</Run></Paragraph></Section>'
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

results = []
root = ET.fromstring(xmlstr)
# Section -> Paragraph -> Run: walk two levels down from the root
for p in root:
    for r in p:
        print(r.text)
        results.append(r.text)

Result:

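Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un
m² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.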

If you run the code at the Python interactive prompt, the text is still available in results afterwards:

>>> results
['Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.']
>>> results[0]
'Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.'
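
A likely reason a direct lookup such as root.find('Run') comes back empty is the default namespace declared by xmlns on <Section>: ElementTree stores each tag as '{namespace-uri}Run', not as plain 'Run'. The loop above avoids the problem because it never looks at tag names. As a minimal sketch of a name-based alternative, the snippet below searches for Run elements at any depth with a namespace map. The shortened xmlstr, the prefix "x", and the helper name extract_run_texts are assumptions made for illustration and are not part of the original post.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns attribute of the string in the question.
XAML_NS = "http://schemas.microsoft.com/winfx/2006/xaml/presentation"

# Shortened stand-in for the question's string (most attributes omitted).
xmlstr = (
    '<Section xml:space="preserve" xmlns="' + XAML_NS + '">'
    '<Paragraph FontSize="11"><Run>first run</Run><Run>second run</Run></Paragraph>'
    '</Section>'
)


def extract_run_texts(xml_text):
    """Return the text of every <Run> element, at any nesting depth."""
    root = ET.fromstring(xml_text)
    # The prefix "x" is arbitrary; it only has to match the one used in the path.
    ns = {"x": XAML_NS}
    # r.text is None for an empty <Run/>, so fall back to an empty string.
    return [r.text or "" for r in root.findall(".//x:Run", ns)]


texts = extract_run_texts(xmlstr)
print(texts)
```

Compared with the nested loops, this does not depend on Run sitting exactly two levels below the root, and it skips any other element types that might appear next to Run inside a Paragraph.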
