I cannot extract the text from an element using ElementTree

Luis

A snippet of my document and the code I am using are as follows:

import xml.etree.ElementTree as ET
obj = ET.fromstring("""
   <tab>
    <infos><bounds left="7947" top="88607" width="10086" height="1184" bottom="89790" right="18032" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="true" mbFramePrintAreaValid="true"/>     <prtBounds left="115" top="0" width="9300" height="1169" bottom="1168" right="9414"/> </infos>
    <row > <infos> <bounds left="8062" top="88607" width="9300" height="524" bottom="89130" right="17361" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="true" mbFramePrintAreaValid="true"/>      <prtBounds left="0" top="0" width="9300" height="524" bottom="523" right="9299"/>      </infos>
     <cell ptr="000002232E644270" id="199" symbol="class SwCellFrame" next="202" upper="198" lower="200" rowspan="1"> <infos> <bounds left="8062" top="88607" width="546" height="524" bottom="89130" right="8607" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="true" mbFramePrintAreaValid="true"/>        <prtBounds left="7" top="15" width="532" height="509" bottom="523" right="538"/>  </infos>
      <txt> <infos> <bounds left="8069" top="88622" width="532" height="187" bottom="88808" right="8600" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="false" mbFramePrintAreaValid="true"/> <prtBounds left="0" top="3" width="532" height="184" bottom="186" right="531"/>        </infos>
       <Finish/>
      </txt>
      <txt> <infos> <bounds left="8069" top="88809" width="532" height="149" bottom="88957" right="8600" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="false" mbFramePrintAreaValid="true"/> <prtBounds left="136" top="0" width="396" height="149" bottom="148" right="531"/> </infos>
UDA       <Finish/>
      </txt>
     </cell>
     <cell ptr="000002232E642E40" id="202" symbol="class SwCellFrame" next="205" prev="199" upper="198" lower="203" rowspan="1"> <infos> <bounds left="8608" top="88607" width="3283" height="524" bottom="89130" right="11890" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="true" mbFramePrintAreaValid="true"/> <prtBounds left="7" top="15" width="3269" height="509" bottom="523" right="3275"/> </infos>
      <txt>
       <infos> <bounds left="8615" top="88622" width="3269" height="180" bottom="88801" right="11883" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="false" mbFramePrintAreaValid="true"/> <prtBounds left="0" top="7" width="3269" height="173" bottom="179" right="3268"/> </infos> <Finish/>
      </txt>
      <txt> <infos> <bounds left="8615" top="88802" width="3269" height="149" bottom="88950" right="11883" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="false" mbFramePrintAreaValid="true"/> <prtBounds left="58" top="0" width="3170" height="149" bottom="148" right="3227"/> </infos>
Nombre       <Finish/>
      </txt>
     </cell>
    </row>
  </tab>
""")
a = obj.findall('./row/cell/txt')
for i, item in enumerate(a):
    print(i, item.text.strip())

But if I simplify the document, I do manage to extract the text:

obj = ET.fromstring("""
   <tab>
    <row>
     <cell > 
      <txt > <Finish/> </txt>
      <txt > UDA <Finish/> </txt>
     </cell>
     <cell >
      <txt > <Finish/> </txt>
      <txt > Nombre       <Finish/> </txt>
     </cell>
   </row>
  </tab>
""")

a = obj.findall('./row/cell/txt')
for i, item in enumerate(a):
    print(i, item.text.strip())

This prints:

0
1 UDA
2
3 Nombre

I don't know how to solve this problem, because my working document is very large and I can't simplify it as I have done in this example.

mzjn

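The "UDA" and "Nombre" strings are stored in the tail of the infos elements, not in the text of the txt elements. An element's text holds only the characters that come before its first child; anything that follows a child's closing tag becomes that child's tail. The easiest way to get the desired output is to use itertext(), which yields all of the inner text, tails included: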

a = obj.findall('./row/cell/txt')
for i, item in enumerate(a):
    text = "".join([s.strip() for s in item.itertext()])
    print(i, text)
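
To see the text/tail split directly, here is a minimal sketch that reads the tail of each infos child instead of using itertext(). The document is a reduced, hypothetical version of the one in the question (the bounds attributes are placeholders), and the `or ""` guards are there because text and tail can be None:

```python
import xml.etree.ElementTree as ET

# Reduced, hypothetical version of the document from the question.
doc = ET.fromstring(
    "<tab><row><cell>"
    "<txt> <infos><bounds left='1'/></infos> <Finish/></txt>"
    "<txt> <infos><bounds left='2'/></infos>\nUDA       <Finish/></txt>"
    "</cell></row></tab>"
)

results = []
for txt in doc.findall("./row/cell/txt"):
    infos = txt.find("infos")
    # txt.text is only the whitespace that precedes <infos>.
    own_text = (txt.text or "").strip()
    # The wanted string follows </infos>, so it is stored as infos.tail.
    tail_text = (infos.tail or "").strip() if infos is not None else ""
    results.append((own_text, tail_text))

print(results)  # [('', ''), ('', 'UDA')]
```

Reading the tail explicitly is useful when only the string right after infos is wanted; itertext() is simpler when all of the inner text should be collected.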

Edited at 2022-08-22
