Reading tricky XML with XPath

Nikhil

I am a beginner in Python and XPath, and I need to read an XML document with non-uniform nodes (like the one below) using XPath. The expected output, which will be written to a file, is also shown below. The code uses the lxml library.

Please help me build the correct XPath expression.

Source XML

<Classes>
    <German>
        <Student>
            <Span><a href="">John</a></Span>
        </Student>
        <Student>
            <Span>Adam</Span>
        </Student>
    </German>
    <English>
        <Student>
            <Span>Mary</Span>
        </Student>
    </English>
    <French>
        <Student>
            <Span><a href="">Anil</a></Span>
        </Student>
        <Student>
            <Span><a href="">Jack</a></Span>
        </Student>
    </French>
    <Spanish>
        <Student>
            <Span>Mary</Span>
        </Student>
        <Student>
            <Span>Jack</Span>
        </Student>
    </Spanish>
</Classes>

Expected output

German
    John
    Adam
English
    Mary
French
    Anil
    Jack
Spanish
    Mary
    Jack

Thanks, Nikhil

Andrés Pérez-Albela H.

This code will help:

from lxml import html

xml_content = """<Classes>
    <German>
        <Student>
            <Span><a href="">John</a></Span>
        </Student>
        <Student>
            <Span>Adam</Span>
        </Student>
    </German>
    <English>
        <Student>
            <Span>Mary</Span>
        </Student>
    </English>
    <French>
        <Student>
            <Span><a href="">Anil</a></Span>
        </Student>
        <Student>
            <Span><a href="">Jack</a></Span>
        </Student>
    </French>
    <Spanish>
        <Student>
            <Span>Mary</Span>
        </Student>
        <Student>
            <Span>Jack</Span>
        </Student>
    </Spanish>
</Classes>"""

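# lxml.html uses an HTML parser, which lowercases tag names,
# so the XPath expressions below are written in lowercase.
tree = html.fromstring(xml_content)
classes = tree.xpath('//classes/*')
for language_class in classes:
    # Restore the leading capital that the HTML parser dropped.
    print(language_class.tag.capitalize())
    # //text() finds the name whether or not it is wrapped in <a>.
    for student in language_class.xpath('.//student/span//text()'):
        print("    {}".format(student))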

Output:

German
    John
    Adam
English
    Mary
French
    Anil
    Jack
Spanish
    Mary
    Jack
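
As an alternative, here is a minimal sketch that parses the input as XML with lxml.etree instead of lxml.html, so tag names keep their original case and no capitalize() call is needed. The helper name class_lines and the shortened sample document are assumptions made for illustration; string(.) is the standard XPath 1.0 function that concatenates all descendant text of a node, so it returns the name whether or not it sits inside an <a> element.

```python
from lxml import etree

# Shortened sample in the same shape as the question's XML (illustrative only).
xml_content = """<Classes>
    <German>
        <Student><Span><a href="">John</a></Span></Student>
        <Student><Span>Adam</Span></Student>
    </German>
    <English>
        <Student><Span>Mary</Span></Student>
    </English>
</Classes>"""


def class_lines(xml_text):
    """Return the output lines: each class name, then its indented students."""
    root = etree.fromstring(xml_text)
    lines = []
    # The XML parser preserves case, so the paths use the original tag names.
    for language_class in root.xpath('/Classes/*'):
        lines.append(language_class.tag)
        for span in language_class.xpath('./Student/Span'):
            # string(.) joins all text under <Span>, with or without <a>.
            lines.append("    " + span.xpath('string(.)').strip())
    return lines


print("\n".join(class_lines(xml_content)))
```

To write the result to a file, as the question asks, the joined string could be passed to a file object's write() method instead of print().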
