lxml XPath can't find a tag below the first one in XML

I have an XML document that looks something like this:

<MyXmlRoot>
<App xmlns='urn:SomethingSomething1'>
    ...
</App>
<User xmlns='urn:SomethingSomething2'>
    ...
</User>
<Doc xmlns='urn:SomethingSomething3'>
    <level2>
        <level3>
            <level4>
                <level5>
                    <level6>
                        <level7>
                            <level8>
                                <level9>
                                    <level10>Content at the deepest level</level10>
                                </level9>
                            </level8>
                        </level7>
                    </level6>
                </level5>
            </level4>
        </level3>
    </level2>
</Doc>
</MyXmlRoot>

I use lxml to parse it like this:

from lxml import etree

tree = etree.parse("textxml.xml")
root = tree.getroot()

If I pretty-print from root, it shows the entire XML, which is good. But when I try to read the value of a specific tag like this:

content = root.xpath('//level10/text()')

XPath can't find any tag below the root and returns an empty list. I suspect it's because of the namespaces, but I can't find a way to make XPath read the values. Any advice?
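One quick way to confirm the namespace suspicion is to print every element's tag. lxml reports a namespaced tag as {namespace-URI}local-name (Clark notation), so any tag that prints with a {...} prefix is in a namespace. A minimal sketch, assuming a trimmed-down copy of the document above:

```python
from lxml import etree

# Trimmed-down copy of the document (assumed shape, not the real file)
xml_data = """<MyXmlRoot>
    <App xmlns='urn:SomethingSomething1'/>
    <Doc xmlns='urn:SomethingSomething3'>
        <level2><level10>Content at the deepest level</level10></level2>
    </Doc>
</MyXmlRoot>"""

root = etree.fromstring(xml_data)

# iter() walks the root and all of its descendant elements in document order;
# namespaced tags come back as "{namespace-URI}local-name".
for el in root.iter():
    print(el.tag)
```

This prints MyXmlRoot, {urn:SomethingSomething1}App, {urn:SomethingSomething3}Doc, {urn:SomethingSomething3}level2 and {urn:SomethingSomething3}level10, one per line: everything under Doc inherits its default namespace, which is why a plain //level10 matches nothing.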

Best answer, by Andrej Kesely:

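The elements inside <Doc> inherit its default namespace, urn:SomethingSomething3, and an unprefixed name such as //level10 only matches elements that are in no namespace. Add the namespace, written as {urn:SomethingSomething3}, to the tag you want to search: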

from lxml import etree

xml_data = """
<MyXmlRoot>
    <App xmlns='urn:SomethingSomething1'>
    </App>
    <User xmlns='urn:SomethingSomething2'>
    </User>
    <Doc xmlns='urn:SomethingSomething3'>
        <level2>
            <level3>
                <level4>
                    <level5>
                        <level6>
                            <level7>
                                <level8>
                                    <level9>
                                        <level10>Content at the deepest level</level10>
                                    </level9>
                                </level8>
                            </level7>
                        </level6>
                    </level5>
                </level4>
            </level3>
        </level2>
    </Doc>
</MyXmlRoot>
"""

root = etree.fromstring(xml_data)

# Clark notation: {namespace-URI}local-name
level10_text = root.find(".//{urn:SomethingSomething3}level10").text
print("Text from <level10> tag:", level10_text)

Prints:

Text from <level10> tag: Content at the deepest level
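Since the question uses root.xpath(), another option is to keep xpath() and pass a prefix-to-URI mapping through its namespaces argument. A minimal sketch, assuming a trimmed-down copy of the document; the prefix d and the dict name ns are arbitrary choices of this sketch and do not need to appear in the XML:

```python
from lxml import etree

# Trimmed-down copy of the document (assumed shape, not the real file)
xml_data = """<MyXmlRoot>
    <App xmlns='urn:SomethingSomething1'/>
    <User xmlns='urn:SomethingSomething2'/>
    <Doc xmlns='urn:SomethingSomething3'>
        <level2><level3><level10>Content at the deepest level</level10></level3></level2>
    </Doc>
</MyXmlRoot>"""

root = etree.fromstring(xml_data)

# Unprefixed names in XPath 1.0 only match elements in no namespace,
# so this reproduces the empty list from the question.
print(root.xpath("//level10/text()"))

# Bind a prefix of our own choosing to the namespace URI and use it in the path.
ns = {"d": "urn:SomethingSomething3"}
content = root.xpath("//d:level10/text()", namespaces=ns)
print(content)
```

The first print shows [] and the second ['Content at the deepest level']. Only the URI has to match the document; the prefix is local to the query.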

Alternatively, use etree.ETXPath, which accepts the same {namespace}tag notation directly inside the XPath expression:

to_search = etree.ETXPath("//{urn:SomethingSomething3}level10/text()")
level10_text = to_search(root)
print("Text from <level10> tag:", level10_text)

Prints:

Text from <level10> tag: ['Content at the deepest level']
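If you would rather not handle namespaces at all, XPath's local-name() function compares only the tag name without its namespace. A sketch on a trimmed-down copy of the document (assumed shape); the trade-off is that it would also match a level10 element from any other namespace:

```python
from lxml import etree

# Trimmed-down copy of the document (assumed shape, not the real file)
xml_data = """<MyXmlRoot>
    <Doc xmlns='urn:SomethingSomething3'>
        <level2><level10>Content at the deepest level</level10></level2>
    </Doc>
</MyXmlRoot>"""

root = etree.fromstring(xml_data)

# local-name() ignores the namespace URI, so this matches <level10>
# whatever namespace (or none) it belongs to.
content = root.xpath("//*[local-name()='level10']/text()")
print(content)
```

This prints ['Content at the deepest level'].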