How to parse the text of an XML tag containing "<?...?>"

My goal is to get the text: 27. The method according to claim 23 wherein...
How do I retrieve the text inside a tag that contains <?...?> markup? (These are XML processing instructions, not PHP short tags as I first assumed from googling.)

I am using lxml and XPath, and they do not seem to register these as tags or nodes. I tried itertext(), but that doesn't work either.

    <claim id="CLM-00027" num="00027">
      <claim-text>
        <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
        <?insert-end id="REI-00005" ?></claim-text>
    </claim>
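For reference: lxml does not throw these away. It keeps processing instructions in the tree as nodes, so XPath can address them with the processing-instruction() node test. A minimal sketch, assuming the snippet above is held in a string (the `XML` variable name is just for illustration), that selects the first text node right after the `insert-start` instruction:

```python
from lxml import etree

# The claim snippet from the question, laid out on several lines.
XML = """<claim id="CLM-00027" num="00027">
  <claim-text>
    <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
    <?insert-end id="REI-00005" ?></claim-text>
</claim>"""

root = etree.fromstring(XML)

# processing-instruction('insert-start') matches only PIs with that target;
# following-sibling::text()[1] is the first text node that comes after it.
texts = root.xpath(
    "//processing-instruction('insert-start')/following-sibling::text()[1]"
)

# XPath returns a list of string results; strip the layout whitespace.
claim = texts[0].strip() if texts else None
print(claim)
```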
2 Answer(s)

Here's a piece of code that does that. lxml actually keeps the processing instructions as child nodes, so XPath gets you to the claim-text element, its first child is the <?insert-start ...?> instruction, and the text you want is stored in that node's tail.

    from lxml import etree  # "import lxml" alone does not load the etree submodule

    xml = """<claim id="CLM-00027" num="00027">
      <claim-text>
        <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
        <?insert-end id="REI-00005" ?></claim-text>
    </claim>"""

    root = etree.fromstring(xml)
    # XPath takes us as far as the <claim-text> element
    claim_text = root.xpath("/claim/claim-text")[0]
    # Its first child is the <?insert-start ...?> PI; the claim is that node's tail
    res = claim_text[0].tail
    print(res.strip())

Output:

27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
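A slightly more defensive variant of the same idea picks the instruction by its target name instead of assuming it is the first child, and iterates over the element rather than calling getchildren(), which lxml documents as deprecated. This is a sketch; the helper name `text_after_pi` is hypothetical. It relies on lxml giving PI nodes a `.tag` of `etree.PI` and a `.target` attribute:

```python
from lxml import etree

XML = """<claim id="CLM-00027" num="00027">
  <claim-text>
    <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
    <?insert-end id="REI-00005" ?></claim-text>
</claim>"""


def text_after_pi(element, target):
    """Hypothetical helper: stripped tail of the first child PI named `target`, else None."""
    for child in element:
        # Iterating an lxml element also yields PI children; their tag is etree.PI.
        if child.tag is etree.PI and child.target == target:
            return child.tail.strip() if child.tail else None
    return None


root = etree.fromstring(XML)
claim_text = root.find("claim-text")
claim = text_after_pi(claim_text, "insert-start")
print(claim)
```

Here `text_after_pi(claim_text, "insert-end")` returns None, because nothing follows that instruction before </claim-text>.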


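Access the child node by index. Python's built-in ElementTree parser skips the <?...?> processing instructions, so the claim text ends up in the .text of <claim-text>, the root's first child.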

    from xml.etree import ElementTree as ET

    tree = ET.parse('path_to_your.xml')
    root = tree.getroot()
    print(root[0].text)

Output:

        27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.                 
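A self-contained sketch of the same approach that parses from a string and strips the surrounding whitespace. For comparison, it also parses with `ET.TreeBuilder(insert_pis=True)` (Python 3.8+). That option keeps the instructions in the tree, and then the claim text moves to the first PI's `.tail`, as it does in lxml. The `XML` variable is just the question's snippet:

```python
import xml.etree.ElementTree as ET

XML = """<claim id="CLM-00027" num="00027">
  <claim-text>
    <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
    <?insert-end id="REI-00005" ?></claim-text>
</claim>"""

# Default parser: PIs are dropped, so all text inside <claim-text> is merged into .text
root = ET.fromstring(XML)
merged = root[0].text.strip()
print(merged)

# Keep the PIs as child elements (Python 3.8+)
parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
root_with_pis = ET.fromstring(XML, parser=parser)
first_pi = root_with_pis[0][0]

# A PI element's tag is ET.PI; its .text is the target plus the PI data
print(first_pi.tag is ET.PI, first_pi.text)
# The claim now lives in the PI's tail instead of <claim-text>.text
print(first_pi.tail.strip() == merged)
```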