Read XML block in Python
I have an XML file like the one below, which contains multiple XML documents. I want to fetch the <Sacd>
content.
    <?xml version="1.0" encoding="utf-8"?>
    <Sacd>
      <Acdpktg> <Acdpktg/>
    </Sacd>
    <?xml version="1.0" encoding="utf-8"?>
    <Sacd>
      <Acdpktg/>
    </Sacd>
    <?xml version="1.0" encoding="utf-8"?>
    <Sacd>
      <AcdpktG>
        <Result Value="0"/>
        <Packet Value="Dnd"/>
        <Invoke Value="abc"/>
      </AcdpktG>
    </Sacd>
How do I extract the values inside the Sacd tag?
Well, your XML is problematic in several respects. First, the file contains multiple XML documents, each with its own declaration and root element, so it is not well-formed as a whole; the documents have to be split apart before parsing. Second, the first <Acdpktg> <Acdpktg/> tag pair is invalid; it should be <Acdpktg> </Acdpktg>.
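Both problems can be confirmed before doing any real work. Here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of lxml; the helper name is_well_formed and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as one XML document (hypothetical helper)."""
    try:
        # Parse from bytes so the encoding declaration is honoured.
        ET.fromstring(text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


declaration = '<?xml version="1.0" encoding="utf-8"?>'
broken = declaration + "<Sacd> <Acdpktg> <Acdpktg/> </Sacd>"  # <Acdpktg> is never closed
fixed = declaration + "<Sacd> <Acdpktg> </Acdpktg> </Sacd>"
concatenated = fixed + fixed  # two complete documents in one string

print(is_well_formed(broken))
print(is_well_formed(fixed))
print(is_well_formed(concatenated))
```

This should print False, True and False: the unclosed tag and the second declaration are each enough to make the parser give up.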
But once it’s all fixed, you can get your expected output. So:
    from lxml import etree

    big = """[your xml above,fixed]"""

    # split it into small xml files
    smalls = big.replace('<?xml', 'xxx<?xml').split('xxx')[1:]

    for small in smalls:
        # lxml rejects str input that carries an encoding declaration, so
        # either pass bytes, or remove the xml declarations from each small file
        xml = small.encode('utf-8')
        doc = etree.XML(xml)
        for value in doc.xpath('.//AcdpktG//*/@Value'):
            print(value)
Output:
    0
    Dnd
    abc
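The other option, removing the XML declarations, can be taken one step further: once they are gone, the remaining <Sacd> elements can be wrapped in a single root and parsed in one go. The following is only a sketch of that idea, using the standard library's xml.etree.ElementTree rather than lxml; the wrapper element name root is arbitrary, and the regular expression assumes the declarations look like the ones in the question:

```python
import re
import xml.etree.ElementTree as ET

# The XML from the question, with the first <Acdpktg> pair fixed.
big = (
    '<?xml version="1.0" encoding="utf-8"?> <Sacd> <Acdpktg> </Acdpktg> </Sacd> '
    '<?xml version="1.0" encoding="utf-8"?> <Sacd> <Acdpktg/> </Sacd> '
    '<?xml version="1.0" encoding="utf-8"?> <Sacd> <AcdpktG> <Result Value="0"/> '
    '<Packet Value="Dnd"/> <Invoke Value="abc"/> </AcdpktG> </Sacd>'
)

# Drop every XML declaration, then wrap what is left in one synthetic root.
body = re.sub(r'<\?xml[^>]*\?>', '', big)
root = ET.fromstring('<root>' + body + '</root>')  # "root" is an arbitrary name

# Collect the Value attribute of every element below an <AcdpktG> block.
values = [
    elem.get('Value')
    for block in root.iter('AcdpktG')
    for elem in block.iter()
    if elem is not block and 'Value' in elem.attrib
]
print(values)
```

This avoids the split step entirely, at the cost of losing the boundary between the original documents unless you iterate over the <Sacd> children of the wrapper yourself.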
Or, for slightly fancier output, change the inner for loop a bit:
        for value in doc.xpath('.//AcdpktG//*'):
            print(value.tag, value.xpath('./@Value')[0])
Output:
    Result 0
    Packet Dnd
    Invoke abc
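If lxml is not available, roughly the same thing can be done with the standard library. This is a sketch under a few assumptions: the XML has already been fixed as described, the text "<?xml" never appears inside element content, and the helper name iter_documents is made up here. It splits directly on the declaration marker, so no 'xxx' placeholder is needed:

```python
import xml.etree.ElementTree as ET

# The XML from the question, with the first <Acdpktg> pair fixed.
big = (
    '<?xml version="1.0" encoding="utf-8"?> <Sacd> <Acdpktg> </Acdpktg> </Sacd> '
    '<?xml version="1.0" encoding="utf-8"?> <Sacd> <Acdpktg/> </Sacd> '
    '<?xml version="1.0" encoding="utf-8"?> <Sacd> <AcdpktG> <Result Value="0"/> '
    '<Packet Value="Dnd"/> <Invoke Value="abc"/> </AcdpktG> </Sacd>'
)


def iter_documents(text, marker='<?xml'):
    """Yield each concatenated document as bytes (hypothetical helper)."""
    for part in text.split(marker):
        if part.strip():  # skip the empty piece before the first marker
            yield (marker + part).encode('utf-8')


pairs = []
for raw in iter_documents(big):
    doc = ET.fromstring(raw)
    for block in doc.iter('AcdpktG'):
        for elem in block.iter():
            # block.iter() also yields the block itself, so skip it
            if elem is not block and 'Value' in elem.attrib:
                pairs.append((elem.tag, elem.get('Value')))

for tag, value in pairs:
    print(tag, value)
```

ElementTree only supports a subset of XPath and cannot return attribute values directly, which is why this version walks the elements with iter() instead of using an '@Value' expression.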