Handling blank attribute values in XML
I am printing the values of XML tags and their attributes. If the value of any tag or attribute is blank, I want to print None instead. This works for empty tags, but the code does not print None when an attribute value is blank.
XML (a.xml):
<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
  <product description="Cardigan Sweater" product_image="cardigan.jpg">
    <catalog_item gender="Men's">
      <item_number sep = "help" dep = "paraug" note = "zempu">QWZ5671</item_number>
      <line cap = "delp" des = "" fote = "cat"></line>
      <cool_number>QWZ5671</cool_number>
      <price>39.5</price>
      <price></price>
    </catalog_item>
  </product>
</catalog>
Code:
from lxml import etree
from collections import defaultdict

root_1 = etree.parse('a.xml').getroot()
d1 = []
for node in root_1.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():  # iterate over the items
        for k, v in x.attrib.items():
            item[k].append(v)
            if x.attrib is None:
                item[x.attrib].append('None')
        if x.text is None:
            item[x.tag].append('None')
        elif x.text.strip():
            item[x.tag].append(x.text.strip())
    d1.append(dict(item))

print(d1)
Current output: the des attribute is blank in the XML, so it comes through as an empty string, while the empty line tag comes through as None as intended:
[{'gender': ["Men's"], 'sep': ['help'], 'dep': ['paraug'], 'note': ['zempu'], 'item_number': ['QWZ5671'], 'cap': ['delp'], 'des': [''], 'fote': ['cat'], 'line': ['None'], 'cool_number': ['QWZ5671'], 'price': ['39.5', 'None']}]
Expected output: a blank attribute value should also produce None, as shown for des here:
[{'gender': ["Men's"], 'sep': ['help'], 'dep': ['paraug'], 'note': ['zempu'], 'item_number': ['QWZ5671'], 'cap': ['delp'], 'des': ['None'], 'fote': ['cat'], 'line': ['None'], 'cool_number': ['QWZ5671'], 'price': ['39.5', 'None']}]
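The issue is with the way you are currently testing for empty attributes:
if x.attrib is None:
This tests the attribute mapping itself, not the individual values. x.attrib is the dict-like object holding all of a node's attributes; it is never None (it is simply empty when a node has no attributes), so this branch never runs. You can fix it by replacing this: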
for k, v in x.attrib.items():
    item[k].append(v)
    if x.attrib is None:
        item[x.attrib].append('None')
with this:
for k, v in x.attrib.items():
    item[k].append(v if v else None)  # use str(None) if you really need a string
which will produce the following output:
[{'note': ['zempu'], 'item_number': ['QWZ5671'], 'cool_number': ['QWZ5671'], 'cap': ['delp'], 'des': [None], 'sep': ['help'], 'fote': ['cat'], 'dep': ['paraug'], 'line': ['None'], 'price': ['39.5', 'None'], 'gender': ["Men's"]}]
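For reference, here is a minimal, self-contained sketch of the whole loop with the fix applied. It makes a few assumptions that are not in the original code: the XML is inlined as a trimmed string instead of being read from a.xml, the loop is wrapped in a hypothetical helper called collect, and the standard library's xml.etree.ElementTree stands in for lxml so the snippet runs without extra dependencies (the findall, iter, attrib, text and tag calls used here work the same way on lxml elements). It appends the string 'None' rather than the None object, to match the expected output in the question.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of a.xml from the question, inlined so the sketch is self-contained.
XML = """
<catalog>
  <product description="Cardigan Sweater" product_image="cardigan.jpg">
    <catalog_item gender="Men's">
      <item_number sep="help" dep="paraug" note="zempu">QWZ5671</item_number>
      <line cap="delp" des="" fote="cat"></line>
      <cool_number>QWZ5671</cool_number>
      <price>39.5</price>
      <price></price>
    </catalog_item>
  </product>
</catalog>
"""


def collect(root):
    """Hypothetical helper: one dict per catalog_item, mapping names to value lists."""
    result = []
    for node in root.findall('.//catalog_item'):
        item = defaultdict(list)
        for x in node.iter():
            for k, v in x.attrib.items():
                # An empty string is falsy, so a blank attribute becomes 'None'.
                item[k].append(v if v else 'None')
            if x.text is None:
                # An element with no text at all, e.g. <price></price>.
                item[x.tag].append('None')
            elif x.text.strip():
                item[x.tag].append(x.text.strip())
        result.append(dict(item))
    return result


d1 = collect(ET.fromstring(XML))
print(d1)
```

If you prefer the real None object over the string, swap 'None' for None in the attribute branch, as in the one-line fix above.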