Python XML comparison is failing due to extra element tag in one of the XMLs

I have a script that compares two XML files. The comparison works fine when every <account> element contains the same child tags, but after I added an extra <branchID5> tag in b.xml for account 600789488, it no longer prints the differences.

a.xml

    <svc>
      <accounts>
        <account>
          <acctBasicInfo>
            <acctName>600789488</acctName>
            <branchID2>56</branchID2>
            <realparties>
              <realparty>
                <realname>lui</realname>
              </realparty>
            </realparties>
          </acctBasicInfo>
        </account>
        <account>
          <acctBasicInfo>
            <acctName>44646</acctName>
            <branchID2>86</branchID2>
            <realparties>
              <realparty>
                <realname>lui</realname>
              </realparty>
            </realparties>
          </acctBasicInfo>
        </account>
      </accounts>
    </svc>

b.xml

    <svc>
      <accounts>
        <account>
          <acctBasicInfo>
            <acctName>44646</acctName>
            <branchID2>86</branchID2>
            <realparties>
              <realparty>
                <realname>lui</realname>
              </realparty>
            </realparties>
          </acctBasicInfo>
        </account>
        <account>
          <acctBasicInfo>
            <acctName>600789488</acctName>
            <branchID2>56</branchID2>
            <branchID5>66</branchID5>
            <realparties>
              <realparty>
                <realname>lu</realname>
              </realparty>
            </realparties>
          </acctBasicInfo>
        </account>
      </accounts>
    </svc>

code:

    from lxml import etree
    from collections import defaultdict
    from pprintpp import pprint as pp

    root_1 = etree.parse('a.xml').getroot()
    root_2 = etree.parse('b.xml').getroot()

    d1, d2 = [], []
    for node in root_1.findall('.//account'):
        item = defaultdict(list)
        for x in node.iter():
            for k, v in x.attrib.items():
                item[k].append(v)
            if x.text is None:
                item[x.tag].append('None')
            elif x.text.strip():
                item[x.tag].append(x.text.strip())
        d1.append(dict(item))

    for node in root_2.findall('.//account'):
        item = defaultdict(list)
        for x in node.iter():
            for k, v in x.attrib.items():
                item[k].append(v)
            if x.text is None:
                item[x.tag].append('None')
            elif x.text.strip():
                item[x.tag].append(x.text.strip())
        d2.append(dict(item))

    d1 = sorted(d1, key=lambda x: x['acctName'])
    d2 = sorted(d2, key=lambda x: x['acctName'])
    print(d1)
    print(d2)

    res_dict = defaultdict(list)
    for x, y in zip(d1, d2):
        for key1, key2 in zip(x.keys(), y.keys()):
            if (key1 == key2) and sorted(x[key1]) != sorted(y[key2]):
                a = set(x[key1])
                b = set(y[key2])
                diff = ([(i+'--'+'test1.xml') if i in a else (i+'--'+'test2.xml') if i in b else '' for i in list(a^b)])
                res_dict[x['acctName'][0]].append({key1: diff})

    if res_dict == {}:
        print('Data is same in both XML files')
    else:
        pp(dict(res_dict))

Current output: it does not find the differences, because 'branchID5': ['66'] comes before the differing 'realname': ['lu'] in d2, so the keys of the two dicts no longer line up when they are zipped.

    d1: [{'acctName': ['44646'], 'branchID2': ['86'], 'realname': ['lui']},
         {'acctName': ['600789488'], 'branchID2': ['56'], 'realname': ['lui']}]
    d2: [{'acctName': ['44646'], 'branchID2': ['86'], 'realname': ['lui']},
         {'acctName': ['600789488'], 'branchID2': ['56'], 'branchID5': ['66'], 'realname': ['lu']}]
    Data is same in both XML files
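
To make the misalignment concrete, here is a minimal reproduction using only the two dicts for account 600789488, copied from the output above. It is just an illustration of how zip() pairs the keys, not part of the script itself.

```python
# The two dicts for account 600789488, copied from the printed d1 and d2.
x = {'acctName': ['600789488'], 'branchID2': ['56'], 'realname': ['lui']}
y = {'acctName': ['600789488'], 'branchID2': ['56'], 'branchID5': ['66'], 'realname': ['lu']}

# zip() pairs keys by position, so the extra key shifts everything after it
# (and zip stops at the shorter dict, so y's 'realname' is never reached).
pairs = list(zip(x.keys(), y.keys()))
print(pairs)
# [('acctName', 'acctName'), ('branchID2', 'branchID2'), ('realname', 'branchID5')]

# Only pairs with identical keys are ever compared; 'realname' is not among them.
compared = [k1 for k1, k2 in pairs if k1 == k2]
print(compared)
# ['acctName', 'branchID2']
```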

Expected output: it should print the differences, ignoring any element tags that are not present in both XML files.

{'600789488': [{'realname': ['lui--test1.xml', 'lu--test2.xml']}]} 
1 Answer(s)

I believe you made it a little more complicated than absolutely necessary. Since you are using etree, you might as well use xpath to get there.

    from lxml import etree

    # root1 and root2 are the parsed roots of the two files from the question
    root1 = etree.parse('a.xml').getroot()
    root2 = etree.parse('b.xml').getroot()

    names1 = root1.xpath('.//account/acctBasicInfo')

    for name in names1:
        rn = name.xpath('.//realname/text()')[0]   # get the real name in root1
        actNm = name.xpath('./acctName/text()')[0]  # get the acctName in root1
        # next line is the key: create a search expression to find in root2
        # an account with the same acctName as in the current node of root1
        # (the value is quoted so that it is compared as a string, not a number)
        exp = f'.//account/acctBasicInfo[acctName/text()="{actNm}"]//realname/text()'
        twin = root2.xpath(exp)[0]  # execute the search
        # now compare the real names in both accounts in the two roots,
        # and if they are not the same, create an alert
        if rn != twin:
            print({f'{actNm}': [{'realname': [f'{rn}--test1.xml', f'{twin}--test2.xml']}]})

Output:

    {'600789488': [{'realname': ['lui--test1.xml', 'lu--test2.xml']}]}
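
If you would rather keep your dict-based approach and compare every shared tag instead of just realname, one option is to loop over the intersection of the keys rather than zipping them. The following is only a minimal sketch under some assumptions: the function name `diff_common_keys` and its `label1`/`label2` parameters are my own (hypothetical), every dict is assumed to have an 'acctName' entry, and d1/d2 are hard-coded here from the output printed in the question instead of being built from the XML files.

```python
from collections import defaultdict


def diff_common_keys(d1, d2, label1='test1.xml', label2='test2.xml'):
    """Compare two lists of per-account dicts, looking only at shared tags."""
    # Index by account name so accounts are matched by name, not by position.
    by_name_1 = {d['acctName'][0]: d for d in d1}
    by_name_2 = {d['acctName'][0]: d for d in d2}
    result = defaultdict(list)
    for name in sorted(by_name_1.keys() & by_name_2.keys()):
        x, y = by_name_1[name], by_name_2[name]
        # Tags present in only one of the two dicts are skipped entirely.
        for key in sorted(x.keys() & y.keys()):
            if sorted(x[key]) != sorted(y[key]):
                only_1 = sorted(set(x[key]) - set(y[key]))
                only_2 = sorted(set(y[key]) - set(x[key]))
                diff = [f'{v}--{label1}' for v in only_1]
                diff += [f'{v}--{label2}' for v in only_2]
                result[name].append({key: diff})
    return dict(result)


# d1 and d2 as printed in the question.
d1 = [{'acctName': ['44646'], 'branchID2': ['86'], 'realname': ['lui']},
      {'acctName': ['600789488'], 'branchID2': ['56'], 'realname': ['lui']}]
d2 = [{'acctName': ['44646'], 'branchID2': ['86'], 'realname': ['lui']},
      {'acctName': ['600789488'], 'branchID2': ['56'], 'branchID5': ['66'],
       'realname': ['lu']}]

print(diff_common_keys(d1, d2))
# {'600789488': [{'realname': ['lui--test1.xml', 'lu--test2.xml']}]}
```

Matching accounts by name also means an account that exists in only one file is ignored instead of shifting every later pair, which is the same kind of misalignment that zip(d1, d2) can cause.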