Python XML comparison is failing due to extra element tag in one of the XMLs
I have a script that compares two XML files. The comparison works fine when every <account> element contains the same child tags, but after I added an extra tag, <branchID5>, to account 600789488 in b.xml, the script no longer prints the differences.
a.xml

    <svc>
      <accounts>
        <account>
          <acctBasicInfo>
            <acctName>600789488</acctName>
            <branchID2>56</branchID2>
            <realparties>
              <realparty>
                <realname>lui</realname>
              </realparty>
            </realparties>
          </acctBasicInfo>
        </account>
        <account>
          <acctBasicInfo>
            <acctName>44646</acctName>
            <branchID2>86</branchID2>
            <realparties>
              <realparty>
                <realname>lui</realname>
              </realparty>
            </realparties>
          </acctBasicInfo>
        </account>
      </accounts>
    </svc>
b.xml

    <svc>
      <accounts>
        <account>
          <acctBasicInfo>
            <acctName>44646</acctName>
            <branchID2>86</branchID2>
            <realparties>
              <realparty>
                <realname>lui</realname>
              </realparty>
            </realparties>
          </acctBasicInfo>
        </account>
        <account>
          <acctBasicInfo>
            <acctName>600789488</acctName>
            <branchID2>56</branchID2>
            <branchID5>66</branchID5>
            <realparties>
              <realparty>
                <realname>lu</realname>
              </realparty>
            </realparties>
          </acctBasicInfo>
        </account>
      </accounts>
    </svc>
code:
    from lxml import etree
    from collections import defaultdict
    from pprintpp import pprint as pp

    root_1 = etree.parse('a.xml').getroot()
    root_2 = etree.parse('b.xml').getroot()

    d1, d2 = [], []
    for node in root_1.findall('.//account'):
        item = defaultdict(list)
        for x in node.iter():
            for k, v in x.attrib.items():
                item[k].append(v)
            if x.text is None:
                item[x.tag].append('None')
            elif x.text.strip():
                item[x.tag].append(x.text.strip())
        d1.append(dict(item))

    for node in root_2.findall('.//account'):
        item = defaultdict(list)
        for x in node.iter():
            for k, v in x.attrib.items():
                item[k].append(v)
            if x.text is None:
                item[x.tag].append('None')
            elif x.text.strip():
                item[x.tag].append(x.text.strip())
        d2.append(dict(item))

    d1 = sorted(d1, key = lambda x: x['acctName'])
    d2 = sorted(d2, key = lambda x: x['acctName'])
    print(d1)
    print(d2)

    res_dict = defaultdict(list)
    for x, y in zip(d1, d2):
        for key1, key2 in zip(x.keys(), y.keys()):
            if (key1 == key2) and sorted(x[key1]) != sorted(y[key2]):
                a = set(x[key1])
                b = set(y[key2])
                diff = ([(i+'--'+'test1.xml') if i in a else (i+'--'+'test2.xml') if i in b else '' for i in list(a^b)])
                res_dict[x['acctName'][0]].append({key1: diff})

    if res_dict == {}:
        print('Data is same in both XML files')
    else:
        pp(dict(res_dict))
Current output: the script does not find the difference, because in d2 the extra key 'branchID5': ['66'] comes before the differing 'realname': ['lu'], so the keys of the two dictionaries no longer line up.
    d1: [{'acctName': ['44646'], 'branchID2': ['86'], 'realname': ['lui']},
         {'acctName': ['600789488'], 'branchID2': ['56'], 'realname': ['lui']}]
    d2: [{'acctName': ['44646'], 'branchID2': ['86'], 'realname': ['lui']},
         {'acctName': ['600789488'], 'branchID2': ['56'], 'branchID5': ['66'], 'realname': ['lu']}]
    Data is same in both XML files
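The misalignment can be reproduced without any XML, just by zipping the two key lists for account 600789488. This is only a small illustration; the names x_keys, y_keys, pairs and compared are made up for the example, and the key lists are copied from the printed d1 and d2.

```python
# Keys of the 600789488 record in d1 and d2, in insertion order
# (copied from the printed dictionaries; the variable names are illustrative).
x_keys = ['acctName', 'branchID2', 'realname']
y_keys = ['acctName', 'branchID2', 'branchID5', 'realname']

# zip() pairs keys by position and stops at the shorter list.
pairs = list(zip(x_keys, y_keys))
print(pairs)

# Only positions where both keys are equal pass the `key1 == key2` test.
compared = [k1 for k1, k2 in pairs if k1 == k2]
print(compared)
```

'realname' from d1 ends up paired with 'branchID5' from d2, so the `key1 == key2` check fails for it and the realname values are never compared.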
Expected output: the script should print the differences and ignore element tags that are not present in both XML files.
{'600789488': [{'realname': ['lui--test1.xml', 'lu--test2.xml']}]}
I believe you made it a little more complicated than necessary. Since you are already using lxml's etree, you might as well use XPath to get there.
    from lxml import etree

    root1 = etree.parse('a.xml').getroot()
    root2 = etree.parse('b.xml').getroot()

    names1 = root1.xpath('.//account/acctBasicInfo')
    for name in names1:
        rn = name.xpath('.//realname/text()')[0]    # get the real name in root1
        actNm = name.xpath('./acctName/text()')[0]  # get the acctName in root1
        # next line is the key: create a search expression to find in root2
        # an account with the same acctName as in the current node of root1
        # (the value is quoted so that it is compared as a string, not as a number)
        exp = f'.//account/acctBasicInfo[acctName/text()="{actNm}"]//realname/text()'
        twin = root2.xpath(exp)[0]  # execute the search
        # now compare the real names in both accounts in the two roots,
        # and if not the same, create alert
        if rn != twin:
            print({f'{actNm}': [{'realname': [f'{rn}--test1.xml', f'{twin}--test2.xml']}]})
Output:
{'600789488': [{'realname': ['lui--test1.xml', 'lu--test2.xml']}]}
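If you would rather keep your dictionary-based approach, one possible fix is to pair the records by acctName and compare only the keys that exist in both records, instead of zipping the key lists by position. The following is a minimal sketch under those assumptions, not a drop-in replacement: diff_common_keys and its parameters are names I made up, and d1 and d2 are hard-coded from your printed output so the snippet runs without the XML files.

```python
from collections import defaultdict

# d1 and d2 exactly as printed in the question.
d1 = [
    {'acctName': ['44646'], 'branchID2': ['86'], 'realname': ['lui']},
    {'acctName': ['600789488'], 'branchID2': ['56'], 'realname': ['lui']},
]
d2 = [
    {'acctName': ['44646'], 'branchID2': ['86'], 'realname': ['lui']},
    {'acctName': ['600789488'], 'branchID2': ['56'], 'branchID5': ['66'],
     'realname': ['lu']},
]


def diff_common_keys(records_1, records_2,
                     label_1='test1.xml', label_2='test2.xml'):
    """Hypothetical helper: report differing values for keys shared by
    records with the same acctName; keys present on one side only are ignored."""
    # Index the second list by account name so the pairing does not
    # depend on both lists having the same order or length.
    by_name_2 = {rec['acctName'][0]: rec for rec in records_2}
    result = defaultdict(list)
    for rec_1 in records_1:
        name = rec_1['acctName'][0]
        rec_2 = by_name_2.get(name)
        if rec_2 is None:
            continue  # account missing from the second file: skipped here
        # Walk the keys of rec_1 in order, keeping only those also in rec_2.
        for key in (k for k in rec_1 if k in rec_2):
            only_1 = [v for v in rec_1[key] if v not in rec_2[key]]
            only_2 = [v for v in rec_2[key] if v not in rec_1[key]]
            diff = ([f'{v}--{label_1}' for v in only_1]
                    + [f'{v}--{label_2}' for v in only_2])
            if diff:  # note: differences in repetition count alone are ignored
                result[name].append({key: diff})
    return dict(result)


res = diff_common_keys(d1, d2)
print(res if res else 'Data is same in both XML files')
```

With the sample data this prints the same dictionary as the expected output, and the extra branchID5 key is skipped because it only exists in d2.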