How to solve Key Error while XML File Parsing in Python

Question

Home

How to solve Key Error while XML File Parsing in Python

0

I have the following XML file which I want to convert as a Pandas DataFrame.

row {'Id': '-1', 'Reputation': '1', 'CreationDate': '2009-09-28T00:00:00.000', 'DisplayName': 'Community', 'LastAccessDate': '2010-11-10T17:25:34.627', 'WebsiteUrl': 'http://meta.stackexchange.com/', 'Location': 'on the server farm', 'AboutMe': '<p>Hi, I\'m not really a person.</p>\n\n<p>I\'m a background process that helps keep this site clean!</p>\n\n<p>I do things like</p>\n\n<ul>\n<li>Randomly poke old unanswered questions every hour so they get some attention</li>\n<li>Own community questions and answers so nobody gets unnecessary reputation from them</li>\n<li>Own downvotes on spam/evil posts that get permanently deleted</li>\n<li>Own suggested edits from anonymous users</li>\n<li><a href="http://meta.stackexchange.com/a/92006">Remove abandoned questions</a></li>\n</ul>\n', 'Views': '0', 'UpVotes': '21001', 'DownVotes': '27468', 'AccountId': '-1'} row {'Id': '1', 'Reputation': '21228', 'CreationDate': '2009-09-28T14:35:46.490', 'DisplayName': 'Anton Geraschenko', 'LastAccessDate': '2020-05-17T06:51:32.333', 'WebsiteUrl': 'http://stacky.net', 'Location': 'Palo Alto, CA, United States', 'AboutMe': '<p>You can get in touch with me at [email protected].</p>\n', 'Views': '25360', 'UpVotes': '1052', 'DownVotes': '90', 'AccountId': '36500'}

The following code works for an almost identical XML file but when I use it for this file I get an error:

CODE

users_tree = ET.parse("/content/Users.xml") users_root = users_tree.getroot()  file_path_users = r"/content/Users.xml" dict_list_users = []  for _, elem in ET.iterparse(file_path_users, events=("end",)):     if elem.tag == "row":         dict_list_users.append({'UserId': elem.attrib['Id'],                           'Reputation': elem.attrib['Reputation'],                           'CreationDate': elem.attrib['CreationDate'],                           'DisplayName': elem.attrib['DisplayName'],                           'LastAccessDate': elem.attrib['LastAccessDate'],                           'WebsiteUrl': elem.attrib['WebsiteUrl'],                           'Location': elem.attrib['Location'],                           'AboutMe': elem.attrib['AboutMe'],                           'Views': elem.attrib['Views'],                           'UpVotes': elem.attrib['UpVotes'],                           'DownVotes': elem.attrib['DownVotes'],                           'AccountId': elem.attrib['AccountId']}) elem.clear()  df_users = pd.DataFrame(dict_list_users)

ERROR

KeyError                                  Traceback (most recent call last) <ipython-input-18-7af87798bae8> in <module>()      24                           'DisplayName': elem.attrib['DisplayName'],      25                           'LastAccessDate': elem.attrib['LastAccessDate'], ---> 26                           'WebsiteUrl': elem.attrib['WebsiteUrl'],      27                           'Location': elem.attrib['Location'],      28                           'AboutMe': elem.attrib['AboutMe'],  KeyError: 'WebsiteUrl'

NOTE: This error occurs for all attributes after LastAccessDate, i.e., even if I remove the WebsiteUrl key, I get error for the next attribute and so on.

Please provide me a way to fix this.

Roycegretchenclaudette Asked on July 17, 2020 in XML.

Share
Comment(0)

Add Comment

1 Answer(s)

Votes
Oldest

0

Error appears to be due to missing attributes in one or more of the <row> tags. Instead of explicitly assigning dictionary keys/values by each attribute consider retrieving all attributes. Doing so, the final DataFrame constructor will input NAs to rows with missing attributes.

for _, elem in ET.iterparse(file_path_users, events=("end",)):     if elem.tag == "row":         dict_list_users.append(elem.attrib)    # RETRIEVE ALL ATTRIBUTES          elem.clear()                           # SHOULD BE AT NESTED LEVEL  df_users = pd.DataFrame(dict_list_users)

If above pulls in more columns than needed, keep only the relevant columns with reindex:

df_users = df_users.reindexc(['UserId', 'Reputation', 'CreationDate', 'DisplayName',                               'LastAccessDate', 'WebsiteUrl', 'Location', 'AboutMe',                               'Views', 'UpVotes', 'DownVotes', 'AccountId'],                                axis='columns')

Stephenlesliemarlene Answered on July 17, 2020.

Share
Comment(0)

Add Comment

Your Answer

Answer 1

BuddyPress is a plugin for WordPress that enables you to create a social network or community website. It has all the...

Answer 2

I value you getting some margin to help me with this task. Without you, no part of this would have...

Answer 3

Try to define a Cohesive class, until and unless the methods are written relevant to the class and it defines...

Answer 4

Try to add exportAllData: true, as an other option, hope it helps :)

Answer 5

DataSet can read an XML, infer schema and create a tabular representation that's easy to manipulate: DataSet ip1 = new...

Answer 6

I created a class and used Xml Linq : using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Xml; using...

Answer 7

XDocument first = XDocument.Load(args[0]); XDocument second = XDocument.Load(args[1]); var result = new XElement( "ipaddresses", first.Root.Elements("ip") .Zip(second.Root.Elements("ip"), (f, s) => {...

Answer 8

Following your code for the header row, you could achieve this by an <xsl:apply-templates select="/report/order_actions/order_action[order_id = current()/order_id]" /> As well...

Answer 9

BuddyPress is a plugin for WordPress that enables you to create a social network or community website. It has all the...

Answer 10

I value you getting some margin to help me with this task. Without you, no part of this would have...

Answer 11

Try to define a Cohesive class, until and unless the methods are written relevant to the class and it defines...

Answer 12

Try to add exportAllData: true, as an other option, hope it helps :)

Answer 13

DataSet can read an XML, infer schema and create a tabular representation that's easy to manipulate: DataSet ip1 = new...

Answer 14

I created a class and used Xml Linq : using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Xml; using...

Answer 15

XDocument first = XDocument.Load(args[0]); XDocument second = XDocument.Load(args[1]); var result = new XElement( "ipaddresses", first.Root.Elements("ip") .Zip(second.Root.Elements("ip"), (f, s) => {...

Answer 16

Following your code for the header row, you could achieve this by an <xsl:apply-templates select="/report/order_actions/order_action[order_id = current()/order_id]" /> As well...

LATEST ANSWERS

How to solve Key Error while XML File Parsing in Python

Your Answer

TOP USERS

HOT QUESTIONS

LATEST ANSWERS

How to solve Key Error while XML File Parsing in Python

Your Answer

Tags Widget

TOP USERS

HOT QUESTIONS