How to traverse lists containing a dictionary?
I am trying to traverse JSON data that I loaded into a DataFrame.
Here is the code used to bring the data in:
df = json_normalize(data['PatentBulkData'])
Each element of a column (Series) of the DataFrame is a list of dictionaries, as shown below.
For example, here is the list of dictionaries returned when I enter df['prosecutionHistoryDataBag.prosecutionHistoryData'][i]:
[{'eventCode': 'PG-ISSUE', 'eventDate': '2020-04-23', 'eventDescriptionText': 'PG-Pub Issue Notification'}, {'eventCode': 'RQPR', 'eventDate': '2020-01-02', 'eventDescriptionText': 'Request for Foreign Priority (Priority Papers May Be Included)'}, {'eventCode': 'M844', 'eventDate': '2020-01-03', 'eventDescriptionText': 'Information Disclosure Statement (IDS) Filed'}, {'eventCode': 'M844', 'eventDate': '2020-01-02', 'eventDescriptionText': 'Information Disclosure Statement (IDS) Filed'}, {'eventCode': 'COMP', 'eventDate': '2020-02-04', 'eventDescriptionText': 'Application Is Now Complete'}]
Then df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j] would return a single dictionary, for example with j = 0:
{'eventCode': 'PG-ISSUE', 'eventDate': '2020-04-23', 'eventDescriptionText': 'PG-Pub Issue Notification'}
I would like to iterate through each entry in df['prosecutionHistoryDataBag.prosecutionHistoryData'] to identify rows containing a specific string in 'eventDescriptionText'.
In the above example, df['prosecutionHistoryDataBag.prosecutionHistoryData'] is a Series, df['prosecutionHistoryDataBag.prosecutionHistoryData'][i] is a list, and df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j] is a dictionary.
I would like to iterate through the Series first and then, for each list, iterate through its dictionaries to see whether 'eventDescriptionText' contains a specific string.
Thanks!
Try using the below code.
    for lst in df['prosecutionHistoryDataBag.prosecutionHistoryData']:
        for item in lst:
            # Fall back to '' so a missing or None description does not raise.
            if your_string in (item.get("eventDescriptionText") or ""):
                # do something
                pass
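Since the question asks for the rows rather than the individual events, the same loop can be adapted to collect row positions. The following is only a sketch: the helper name rows_with_text is made up for illustration, a plain list stands in for the Series, and the None entry is a hypothetical row with no history.

```python
def rows_with_text(rows, needle):
    """Return positions of rows where any event description contains needle."""
    matches = []
    for idx, events in enumerate(rows):
        # Missing cells (None/NaN) are not lists, so skip them.
        if not isinstance(events, list):
            continue
        if any(needle in (event.get("eventDescriptionText") or "")
               for event in events):
            matches.append(idx)
    return matches


# Stand-in for df['prosecutionHistoryDataBag.prosecutionHistoryData'].
sample_rows = [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "COMP", "eventDate": "2020-02-04",
         "eventDescriptionText": "Application Is Now Complete"},
    ],
    [
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
    None,  # hypothetical row without any prosecution history
]

print(rows_with_text(sample_rows, "Disclosure"))  # prints [1]
```

Note that enumerate() yields positions, not index labels. If the DataFrame has a non-default index, iterating over the Series with .items() instead should give the labels.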
If I understand your question correctly then
df['prosecutionHistoryDataBag.prosecutionHistoryData']
is, in fact, a Series whose elements are lists of dictionaries, so it can be iterated like a list of lists. See also my comment above. If that is the case, then the boring way is:
    lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
    for dicts in lst:
        for d in dicts:
            if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR':
                code = d['eventCode']
                date = d['eventDate']
                # Do something with code and date.
Now, you could flatten that list of lists and use a generator:
    lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
    for d in (d for dicts in lst for d in dicts):
        if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR':
            code = d['eventCode']
            date = d['eventDate']
            # Do something with code and date.
Next, squeeze the test into the lists-flattening-generator as well to make the code a bit less readable:
    lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
    for code, date in (
        (d['eventCode'], d['eventDate'])
        for dicts in lst
        for d in dicts
        if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR'
    ):
        # Do something with code and date.
        pass
The filter() function doesn't help much with readability here:
    for code, date in (
        (d['eventCode'], d['eventDate'])
        for d in filter(
            lambda d: d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR',
            (d for dicts in lst for d in dicts),
        )
    ):
        # Do something with code and date.
        pass
However, functions from itertools or more-itertools may be of use, e.g. itertools.chain.from_iterable() or the flatten() function in more-itertools.
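As one possible use of itertools, chain.from_iterable() can do the flattening while the test stays in an ordinary if statement. This is a minimal sketch, assuming every row is a list and every dictionary has an 'eventDescriptionText' key. The generator name matching_events is made up for illustration, and a plain list stands in for the Series. It uses a substring test (in) rather than ==, since the question asks whether the text contains a string.

```python
from itertools import chain


def matching_events(rows, needle):
    """Yield (eventCode, eventDate) for events whose description contains needle."""
    # chain.from_iterable flattens the list of lists one level, lazily.
    for event in chain.from_iterable(rows):
        if needle in event["eventDescriptionText"]:
            yield event["eventCode"], event["eventDate"]


# Stand-in for df['prosecutionHistoryDataBag.prosecutionHistoryData'].
sample_rows = [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "RQPR", "eventDate": "2020-01-02",
         "eventDescriptionText": "Request for Foreign Priority (Priority Papers May Be Included)"},
    ],
    [
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
        {"eventCode": "M844", "eventDate": "2020-01-02",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
]

for code, date in matching_events(sample_rows, "IDS"):
    print(code, date)
```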