How to traverse lists containing a dictionary?

I am trying to traverse JSON data that I loaded into a pandas DataFrame.

Here is the code used to bring the data in:

df = json_normalize(data['PatentBulkData']) 

Each column of the DataFrame is a Series, and each entry of that Series is a list of dictionaries, as shown below.

For example, here is the list of dictionaries returned when I enter df['prosecutionHistoryDataBag.prosecutionHistoryData'][i]:

[{'eventCode': 'PG-ISSUE',
  'eventDate': '2020-04-23',
  'eventDescriptionText': 'PG-Pub Issue Notification'},
 {'eventCode': 'RQPR',
  'eventDate': '2020-01-02',
  'eventDescriptionText': 'Request for Foreign Priority (Priority Papers May Be Included)'},
 {'eventCode': 'M844',
  'eventDate': '2020-01-03',
  'eventDescriptionText': 'Information Disclosure Statement (IDS) Filed'},
 {'eventCode': 'M844',
  'eventDate': '2020-01-02',
  'eventDescriptionText': 'Information Disclosure Statement (IDS) Filed'},
 {'eventCode': 'COMP',
  'eventDate': '2020-02-04',
  'eventDescriptionText': 'Application Is Now Complete'}]

Then, df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j] would return the dictionary:

{'eventCode': 'PG-ISSUE',  'eventDate': '2020-04-23',  'eventDescriptionText': 'PG-Pub Issue Notification'} 

I would like to iterate through each entry in df['prosecutionHistoryDataBag.prosecutionHistoryData'] to identify the rows that contain a specific string in 'eventDescriptionText'.

In the above example, df['prosecutionHistoryDataBag.prosecutionHistoryData'] is a Series, df['prosecutionHistoryDataBag.prosecutionHistoryData'][i] is a list, and df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j] is a dictionary.

I would like to iterate through the Series first and, for each list, iterate through its dictionaries to see whether 'eventDescriptionText' contains a specific string.

Thanks!

2 Answer(s)

Try the code below.

for lst in df['prosecutionHistoryDataBag.prosecutionHistoryData']:
    for event in lst:
        # .get() with a default avoids an error when the key is missing
        if your_string in event.get("eventDescriptionText", ""):
            # do something
            pass
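Since the question asks which rows contain the string, the same loop can be turned into one boolean per row. The following is only a sketch in plain Python: the function name rows_with_event_text and the sample data are made up for illustration, and it assumes that a row without any history shows up as a non-list value such as NaN.

```python
def rows_with_event_text(history_column, needle):
    """Return one boolean per row: True if any event description contains needle."""
    mask = []
    for events in history_column:
        # Assumption: a row with no history holds a non-list value (e.g. NaN).
        if not isinstance(events, list):
            mask.append(False)
            continue
        # "or ''" also covers a description that is present but None.
        mask.append(
            any(needle in (event.get("eventDescriptionText") or "") for event in events)
        )
    return mask


# Hypothetical sample shaped like the column in the question.
sample_column = [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "COMP", "eventDate": "2020-02-04",
         "eventDescriptionText": "Application Is Now Complete"},
    ],
    [
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
    float("nan"),  # row with no prosecution history
]

mask = rows_with_event_text(sample_column, "Disclosure")
print(mask)  # [False, True, False]
```

With the real DataFrame, passing df['prosecutionHistoryDataBag.prosecutionHistoryData'] as history_column and then indexing with df[mask] should select the matching rows, since the list has one boolean per row.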

If I understand your question correctly then

df['prosecutionHistoryDataBag.prosecutionHistoryData'] 

is, in effect, a list whose elements are lists of dictionaries (strictly it is a pandas Series, but it iterates the same way). See also my comment above. If that is the case, then the boring way is:

lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for dicts in lst:
    for d in dicts:
        if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR':
            code = d['eventCode']
            date = d['eventDate']
            # Do something with code and date.

Now, you could flatten that list of lists and use a generator:

lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for d in (d for dicts in lst for d in dicts):
    if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR':
        code = d['eventCode']
        date = d['eventDate']
        # Do something with code and date.

Next, squeeze the test into the lists-flattening-generator as well to make the code a bit less readable:

lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for code, date in (
    (d['eventCode'], d['eventDate'])
    for dicts in lst
    for d in dicts
    if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR'
):
    # Do something with code and date.
    pass

The filter() function doesn’t help much with readability here

for code, date in (
    (d['eventCode'], d['eventDate'])
    for d in filter(
        lambda d: d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR',
        (d for dicts in lst for d in dicts),
    )
):
    # Do something with code and date.
    pass

but functions from itertools or more-itertools may be of use (e.g. itertools.chain.from_iterable() or the flatten() function from more-itertools).
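As a rough illustration of the itertools route, the flattening generator can be replaced by chain.from_iterable() from the standard library. The helper name matching_events and the sample data are hypothetical, and the sketch assumes every entry of the column really is a list of dictionaries (a NaN entry would raise a TypeError here).

```python
from itertools import chain


def matching_events(history_column, wanted_text):
    """Yield (eventCode, eventDate) for each event whose description equals wanted_text."""
    # chain.from_iterable flattens one level: lists of dicts -> stream of dicts.
    for event in chain.from_iterable(history_column):
        if event["eventDescriptionText"] == wanted_text:
            yield event["eventCode"], event["eventDate"]


# Hypothetical sample shaped like the column in the question.
sample_column = [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
    [
        {"eventCode": "M844", "eventDate": "2020-01-02",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
        {"eventCode": "COMP", "eventDate": "2020-02-04",
         "eventDescriptionText": "Application Is Now Complete"},
    ],
]

result = list(
    matching_events(sample_column, "Information Disclosure Statement (IDS) Filed")
)
print(result)  # [('M844', '2020-01-03'), ('M844', '2020-01-02')]
```

Wrapping the loop in a generator function keeps the filtering in one place, so the calling code only sees the (code, date) pairs.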
