Python: Create dataframe from a list where each new row starts at a specific list index

I have a list where the number of items between a ‘SUMMARY’ element and the next one is not fixed

list = ['SUMMARY - Dec 2013', 'Person1', 'None', 'None', '10', 'SUMMARY - Dec 2013', 'Person2', '20', 'SUMMARY - Jan 2014', 'Person3', 'None'] 

What I’m trying to achieve is to transform the list to a dataframe where each row starts with a ‘SUMMARY’ element.

I’ve used

match = [] match.append([n for n, l in enumerate(list) if l.startswith('SUMMARY')]) 

with output [[0, 5, 8]] to get the indexes of the items that contain ‘SUMMARY’, and I would like each row of my dataframe to start with the corresponding items whose indexes are included in match. In this case, match has 3 elements, so I would like my dataframe to have 3 rows and the following structure:

'SUMMARY - Dec 2013', 'Person1', 'None', 'None', '10' 'SUMMARY - Dec 2013', 'Person2', '20',   NA,    NA 'SUMMARY - Jan 2014', 'Person3', 'None', NA,    NA 

Basically, when the number of items for a specific row is smaller than the number of max columns, the rest gets filled with NA/NaN.

Add Comment
1 Answer(s)

Thanks for the sample data. It is easier to approach this if you start with a single Series. You can group rows of data based on whether "Summary" is present and then use this to aggregate and re-explode your data across columns.

s = pd.Series(your_list) pd.DataFrame(s.groupby(s.str.contains('summary', case=False).cumsum())               .agg(list)               .tolist())                                                                         0        1     2     3     4 0  SUMMARY - Dec 2013  Person1  None  None    10 1  SUMMARY - Dec 2013  Person2    20  None  None 2  SUMMARY - Jan 2014  Person3  None  None  None 

Thanks @Shubham Sharma for the suggestion of iterating over groups:

pd.DataFrame([g.tolist() for k, g in s.groupby(     s.str.contains('summary', case=False).cumsum())])                      0        1     2     3     4 0  SUMMARY - Dec 2013  Person1  None  None    10 1  SUMMARY - Dec 2013  Person2    20  None  None 2  SUMMARY - Jan 2014  Person3  None  None  None 
Add Comment

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.