In pandas, how to re-arrange the dataframe to simultaneously combine groups of columns?
I hope someone could help me solve my issue.
Given a pandas dataframe as depicted in the image below,
I would like to re-arrange it into a new dataframe, combining several sets of columns (the sets have all the same size) such that each set becomes a single column as shown in the desired result image below.
Thank you in advance for any tips.
You can do this with pd.wide_to_long
and rename
the ‘c’ column:
df_out = pd.wide_to_long(df.reset_index().rename(columns={'c':'c1'}), ['a','b','c'],'index','no') df_out = df_out.reset_index(drop=True).ffill().astype(int) df_out
Output:
a b c 0 0 2 4 1 1 3 4 2 5 7 9 3 6 8 9
Same dataframe just sorting is different.
pd.wide_to_long(df, ['a','b'], 'c', 'no').reset_index().drop('no', axis=1)
Output:
c a b 0 4 0 2 1 9 5 7 2 4 1 3 3 9 6 8
For a general solution, you can try one of this two options:
You could try this, using OrderedDict
to get the alpha-nonnumeric column names ordered alphabetically, pd.DataFrame.filter
to filter the columns with similar names, and then concat the values with pd.DataFrame.stack
:
import pandas as pd from collections import OrderedDict df = pd.DataFrame([[0,1,2,3,4],[5,6,7,8,9]], columns=['a1','a2','b1','b2','c']) newdf=pd.DataFrame() for col in list(OrderedDict.fromkeys( ''.join(df.columns)).keys()): if col.isalpha(): newdf[col]=df.filter(like=col, axis=1).stack().reset_index(level=1,drop=True) newdf=newdf.reset_index(drop=True)
Output:
df a1 a2 b1 b2 c 0 0 1 2 3 4 1 5 6 7 8 9 newdf a b c 0 0 2 4 1 1 3 4 2 5 7 9 3 6 8 9
Another way to get the column names could be using re
and set
like this, and then sort columns alphabetically:
newdf=pd.DataFrame() import re for col in set(re.findall('[^\W\d_]',''.join(df.columns))): newdf[col]=df.filter(like=col, axis=1).stack().reset_index(level=1,drop=True) newdf=newdf.reindex(sorted(newdf.columns), axis=1).reset_index(drop=True)
Output:
newdf a b c 0 0 2 4 1 1 3 4 2 5 7 9 3 6 8 9
The fact that column c only had one columns versus other letters having two columns, made it kind of tricky. I first stacked the dataframe and got rid of the numbers in the column names. Then for a and b I pivoted a dataframe and removed all nans. For c, I multiplied the length of the dataframe by 2 to make it match a and b and then merged it in with a and b.
input:
import pandas as pd df = pd.DataFrame({'a1': {0: 0, 1: 5}, 'a2': {0: 1, 1: 6}, 'b1': {0: 2, 1: 7}, 'b2': {0: 3, 1: 8}, 'c': {0: 4, 1: 9}}) df
code:
df1=df.copy().stack().reset_index().replace('[0-9]+', '', regex=True) dfab = df1[df1['level_1'].isin(['a','b'])].pivot(index=0, columns='level_1', values=0) \ .apply(lambda x: pd.Series(x.dropna().values)).astype(int) dfc = pd.DataFrame(np.repeat(df['c'].values,2,axis=0)).rename({0:'c'}, axis=1) df2=pd.merge(dfab, dfc, how='left', left_index=True, right_index=True) df2
output:
a b c 0 0 2 4 1 1 3 4 2 5 7 9 3 6 8 9