IN Pandas Python Dataframe I need to set values of a column based on the values of other columns
I Have a dataframe with 3 columns namely cuid ,type , errorreason. Now the error reason is empty and I have to fill it with the following logic- 1.) If cuid is unique and type is ‘COL’ then errorreason is ‘NO ERROR'( ALL UNIQUE VALUES ARE ‘NO ERROR’)
2.) If cuid is not unique , and type is ‘COL’ AND ‘ROT’ , then error is errorreason is ‘AD’
3.) If cuid is not unique , and type is ‘COL’ AND ‘TOT’ , then error is errorreason is ‘RE’
4.) Any other case , except the above mentioned , errorreason is ‘Unidentified’
I have already seperated the unique and non unique values , so first point is done. Kinda stuck on the next points . I was trying to group by the non unique values and then apply a function. Kinda stuck here.
This is a quite long solution, but I inserted explanations for each step so that they are clear to you. At the end you obtain your desired output
import numpy as np import pandas as pd # sample data df = pd.DataFrame({ 'cuid': [100814, 100814, 100815, 100815, 100816], 'type': ['col', 'rot', 'col', 'tot', 'col'] }) # define function for concatenating 'type' within the same 'cuid' def str_cat(x): return x.str.cat(sep=', ') # create a lookup dataset that we will merge later on df_lookup = df.groupby('cuid').agg({ 'cuid': 'count', 'type': str_cat }).rename(columns={'cuid': 'unique'}) # create the variable 'error_reason' on this lookup dataset thanks to a case when like statement using np.select df_lookup['error_reason'] = np.select( [ (df_lookup['cuid'] == 1) & (df_lookup['type'] == 'col'), (df_lookup['cuid'] > 1) & (df_lookup['type'].str.contains('col')) & (df_lookup['type'].str.contains('rot')), (df_lookup['cuid'] > 1) & (df_lookup['type'].str.contains('col')) & (df_lookup['type'].str.contains('tot')) ], [ 'NO ERROR', 'AD', 'RE' ], default = 'Unidentified' ) # merge the two datasets df.merge(df_lookup.drop(columns=['type', 'unique']), on='cuid')
Output
cuid type error_reason 0 100814 col AD 1 100814 rot AD 2 100815 col RE 3 100815 tot RE 4 100816 col NO ERROR
Try to use this:
df.groupby('CUID',as_index=False)['TYPE'].aggregate(lambda x: list(x))
I have not tested this solution so let me know if it does not work.