In a Pandas Python DataFrame I need to set values of a column based on the values of other columns

I have a dataframe with 3 columns, namely cuid, type and errorreason. The errorreason column is empty, and I have to fill it with the following logic:

[Dataframe before processing]

1.) If cuid is unique and type is 'COL', then errorreason is 'NO ERROR' (all unique values are 'NO ERROR').

2.) If cuid is not unique, and type is 'COL' and 'ROT', then errorreason is 'AD'.

3.) If cuid is not unique, and type is 'COL' and 'TOT', then errorreason is 'RE'.

4.) In any other case except the ones mentioned above, errorreason is 'Unidentified'.

[Required dataframe after processing]

I have already separated the unique and non-unique values, so the first point is done. I am stuck on the next points. I was trying to group by the non-unique values and then apply a function, roughly as in the sketch below.
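Roughly the direction I was trying, which does not work yet (assuming the dataframe is called df and has the column names above):

counts = df['cuid'].value_counts()

# split out the rows whose cuid appears more than once
non_unique = df[df['cuid'].map(counts) > 1]

# group the non-unique rows by cuid and look at the combination of types
for cuid, grp in non_unique.groupby('cuid'):
    types = set(grp['type'])
    # ... this is where I need to decide between 'AD', 'RE' and 'Unidentified'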

2 Answers

This is quite a long solution, but I inserted explanations for each step so that they are clear to you. At the end you obtain your desired output.

import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame({
    'cuid': [100814, 100814, 100815, 100815, 100816],
    'type': ['col', 'rot', 'col', 'tot', 'col']
})

# define function for concatenating 'type' within the same 'cuid'
def str_cat(x):
    return x.str.cat(sep=', ')

# create a lookup dataset that we will merge later on
# (the per-cuid row count is renamed to 'unique')
df_lookup = df.groupby('cuid').agg({
    'cuid': 'count',
    'type': str_cat
}).rename(columns={'cuid': 'unique'})

# create the variable 'error_reason' on this lookup dataset thanks to a
# case-when-like statement using np.select; note that the conditions use the
# renamed 'unique' column, not 'cuid'
df_lookup['error_reason'] = np.select(
    [
        (df_lookup['unique'] == 1) & (df_lookup['type'] == 'col'),
        (df_lookup['unique'] > 1) & (df_lookup['type'].str.contains('col')) & (df_lookup['type'].str.contains('rot')),
        (df_lookup['unique'] > 1) & (df_lookup['type'].str.contains('col')) & (df_lookup['type'].str.contains('tot'))
    ],
    [
        'NO ERROR',
        'AD',
        'RE'
    ],
    default='Unidentified'
)

# merge the two datasets
df.merge(df_lookup.drop(columns=['type', 'unique']), on='cuid')

Output

     cuid type error_reason
0  100814  col           AD
1  100814  rot           AD
2  100815  col           RE
3  100815  tot           RE
4  100816  col     NO ERROR
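For reference, the same case-when logic can also be written without the separate lookup and merge, by broadcasting the per-cuid aggregates back onto the rows with transform. A minimal sketch, assuming the same sample df and column names as above:

counts = df.groupby('cuid')['type'].transform('size')
types = df.groupby('cuid')['type'].transform(lambda x: x.str.cat(sep=', '))

df['error_reason'] = np.select(
    [
        (counts == 1) & (df['type'] == 'col'),
        (counts > 1) & types.str.contains('col') & types.str.contains('rot'),
        (counts > 1) & types.str.contains('col') & types.str.contains('tot')
    ],
    ['NO ERROR', 'AD', 'RE'],
    default='Unidentified'
)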

Try to use this:

df.groupby('CUID', as_index=False)['TYPE'].aggregate(lambda x: list(x))

I have not tested this solution, so let me know if it does not work.
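If that grouping works on your data, one possible continuation (a rough, untested sketch that assumes the uppercase column names used above and an ERRORREASON column to fill) is to map each cuid's list of types to an error reason and then look it up for every row:

grouped = df.groupby('CUID', as_index=False)['TYPE'].aggregate(lambda x: list(x))

def reason(types):
    # apply the four rules from the question to one cuid's list of types
    upper = [t.upper() for t in types]
    if len(upper) == 1 and upper[0] == 'COL':
        return 'NO ERROR'      # unique cuid with type COL
    if 'COL' in upper and 'ROT' in upper:
        return 'AD'            # non-unique cuid with COL and ROT
    if 'COL' in upper and 'TOT' in upper:
        return 'RE'            # non-unique cuid with COL and TOT
    return 'Unidentified'      # everything else

grouped['ERRORREASON'] = grouped['TYPE'].apply(reason)

# look up each row's reason by its CUID
df['ERRORREASON'] = df['CUID'].map(grouped.set_index('CUID')['ERRORREASON'])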

