Using Pandas to merge similar data

How do I merge similar values, such as the misspelled variants of "recommendation", into one value?

df['Why you choose us'].str.lower().value_counts()

location           35
recommendation     23
recommedation       8
confort             7
availability        4
reconmmendation     3
facilities          3

1 Answer(s)

print(df)

            reason  count
0         location     35
1   recommendation     23
2    recommedation      8
3          confort      7
4     availability      4
5  reconmmendation      3
6       facilities      3

Use .groupby() on a partial string (here the first four characters of each reason), then .transform('sum') to get the combined count for each group.

df['groupcount'] = df.groupby(df.reason.str[0:4])['count'].transform('sum')

            reason  count  groupcount
0         location     35          35
1   recommendation     23          34
2    recommedation      8          34
3          confort      7           7
4     availability      4           4
5  reconmmendation      3          34
6       facilities      3           3

If you need to see the full string and the partial string side by side, try:

df = df.assign(groupname=df.reason.str[0:4])
df['groupcount'] = df.groupby(df.reason.str[0:4])['count'].transform('sum')
print(df)

            reason  count groupname  groupcount
0         location     35      loca          35
1   recommendation     23      reco          34
2    recommedation      8      reco          34
3          confort      7      conf           7
4     availability      4      avai           4
5  reconmmendation      3      reco          34
6       facilities      3      faci           3
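If you want the variants actually replaced by a single value rather than just summed, one option is to map every row to the most frequent spelling in its group. This is only a sketch: the `merged` column and the `top_rows`, `canonical`, and `totals` names are mine, and it assumes the most common spelling in each group is the correct one. Bear in mind that a four-character prefix will also lump together unrelated words that happen to share their first four letters, and will miss typos that occur within those letters.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "reason": ["location", "recommendation", "recommedation", "confort",
               "availability", "reconmmendation", "facilities"],
    "count": [35, 23, 8, 7, 4, 3, 3],
})

# Group key: the first four characters of each reason.
df["groupname"] = df["reason"].str[:4]

# For each group, find the row with the highest count and keep its spelling.
top_rows = df.loc[df.groupby("groupname")["count"].idxmax()]
canonical = top_rows.set_index("groupname")["reason"]

# Replace every variant with the most common spelling of its group.
df["merged"] = df["groupname"].map(canonical)

# Sum the counts per merged value.
totals = df.groupby("merged")["count"].sum().sort_values(ascending=False)
print(totals)
```

Here the three "reco" rows all end up as "recommendation", so `totals` has one entry for it with the combined count.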

In case you have multiple items in a row, as you do in the CSV, then:

import pandas as pd

# Read the CSV
df = pd.read_csv(r'path')

# Lowercase 'Why you choose us', fill missing values, and split each entry into a list of reasons
df['Why you choose us'] = (df['Why you choose us'].str.lower().fillna('no comment given')).str.split(',')

# Explode the column so that each reason is in its own row, with all the other attributes intact
df = df.explode('Why you choose us')

# Remove any whitespace around the values in the column, then count them
print(df['Why you choose us'].str.strip().value_counts())

location            48
no comment given    34
recommendation      25
confort              8
facilities           8
recommedation        8
price                7
availability         6
reputation           5
reconmmendation      3
internet             3
ac                   3
breakfast            3
tranquility          2
cleanliness          2
aveilable            1
costumer service     1
pool                 1
comfort              1
search engine        1
Name: group, dtype: int64
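The counts above still contain spelling variants that a prefix cannot catch, such as "confort" and "comfort". As a rough alternative, you could match each value against a list of accepted spellings with `difflib.get_close_matches` from the standard library. This is a sketch under assumptions: the `CANONICAL` list, the `canonicalize` helper, and the 0.8 cutoff are my own choices for illustration, not something taken from your data, so tune them before trusting the result.

```python
import difflib
import pandas as pd

# Hypothetical list of accepted spellings; adjust it to your own data.
CANONICAL = ["location", "recommendation", "comfort", "availability", "facilities"]

def canonicalize(value, choices=CANONICAL, cutoff=0.8):
    """Return the closest accepted spelling, or the value unchanged if none is close enough."""
    matches = difflib.get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else value

# A few sample values, including misspellings seen in the counts above.
reasons = pd.Series(
    ["location", "recommendation", "recommedation", "confort", "reconmmendation", "price"]
)

# Normalise case and whitespace first, then map each value to its closest accepted spelling.
cleaned = reasons.str.lower().str.strip().map(canonicalize)
counts = cleaned.value_counts()
print(counts)
```

Values that are not close to anything in the list, like "price" here, are left as they are. A lower cutoff merges more aggressively but raises the risk of joining words that are genuinely different.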