Using Pandas to merge similar data
How do I merge similar data such as "recommendation" into one value?
    df['Why you choose us'].str.lower().value_counts()

    location           35
    recommendation     23
    recommedation       8
    confort             7
    availability        4
    reconmmendation     3
    facilities          3

    print(df)

                reason  count
    0         location     35
    1   recommendation     23
    2    recommedation      8
    3          confort      7
    4     availability      4
    5  reconmmendation      3
    6       facilities      3
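The post does not show how the `value_counts()` result becomes a frame with `reason` and `count` columns. One way it could be done is sketched below; the sample rows are made up for illustration and only the column names come from the post.

```python
import pandas as pd

# Hypothetical sample standing in for the survey column from the question.
raw = pd.DataFrame(
    {"Why you choose us": ["Location", "location", "Recommendation",
                           "recommedation", "Location"]}
)

# Lower-case the answers, count them, then turn the Series into a
# two-column frame named like the one printed above.
counts = raw["Why you choose us"].str.lower().value_counts()
df = counts.rename_axis("reason").reset_index(name="count")
print(df)
```

`rename_axis` names the index and `reset_index(name=...)` names the values column, so the column names do not depend on what `value_counts()` calls its result.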
Use .groupby() on a partial string (here the first four characters of reason) together with .transform('sum') to total the counts within each group:
    df['groupcount'] = df.groupby(df.reason.str[0:4])['count'].transform('sum')

                reason  count  groupcount
    0         location     35          35
    1   recommendation     23          34
    2    recommedation      8          34
    3          confort      7           7
    4     availability      4           4
    5  reconmmendation      3          34
    6       facilities      3           3
To see the full string and the partial string side by side, try:
    df = df.assign(groupname=df.reason.str[0:4])
    df['groupcount'] = df.groupby(df.reason.str[0:4])['count'].transform('sum')
    print(df)

                reason  count groupname  groupcount
    0         location     35      loca          35
    1   recommendation     23      reco          34
    2    recommedation      8      reco          34
    3          confort      7      conf           7
    4     availability      4      avai           4
    5  reconmmendation      3      reco          34
    6       facilities      3      faci           3
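If you want a single row per reason rather than a repeated group total, one option is to pick a label for each four-character group and aggregate on it. The sketch below assumes the most frequent spelling in a group is the correct one; the `canonical` column and the `merged` name are my own, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "reason": ["location", "recommendation", "recommedation", "confort",
               "availability", "reconmmendation", "facilities"],
    "count": [35, 23, 8, 7, 4, 3, 3],
})

# Sort so the most frequent spelling comes first within each prefix group.
df = df.sort_values("count", ascending=False)
prefix = df["reason"].str[:4]

# Label every row with the first (most frequent) spelling of its group.
df["canonical"] = df.groupby(prefix)["reason"].transform("first")

# Collapse to one total per canonical label.
merged = df.groupby("canonical")["count"].sum().sort_values(ascending=False)
print(merged)
```

Bear in mind that a fixed prefix only merges spellings that agree on their first four letters, so it will also merge unrelated words that happen to share a prefix.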
In case you have multiple items in a row, as in your CSV, then:
    import pandas as pd

    # Read the CSV
    df = pd.read_csv(r'path')

    # Turn 'Why you choose us' into a list of values in each row
    df['Why you choose us'] = (df['Why you choose us'].str.lower().fillna('no comment given')).str.split(',')

    # Explode so that each reason is in its own row, with all the other attributes intact
    df = df.explode('Why you choose us')

    # Strip any white space around the values, then count them
    print(df['Why you choose us'].str.strip().value_counts())

    location             48
    no comment given     34
    recommendation       25
    confort               8
    facilities            8
    recommedation         8
    price                 7
    availability          6
    reputation            5
    reconmmendation       3
    internet              3
    ac                    3
    breakfast             3
    tranquility           2
    cleanliness           2
    aveilable             1
    costumer service      1
    pool                  1
    comfort               1
    search engine         1
    Name: group, dtype: int64
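Note that the four-character prefix would not merge pairs such as confort/comfort in the counts above, because they differ within the first four letters. A different technique that may help is fuzzy matching against a hand-maintained list of accepted spellings, using `difflib.get_close_matches` from the standard library. This is only a sketch: the `CANONICAL` list, the `normalise` helper and the 0.8 cutoff are assumptions to tune for your data.

```python
import difflib
import pandas as pd

# Hypothetical list of accepted spellings; maintain it by hand for your data.
CANONICAL = ["location", "recommendation", "comfort", "availability"]

def normalise(reason, cutoff=0.8):
    """Return the closest accepted spelling, or the input if none is similar enough."""
    matches = difflib.get_close_matches(reason, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else reason

# A few of the values from the counts above, already lower-cased and stripped.
reasons = pd.Series(["location", "recommedation", "reconmmendation",
                     "confort", "comfort", "aveilable", "price"])

cleaned = reasons.map(normalise)
print(cleaned.value_counts())
```

With this cutoff the recommendation and comfort variants are merged, while a more distant misspelling such as aveilable is left alone. Lowering the cutoff catches more typos but raises the risk of merging unrelated answers.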