How to categorize a list of data by keyword in Python?
I have a list of transactions, including things like "AMZN mktp US*MH434G300", "HEALTH CARE WEB PMT", "ARBYS #4323", etc., and I want to write a program that looks for keywords in these descriptions and assigns a category based on them. Surprisingly, I haven’t found anything like this in my internet searches, and I suppose it’s possible that’s because it’s difficult to do.
What I have done so far is something like this:
    def getCategory(description):
        cat = ''
        if 'AMZN' in description:
            cat = 'shopping'
        elif 'ARBYS' in description:
            cat = 'restaurant'
        return cat
While this does work, it’s extremely painstaking, and I have to write a separate if statement for each and every keyword. There has to be a better way to do this. Is there a library for something like this? Even just a way to add a bunch of keywords to a list and then use that list in the if statement would be amazing.
I’m not worried about speed/efficiency, as there isn’t an insane amount of data (a few thousand entries). I’m using Python 3. I am very open to any learning experience, as I am trying to learn more about this kind of stuff. Any suggestions are extremely welcome and appreciated. Thanks!
While this is still slightly tedious, it’s less tedious than your solution. I would use a dictionary to assign each keyword to a specific group. I would write it like this:
    def getCategory(description):
        my_dict = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}
        for i in my_dict:
            if i in description:
                return my_dict[i]
        return None  # Return None if none of the keywords are in the description
> I have to write a separate if statement for each and every keyword. There has to be a better way to do this.

You can use a dictionary to store a mapping of keywords to categories, and iterate over the dict to find a match.
    categories_dict = {"AMZN": "shopping", "ARBYS": "restaurant"}

    def get_category(description):
        for key in categories_dict:
            if key in description:
                return categories_dict.get(key)
        return None
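If several keywords should map to the same category, one possible variation is to flip the mapping so that each category owns a list of keywords. This is only a sketch: the names `CATEGORY_KEYWORDS` and `categorize`, the `'AMAZON'` and `'HEALTH CARE'` keywords, and the `'healthcare'` category are made up for illustration, and upper-casing the description assumes you want case-insensitive matching.

```python
# Hypothetical category -> keywords table. Keywords other than AMZN and ARBYS
# are illustrative assumptions, not taken from the question.
CATEGORY_KEYWORDS = {
    "shopping": ["AMZN", "AMAZON"],
    "restaurant": ["ARBYS"],
    "healthcare": ["HEALTH CARE"],
}

def categorize(description, default=None):
    """Return the first category that has a keyword contained in the description."""
    text = description.upper()  # assumes keywords are stored in upper case
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return default  # nothing matched

descriptions = [
    "AMZN mktp US*MH434G300",
    "HEALTH CARE WEB PMT",
    "ARBYS #4323",
    "UNKNOWN SHOP",
]
results = [categorize(d) for d in descriptions]
print(results)  # ['shopping', 'healthcare', 'restaurant', None]
```

Adding a new keyword then only means appending to one of the lists, with no extra if statements.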
Using the linked answer, here is some sample code that may be helpful: https://stackoverflow.com/a/33406474/13124888 (reference).
Before diving into the code, I would highly recommend looking at re (which stands for regular expressions), a powerful module in Python’s standard library that you can use for finding keywords, swapping out text patterns, etc. You can find the documentation for this module here: https://docs.python.org/3/library/re.html.
Please also see the code snippet below, which is based off of the code in the linked post:
    import re

    matches_list = ['AMZN', 'ARBYS']  # Keywords list (add more keywords here)
    # keyword --> category dict (add more entries here)
    matches_to_category = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}

    def match(input_string, string_list):
        cat = []  # Initialize
        words = re.findall(r'\w+', input_string)  # Split the line into words
        # dict.fromkeys removes duplicates but, unlike set, keeps first-seen order
        keywords = dict.fromkeys(word for word in words if word in string_list)
        for keyword in keywords:  # Iterate over keywords found for a line
            cat.append(matches_to_category[keyword])  # Add category for keyword
        return cat

    >>> sentence = "AMZN is great for shopping; ARBYS has the meats!"
    >>> match(sentence, matches_list)
    ['shopping', 'restaurant']
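As a possible alternative to splitting each line into words, all keywords can be combined into a single compiled pattern and searched for in one pass. The sketch below is not from the linked answer; `KEYWORD_TO_CATEGORY`, `KEYWORD_PATTERN`, and `find_category` are hypothetical names, and it assumes that keywords are stored in upper case and that only the first keyword found in a description matters.

```python
import re

# Hypothetical keyword -> category table; keys are assumed to be upper case.
KEYWORD_TO_CATEGORY = {"AMZN": "shopping", "ARBYS": "restaurant"}

# Build one alternation such as \b(AMZN|ARBYS)\b.
# re.escape guards against keywords containing regex metacharacters.
KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORD_TO_CATEGORY) + r")\b",
    re.IGNORECASE,
)

def find_category(description):
    """Return the category of the first keyword found, or None if none match."""
    m = KEYWORD_PATTERN.search(description)
    if m is None:
        return None
    # Normalize the matched text so that e.g. 'arbys' finds the 'ARBYS' key
    return KEYWORD_TO_CATEGORY[m.group(1).upper()]

print(find_category("AMZN mktp US*MH434G300"))  # shopping
print(find_category("arbys #4323"))             # restaurant
print(find_category("HEALTH CARE WEB PMT"))     # None
```

The `\b` word boundaries keep a keyword from matching inside a longer word, which plain `in` checks would allow.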