How to categorize a list of data by keyword in Python?

I have a list of transactions, including things like

"AMZN mktp US*MH434G300", "HEALTH CARE WEB PMT", "ARBYS #4323"

etc., and I want to write a program that looks for keywords in these descriptions and assigns a category based on them. Surprisingly, I haven't found anything like this in my internet searches, and I suppose it's possible that's because it's difficult to do.

What I have done so far is something like this:

    def getCategory(description):
        cat = ''
        if 'AMZN' in description:
            cat = 'shopping'
        elif 'ARBYS' in description:
            cat = 'restaurant'
        return cat

While this does work, it's extremely painstaking, because I have to write a separate if statement for each and every keyword. There has to be a better way to do this. Is there a library for something like this? Even just a way to add a bunch of keywords to a list and then use that list in the if statement would be amazing.

I’m not worried about speed/efficiency, as there isn’t an insane amount of data (a few thousand entries). I’m using Python 3. I am very open to any learning experience, as I am trying to learn more about this kind of stuff. Any suggestions are extremely welcome and appreciated. Thanks!

3 Answer(s)

While this is still slightly tedious, it’s less tedious than your solution. I would use a dictionary to assign each keyword to a specific group. I would write it like this:

    def getCategory(description):
        my_dict = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}
        for i in my_dict:
            if i in description:
                return my_dict[i]
        return None  # Return None if none of the keywords are in the description
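If several keywords belong to the same category, one possible variation is to map each category to a list of keywords instead. This is only a sketch: the names `CATEGORY_KEYWORDS` and `get_category`, the `default` parameter, and the extra keywords 'AMAZON' and 'MCDONALDS' are made up for illustration and do not come from the question.

```python
# Hypothetical mapping: category -> list of keywords (extend as needed).
CATEGORY_KEYWORDS = {
    "shopping": ["AMZN", "AMAZON"],
    "restaurant": ["ARBYS", "MCDONALDS"],
}


def get_category(description, default=None):
    """Return the first category that has a keyword in the description."""
    text = description.upper()  # Compare case-insensitively
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return default  # No keyword matched


print(get_category("AMZN mktp US*MH434G300"))  # shopping
print(get_category("ARBYS #4323"))             # restaurant
print(get_category("HEALTH CARE WEB PMT"))     # None
```

Adding a new keyword then only means appending to one of the lists, with no change to the function itself.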
Answered on July 16, 2020.

I have to write a separate if statement for each and every keyword. There has to be a better way to do this.

You can use a dictionary to store a mapping of keywords to categories, and iterate over the dict to find a match.

    categories_dict = {"AMZN": "shopping", "ARBYS": "restaurant"}

    def get_category(description):
        for key in categories_dict:
            if key in description:
                return categories_dict.get(key)
        return None
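To run this over the whole list of transactions, something like the following might work. It is a rough sketch: the `transactions` list reuses the sample descriptions from the question, and the "uncategorized" fallback label is an assumption added here for illustration.

```python
categories_dict = {"AMZN": "shopping", "ARBYS": "restaurant"}


def get_category(description):
    for key in categories_dict:
        if key in description:
            return categories_dict[key]
    return None


# Sample descriptions taken from the question.
transactions = ["AMZN mktp US*MH434G300", "HEALTH CARE WEB PMT", "ARBYS #4323"]

# Pair every description with its category.
# "uncategorized" is an assumed fallback label for descriptions with no match.
categorized = [(t, get_category(t) or "uncategorized") for t in transactions]

# Collect the unmatched descriptions so new keywords can be added for them.
unmatched = [t for t, category in categorized if category == "uncategorized"]

for description, category in categorized:
    print(f"{category:15} {description}")
```

Reviewing `unmatched` after each run is a simple way to find out which keywords are still missing from the dictionary.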

Using the linked answer, here is some sample code that may be helpful: https://stackoverflow.com/a/33406474/13124888 (reference).

Before diving into the code, I would highly recommend looking at re (short for regular expressions), a powerful module in Python's standard library that you can use for finding keywords, swapping out text patterns, etc. You can find the documentation for this module here: https://docs.python.org/3/library/re.html.
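As a small taste of what re can do here, the sketch below builds a single pattern out of all the keywords and searches each description once. The names `KEYWORD_TO_CATEGORY`, `PATTERN`, and `get_category` are hypothetical, and the sketch assumes every keyword in the dict is written in upper case.

```python
import re

# Hypothetical keyword -> category mapping; keys are assumed to be upper case.
KEYWORD_TO_CATEGORY = {"AMZN": "shopping", "ARBYS": "restaurant"}

# Build one alternation pattern, here r"\b(AMZN|ARBYS)\b".
# re.escape guards against keywords that contain regex metacharacters.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORD_TO_CATEGORY) + r")\b",
    re.IGNORECASE,
)


def get_category(description):
    """Return the category of the first keyword found, or None."""
    found = PATTERN.search(description)
    if found is None:
        return None
    # Normalize the matched text back to the upper-case dict key.
    return KEYWORD_TO_CATEGORY[found.group(1).upper()]


print(get_category("AMZN mktp US*MH434G300"))  # shopping
print(get_category("arbys #4323"))             # restaurant
print(get_category("HEALTH CARE WEB PMT"))     # None
```

Note that the `\b` word boundaries make this stricter than a plain `in` check: a keyword embedded in a longer word (for example "ARBYSX") would not match.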

Please also see the code snippet below, which is based off of the code in the linked post:

    import re

    matches_list = ['AMZN', 'ARBYS']  # Keywords list; ... add more keywords here
    matches_to_category = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}  # keyword --> category dict; ... add more pairs here


    def match(input_string, string_list):
        cat = []  # Categories found for this line
        words = re.findall(r'\w+', input_string)  # Split the line into words
        # Keep each matching keyword once, in order of first appearance
        keywords = dict.fromkeys(word for word in words if word in string_list)
        for keyword in keywords:  # Iterate over keywords found for a line
            cat.append(matches_to_category[keyword])  # Look up the keyword's category
        return cat

    >>> sentence = "AMZN is great for shopping; ARBYS has the meats!"
    >>> match(sentence, matches_list)
    ['shopping', 'restaurant']