Using regular expressions to parse email data

I want to go through my email inbox, find marketing emails that contain coupon codes, and extract the code from each one. The logic I have written works on only one type of data.

import re

def extract_promo_code(body):
    # Use regular expressions to find promo code
    promo_code_pattern = r'(?i)(?:Enter\s+Code|Enter\s+promo)(?:[\s\n]*)([A-Z0-9]+)'
    match = re.search(promo_code_pattern, body)
    if match:
        promo_code = match.group(1)
        # Remove any non-alphanumeric characters from the promo code
        promo_code = re.sub(r'[^A-Z0-9]', '', promo_code)
        return promo_code
    else:
        return None

Following are a couple of samples from which I want to extract coupon code:

  1. "Enter code at checkout.* Offer valid until October 6, 2023, 11:59pm CT MKEA15EMYZGP8W"

  2. "Enter code JSB20GR335F4 Ends September 21, 2023, at 11:59pm CT.*"

I want the code to catch the first promo code that comes after the text "Enter Code" or "enter promo". The promo code consists of a mix of digits and uppercase letters, and it should be matched even if there are line breaks and spaces between the text and the promo code.

The above code runs fine for sample 2 but doesn't catch the code in sample 1.

There is 1 answer

Andrej Kesely (best answer)

You can use the pattern below. I assumed that a promo code has at least 10 characters, so adjust that minimum to suit your data (regex101 demo):

import re

text = """\
Enter code at checkout.* 
Offer valid until October 6, 2023, 11:59pm CT MKEA15EMYZGP8W

Enter code JSB20GR335F4 Ends September 21, 2023, at 11:59pm CT.*
"""

pat = r"""(?s)Enter (?:code|promo).*?\b([A-Z\d]{10,})"""

for code in re.findall(pat, text):
    print(code)

Prints:

MKEA15EMYZGP8W
JSB20GR335F4
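
As for why the original attempt misses sample 1: with the global (?i) flag, [A-Z0-9]+ also matches lowercase letters, so the group captures "at" from "at checkout", and the later re.sub (which is case-sensitive) strips it down to an empty string. If you want to keep the single-result function from the question, a minimal sketch could look like the following. It assumes the same 10-character minimum as above. The scoped flag (?i:...) (Python 3.6+) and the \s+ between the words are my additions, so that only the "Enter code" / "Enter promo" prefix is case-insensitive while the code itself must still be uppercase.

```python
import re

# Assumption: a promo code is a run of at least 10 uppercase letters/digits.
# (?s) lets "." cross line breaks; (?i:...) makes only the prefix
# case-insensitive, so the capture group still requires uppercase characters.
PROMO_PATTERN = re.compile(r"(?s)(?i:Enter\s+(?:code|promo)).*?\b([A-Z\d]{10,})")


def extract_promo_code(body, pattern=PROMO_PATTERN):
    """Return the first promo code after "Enter code"/"Enter promo", or None."""
    match = pattern.search(body)
    return match.group(1) if match else None


samples = [
    "Enter code at checkout.* Offer valid until October 6, 2023, 11:59pm CT MKEA15EMYZGP8W",
    "Enter code JSB20GR335F4 Ends September 21, 2023, at 11:59pm CT.*",
    "No coupon in this email.",
]

for sample in samples:
    print(extract_promo_code(sample))
```

This should print the two codes followed by None. Note that the length threshold is a heuristic: any other all-uppercase or all-digit token of 10 or more characters that appears first (an order number, for example) would be returned instead.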