How to use structural pattern matching (match-case) with regex?

1k views Asked by At

What would be a "pythonic" way of expressing regex'es with Python 3.10's (and later) match-case statements (i.e, pattern matching)? It intuitively sounds like a good use case, but I cannot figure out a "clean" way of expressing something like:

s = "[email protected]"
match s:
    # cases matches s using regex
2

There are 2 answers

0
racemaniac On

The following works

examples = ["[email protected]", "https://www.web-site.com",  "Doe, John", "junk", r"\\computer\folder\file", "smtp://mailserver.acme.com"]

for text in examples:

    match tuple(
            i for i, e in 
                enumerate(
                    [   
                        r"[A-Za-z0-9\.\-]+@[A-Za-z0-9\.\-]+$", # e-mail address pattern => 0 
                        r"http[s]{0,1}://[A-Za-z0-9\/\.\-]*$", # url pattern => 1 
                        r"[A-Za-z]+, [A-Za-z]+$" # name pattern => 2
                    ]
                ) 
                if re.match(e, text) is not None
            ):
        
        case (0,): print(f"'{text}' is an e-mail address")

        case (1,): print(f"'{text}' is a url")
        
        case (2,): print(f"'{text}' is a name")

        case (0, 1) | (0, 2) | (1,2) | (0, 1, 2): print(f"'{text}' is impossible, because the patterns are mutually exclusive")
        
        case _ : print(f"'{text}' is unrecognizable")

and results in

'[email protected]' is an e-mail address
'https://www.web-site.com' is a url
'Doe, John' is a name
'junk' is unrecognizable
'\\computer\folder\file' is unrecognizable
'smtp://mailserver.acme.com' is unrecognizable
0
Greg klupar On

I like to use the walrus operator and match against one of the Regex Match() object's methods.

m.groupdict(), m.groups(), or m.regs

if m := re.match(r"(?P<mailbox>.+?)@(?P<domain>.+)", "[email protected]"):
    match m.groupdict():
        case {'domain': "gmail.com"}:
            print(f"A google email address was found")
        case {'domain': domain}:
            print(f"domain was bound to {repr(domain)}")
else:
    pass # No match

It would also be possible to match against m.groups() to compare with