Python regex.findall not finding all matches of shorter length


How can I find all matches, including the shorter ones that don't consume every character the * and + quantifiers could match?

import regex as re
matches = re.findall(r"^\d+", "123")
print(matches)
# actual output: ['123']
# desired output: ['1', '12', '123']

I need the matches to be anchored to the start of the string (hence the ^), but the + doesn't seem to consider shorter matches at all. I tried adding overlapped=True to the findall call, but that doesn't change the output.

Making the quantifier non-greedy (^\d+?) changes the output to ['1'], with or without overlapped=True. Why doesn't it keep searching for longer matches?
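
A likely explanation: at each starting position the engine reports at most one match. For ^\d+ the greedy + yields the longest run of digits and the lazy +? the shortest; findall then resumes searching after the end of that match, and the ^ anchor leaves position 0 as the only possible start. As I understand regex's overlapped=True, it only lets the next search begin inside the previous match, so it still produces one match per start position. A small illustration with the standard library re module:

```python
import re

s = "123"
# One match at position 0, as long as possible (greedy +)
greedy = re.findall(r"^\d+", s)
# One match at position 0, as short as possible (lazy +?)
lazy = re.findall(r"^\d+?", s)
# Without ^, findall resumes after each match, giving non-overlapping matches
unanchored = re.findall(r"\d+?", s)
print(greedy, lazy, unanchored)  # ['123'] ['1'] ['1', '2', '3']
```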

I could always make shorter substrings myself and check those with the regex, but that seems rather inefficient, and surely there must be a way for the regex to do this by itself.

import regex as re

s = "123"
matches = []
for length in range(len(s) + 1):
    matches.extend(re.findall(r"^\d+", s[:length]))
print(matches)
# output: ['1', '12', '123']
# but clunky :(

Edit: the ^\d+ regex is just an example; I need this to work for any regex. I should have stated this up front, my apologies.
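
For arbitrary patterns, one option is a variant of the loop above that avoids slicing: the standard library's Pattern.fullmatch(string, pos, endpos) checks whether the pattern matches exactly the characters before endpos. This is a sketch that assumes "all matches" means every prefix of the string that the pattern matches completely; the helper name all_prefix_matches is hypothetical. Because endpos makes the engine treat the string as if it ended there, $ and lookaheads see the same truncated text as in the slicing loop. It still makes one match attempt per prefix, so it saves substring copies rather than regex work.

```python
import re

def all_prefix_matches(pattern, s):
    """Return every prefix of s (shortest first) that pattern matches in full."""
    compiled = re.compile(pattern)
    results = []
    # end == 0 tests the empty prefix, so patterns like \d* can report ""
    for end in range(len(s) + 1):
        # fullmatch from pos 0 is already anchored, so a leading ^ is optional
        m = compiled.fullmatch(s, 0, end)
        if m:
            results.append(m.group())
    return results

print(all_prefix_matches(r"^\d+", "123"))     # ['1', '12', '123']
print(all_prefix_matches(r"a(bc)+", "abcbc"))  # ['abc', 'abcbc']
```

The second call shows a pattern where not every prefix of the longest match is itself a match, which is the case the edit above is concerned with.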


There are 4 answers

The fourth bird (accepted answer)

You could use overlapped=True with the PyPI regex module together with the reverse-search flag (?r), then reverse the list returned by re.findall:

import regex as re

res = re.findall(r"(?r)^\d+", "123", overlapped=True)
res.reverse()
print(res)

Output

['1', '12', '123']


wim

How about a positive lookbehind assertion:

>>> import regex as re
>>> re.findall(r'(?<=(^\d+))', '123')
['1', '12', '123']
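
This trick relies on the regex module's support for variable-width lookbehind. The standard library re module only accepts fixed-width lookbehinds, so the same pattern fails to compile there. The helper compiles_with_stdlib is hypothetical, just for illustration:

```python
import re

def compiles_with_stdlib(pattern):
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# \d+ has no fixed width, so stdlib re rejects this lookbehind
print(compiles_with_stdlib(r'(?<=(^\d+))'))  # False
# A fixed-width lookbehind compiles fine
print(compiles_with_stdlib(r'(?<=\d)'))  # True
```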

Andrej Kesely

I'd use the standard library re module:

import re

matches = re.findall(r"^\d+", "123")
out = [m[:i] for m in matches for i in range(1, len(m)+1)]
print(out)

Prints:

['1', '12', '123']

Niveditha S

import re

m = re.findall(r'\d', '123')
op = ["".join(m[:i]) for i in range(1, len(m) + 1)]
print(op)

This is slightly better, as re.findall() is called only once.