Hebrew regex does not match similar titles


I have the following regex:

    r'\d.\s(.*)\sשנה.*.docx'

That works for titles like 1. בראשית שנה א_ - הבדלה על קפה.docx

but is not working for 10. מקץ שנה א_ - פדיון שבויים.docx

Any help here?


There is 1 answer

The fourth bird

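The pattern does not account for the second digit in 10. because \d.\s matches exactly one digit, then any single character except a newline, then a whitespace character. Matching from the start of 10., the \d consumes 1, the unescaped dot consumes 0, and \s then fails against the literal dot.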
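A minimal sketch of that failure, assuming the pattern is applied with Python's re.match (the question does not say which function is used). The two file names are the ones from the question; the variable names are only illustrative.

```python
import re

# Original pattern from the question, unchanged.
old_pattern = re.compile(r'\d.\s(.*)\sשנה.*.docx')

one_digit = '1. בראשית שנה א_ - הבדלה על קפה.docx'
two_digits = '10. מקץ שנה א_ - פדיון שבויים.docx'

# re.match anchors at the start of the string.
# "1. " fits \d . \s, so the first title matches.
match_one = old_pattern.match(one_digit)

# In "10. " the unescaped dot consumes "0", and \s then meets "." and fails.
match_two = old_pattern.match(two_digits)

# re.search is not anchored, so it can start at the "0" and still match.
search_two = old_pattern.search(two_digits)
```

If the second title fails only with re.match and not with re.search, the anchoring is what exposes the single-digit assumption in the pattern.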

If you want to keep the capture group, you could match one or more digits with \d+, make the .* non-greedy (.*?), escape the dots to match them literally, and use word boundaries \b to prevent partial word matches:

    \b\d+\.\s+(.*?)\s+שנה.*?\.docx\b
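As a quick check, the suggested pattern can be tried in Python against both file names from the question. This is only a sketch: the helper name extract_title is hypothetical, and re.search is used here as one reasonable choice.

```python
import re

# Pattern from the answer: one or more digits, a literal dot,
# a lazy capture up to the word שנה, then a literal ".docx".
pattern = re.compile(r'\b\d+\.\s+(.*?)\s+שנה.*?\.docx\b')

def extract_title(filename):
    """Return the captured title (group 1), or None if there is no match."""
    m = pattern.search(filename)
    return m.group(1) if m else None

filenames = [
    '1. בראשית שנה א_ - הבדלה על קפה.docx',
    '10. מקץ שנה א_ - פדיון שבויים.docx',
]

# Map each file name to the title captured by group 1.
titles = {name: extract_title(name) for name in filenames}
```

Both names now produce a capture, because \d+ accepts any number of leading digits before the escaped dot.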
