This is my code:
import re
string = "hello\bworld"
reg = r"\bhello\b"
print(re.findall(reg, string))
output is:
['hello']
But this doesn't make sense to me.
In the official Python documentation, \b is described as the worst collision between Python's and regex's escape sequences:
the worst collision between Python’s string literals and regular expression sequences. In Python’s string literals, \b is the backspace character, ASCII value 8. If you’re not using raw strings, then Python will convert the \b to a backspace, and your RE won’t match as you expect it to
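A quick way to see the collision the documentation describes is to compare the two literals directly. This is a minimal sketch: the variable names plain and raw and the sample strings are made up for illustration.

```python
import re

# In a normal string literal, "\b" is one character: backspace (ASCII 8).
plain = "\b"
# In a raw string literal, r"\b" is two characters: a backslash and a "b".
raw = r"\b"

print(len(plain), ord(plain))  # 1 8
print(len(raw), list(raw))     # 2 ['\\', 'b']

# Passed to re, the raw form is the word-boundary assertion...
print(re.findall(raw + "hi" + raw, "say hi now"))  # ['hi']
# ...while the plain form is just a literal backspace to match.
print(re.findall(plain + "hi", "say\bhi now"))     # ['\x08hi']
```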
In Python, \b is interpreted as a backspace character, so string will be translated into hellworld:
print(string)
output is:
hellworld
In regular expressions, however, \b is interpreted as a word boundary.
So searching for the pattern r"\bhello\b" in hellworld should, in theory, yield 0 matches,
since there isn't any hello surrounded by word boundaries in hellworld.
I expected the result to be an empty list.
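For comparison, the reasoning above does hold for a string that really is hellworld, with no backspace in it. A small sketch, where the variable name real is just for illustration:

```python
import re

# A string that genuinely contains "hellworld": no backspace anywhere,
# so "hello" never occurs and "hellw" does.
real = "hellworld"
print(re.findall(r"\bhello\b", real))  # []
print(re.findall(r"hellw", real))      # ['hellw']
```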
Another way to look at it is with this code:
import re
string = "hello\bworld"
reg = r"hellw"
print(re.findall(reg, string))
Here the output is:
[]
Which again doesn't make sense to me, since there is a hellw in string.
Printing string proves it:
print(string)
hellworld
What am I not seeing?
The backspace doesn't modify the contents of the string; it only affects how the string is displayed on a terminal. The string contains o followed by a backspace. Since o is a word character and backspace is a non-word character, there's a word boundary between them, so \b in the regexp matches there.
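A short sketch of the answer in action: repr() shows the backspace that print() hides, and re.finditer lists where \b matches. The hello\x08w pattern is only an illustration of matching the backspace explicitly.

```python
import re

string = "hello\bworld"

# repr() shows the escape instead of letting the terminal act on it.
print(repr(string))       # 'hello\x08world'
print(len(string))        # 11: the backspace is a real character
print(string[5] == "\b")  # True

# \b matches between a word character and a non-word character.
# The backspace is not a word character, so there is a boundary on each side of it.
print([m.start() for m in re.finditer(r"\b", string)])  # [0, 5, 6, 11]

print(re.findall(r"\bhello\b", string))   # ['hello']
print(re.findall(r"hellw", string))       # [] because 'o' and the backspace come first
print(re.findall(r"hello\x08w", string))  # ['hello\x08w']
```

On a terminal, printing the backspace moves the cursor back one column and the following w overwrites the o, which is why the output looks like hellworld even though the string still contains hello.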