Why isn't \b interpreted as a backspace in this regular expression?

This is my code:

import re
string = "hello\bworld"   # normal string literal
reg = r"\bhello\b"        # raw string literal, so the regex engine receives \b
print(re.findall(reg, string))

The output is:

['hello']

But this doesn't make sense to me.
In the official Python documentation, \b is described as the worst collision between Python's and regex's escape sequences:

the worst collision between Python’s string literals and regular expression sequences. In Python’s string literals, \b is the backspace character, ASCII value 8. If you’re not using raw strings, then Python will convert the \b to a backspace, and your RE won’t match as you expect it to
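A minimal sketch of the collision the documentation describes (the variable names are illustrative, not from the question): the same characters `\bhello\b` mean two different patterns depending on whether the literal is raw.

```python
import re

text = "hello\bworld"          # normal literal: \b becomes a backspace, "\x08"

raw_pattern = r"\bhello\b"     # raw literal: the regex engine sees \b (word boundary)
plain_pattern = "\bhello\b"    # normal literal: the regex engine sees two backspace characters

assert raw_pattern == "\\bhello\\b"
assert plain_pattern == "\x08hello\x08"

print(re.findall(raw_pattern, text))             # ['hello']
print(re.findall(plain_pattern, text))           # [] since text has no backspace before "hello"
print(re.findall(plain_pattern, "\bhello\b"))    # ['\x08hello\x08'], literal backspaces matched
```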

In Python, \b is interpreted as a backspace character, so string will be translated into hellworld.

print(string)

The output is:

hellworld

In a regular expression, however, \b is interpreted as a word boundary.
So searching for the pattern r"\bhello\b" in hellworld should theoretically yield 0 matches, since there isn't any hello surrounded by word boundaries in hellworld.

I expected the result to be an empty list.

Another way to look at it is with this code:

import re
string = "hello\bworld"
reg = r"hellw"
print(re.findall(reg, string))

Here the output is:

[]

Which again doesn't make sense to me, since there is a hellw in string.
Printing string proves it:

print(string)

The output is:

hellworld

What am I not seeing?

There are 2 answers

Barmar (best answer)

The backspace doesn't modify the contents of the string; it only affects how the string is displayed on a terminal. The string contains o followed by a backspace character. Since o is a word character and backspace is a non-word character, there's a word boundary between them, so the trailing \b in the regexp matches there. The leading \b matches too, because h is the first character of the string.
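A short sketch that makes the hidden character and the boundary positions visible, using only the standard `re` module (the variable names are illustrative):

```python
import re

text = "hello\bworld"

# The backspace is still in the string: 11 characters, with "\x08" at index 5.
print(len(text))      # 11
print(repr(text))     # 'hello\x08world'

# Zero-width \b matches between a word character (\w) and a non-word character,
# or between a word character and the start or end of the string.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)     # [0, 5, 6, 11]

# "hello" spans indices 0 to 5, and both ends sit on a boundary.
match = re.search(r"\bhello\b", text)
print(match.span())   # (0, 5)
```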

Charles Duffy

The translation from 'hello\bworld' to hellworld isn't done by Python itself -- instead, it's done by your terminal. The word hello is printed, then the cursor moves left a space when the backspace is printed, and the w is printed over where the o used to be.
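To illustrate the point, here is a deliberately simplified model of what a terminal does with a backspace: move the cursor left one column without erasing, so the next character overwrites. The function `render_like_terminal` is hypothetical and ignores everything else real terminals handle (other control codes, line wrapping, and so on).

```python
def render_like_terminal(text: str) -> str:
    """Hypothetical, simplified terminal: \\b moves the cursor left, later chars overwrite."""
    screen: list[str] = []
    cursor = 0
    for ch in text:
        if ch == "\b":
            cursor = max(cursor - 1, 0)   # move left; nothing is erased
        elif cursor < len(screen):
            screen[cursor] = ch           # overwrite the character under the cursor
            cursor += 1
        else:
            screen.append(ch)             # write at the end of the line
            cursor += 1
    return "".join(screen)


text = "hello\bworld"
print(render_like_terminal(text))   # hellworld
print(len(text))                    # 11, the string itself still holds the backspace
```

Note that with nothing after the backspace, as in `"hello\b"`, this model still shows `hello`, since the backspace alone erases nothing.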

Because the actual string contains an extra character (the backspace) between hello and world, a search for hellw isn't expected to match, and a search for hello as a distinct word (read: not surrounded by other word-valid characters at the front or end) is expected to match.
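A small sketch of the second example: `hellw` fails because `o` and the backspace sit between `hell` and `w`, but spelling the backspace out in the pattern matches. Inside a character class, Python's `re` treats `\b` as the backspace character rather than a word boundary.

```python
import re

text = "hello\bworld"

print(re.findall(r"hellw", text))        # []
print(re.findall(r"hello\x08w", text))   # ['hello\x08w'], backspace written as a hex escape
print(re.findall(r"hello[\b]w", text))   # ['hello\x08w'], inside [...] \b means backspace
```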