Scala: Regular Expression pattern match with curly braces?


So I am creating a WML-like language for my assignment, and as a first step I am supposed to create regular expressions that recognize the following:

//single = "{"
//double = "{{"
//triple = "{{{"

Here is my code for the second one:

val double = "\\{\\{\\b".r

And my test is:

println(double.findAllIn("{{ s{{ { {{{ {{ {{x").toArray.mkString(" "))

But it doesn't print anything! It's supposed to print the first, second, fifth and sixth tokens. I have tried every combination of \b and \B, and even \{{2,2} instead of \{\{, but it's still not working. Any help?

As a side question: if I wanted it to match only the first and fifth tokens, what would I need to do?

Best answer, by dk14:

I tested your code (Scala 2.12.2 REPL), and contrary to your "it doesn't print anything" statement, it actually prints the "{{" occurrence from the "{{x" substring.

This is because x is a word character, and \b matches the position between the second { and x. Keep in mind that { isn't a word character, unlike x, so there is no word boundary between { and a space or between two { characters.
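To see this directly, here is a minimal sketch (the object name WordBoundaryDemo is hypothetical, chosen for illustration) that prints the start index of each match instead of the matched text. Since every match is the same "{{", findAllIn alone hides where the match happened.

```scala
// Sketch: where does "\\{\\{\\b" actually match in the question's test input?
object WordBoundaryDemo {
  val double = "\\{\\{\\b".r
  val input = "{{ s{{ { {{{ {{ {{x"

  // Start index of every match (findAllIn would only show the identical text "{{").
  val starts: List[Int] = double.findAllMatchIn(input).map(_.start).toList

  def main(args: Array[String]): Unit =
    println(starts) // List(16): only the "{{" directly before 'x'
}
```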

According to this tutorial:

It matches at a position that is called a "word boundary". This match is zero-length

There are three different positions that qualify as word boundaries:

1) Before the first character in the string, if the first character is a word character

...
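Because \b is zero-length, each of its matches is just a position. A small sketch (the object name BoundaryPositions is hypothetical) that lists every word-boundary position in a short string using Java's regex engine, which Scala's .r uses:

```scala
object BoundaryPositions {
  // "\\b" consumes no characters, so each match's start is a boundary position.
  val boundaries: List[Int] = "\\b".r.findAllMatchIn("{{x y").map(_.start).toList

  def main(args: Array[String]): Unit =
    println(boundaries) // List(2, 3, 4, 5)
}
```

Position 2 sits between '{' and 'x', 3 between 'x' and ' ', 4 between ' ' and 'y', and 5 after the final 'y'. There is none at 0 or 1 because '{' is not a word character.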

As for a solution, it depends on the precise definition, but lookarounds seem to work for me:

"(?<!\\{)\\{{2}(?!\\{)".r

It matched the first, second, fifth and sixth tokens. The expression says: match "{{" that is neither preceded nor followed by "{".
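A minimal sketch to check this on the question's input (the object name LookaroundDouble is hypothetical). It prints start indices, so you can map each match back to its token:

```scala
object LookaroundDouble {
  // "{{" not preceded by "{" and not followed by "{"
  val double = "(?<!\\{)\\{{2}(?!\\{)".r
  val input = "{{ s{{ { {{{ {{ {{x"

  val starts: List[Int] = double.findAllMatchIn(input).map(_.start).toList

  def main(args: Array[String]): Unit =
    println(starts) // List(0, 4, 13, 16)
}
```

Index 0 is the first token, 4 is the braces inside "s{{", 13 is the fifth token and 16 is the "{{" of "{{x". The "{{{" token is rejected at every starting position because each "{{" inside it touches a third "{".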

For the side question:

"(?<![^ ])\\{\\{(?![^ ])".r //match `{{` surrounded by spaces or line boundaries

Or, depending on your interpretation of "space":

"(?<!\\S)\\{\\{(?!\\S)".r
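The two versions differ only in what counts as a separator: [^ ] treats everything except a literal space as "non-space" (so a tab blocks the match), while \S treats any whitespace character as a separator. A sketch on a hypothetical tab-separated input (object name SpaceVsWhitespace is illustrative):

```scala
object SpaceVsWhitespace {
  val literalSpace  = "(?<![^ ])\\{\\{(?![^ ])".r // only ' ' counts as a separator
  val anyWhitespace = "(?<!\\S)\\{\\{(?!\\S)".r   // any whitespace (tab, newline, ...) counts

  val input = "{{\t{{" // two "{{" tokens separated by a TAB

  val withSpace: List[Int]      = literalSpace.findAllMatchIn(input).map(_.start).toList
  val withWhitespace: List[Int] = anyWhitespace.findAllMatchIn(input).map(_.start).toList

  def main(args: Array[String]): Unit = {
    println(withSpace)      // List()
    println(withWhitespace) // List(0, 3)
  }
}
```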

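Both matched the 1st and 5th tokens. I avoided plain positive lookarounds because I wanted line beginnings and endings (boundaries) to be taken into account automatically. The double negation (! combined with [^ ]) implicitly includes ^ and $: "not preceded by a non-space character" is also true at the very start of the string, where nothing precedes the match. Alternatively, you could use a positive lookaround with explicit anchors: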

"(?<=^|\\s)\\{\\{(?=\\s|$)".r
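A quick sketch confirming that all three side-question patterns agree on the question's input (the object name SideQuestion is hypothetical):

```scala
object SideQuestion {
  val input = "{{ s{{ { {{{ {{ {{x"

  val patterns = List(
    "(?<![^ ])\\{\\{(?![^ ])".r,   // double negation with a literal space
    "(?<!\\S)\\{\\{(?!\\S)".r,     // double negation with \S
    "(?<=^|\\s)\\{\\{(?=\\s|$)".r  // positive lookarounds with explicit ^ and $
  )

  // Start indices of the matches, one list per pattern.
  val results: List[List[Int]] =
    patterns.map(p => p.findAllMatchIn(input).map(_.start).toList)

  def main(args: Array[String]): Unit =
    results.foreach(println) // each line prints List(0, 13)
}
```

Indices 0 and 13 are the first and fifth tokens. Java's regex engine (which Scala uses) accepts the lookbehind (?<=^|\s) because both alternatives have a bounded length.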

You can read about lookarounds here. Basically, a lookaround checks that a symbol or expression does (or does not) occur at the current position without consuming it, so the text it checks is not included in the matched string itself.
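A small sketch of the "not included in the match" part (the object name LookbehindNotConsumed is hypothetical): the same 's' is required in both patterns, but only the ordinary one makes it part of the match.

```scala
object LookbehindNotConsumed {
  val input = "s{{"

  // Ordinary concatenation: 's' becomes part of the match.
  val consumed = "s\\{\\{".r.findFirstMatchIn(input).map(m => (m.start, m.matched))
  // Lookbehind: 's' must be there, but is not part of the match.
  val lookbehind = "(?<=s)\\{\\{".r.findFirstMatchIn(input).map(m => (m.start, m.matched))

  def main(args: Array[String]): Unit = {
    println(consumed)   // Some((0,s{{))
    println(lookbehind) // Some((1,{{))
  }
}
```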

Some examples of lookarounds:

  • (?<=z)aaa matches "aaa" that is preceded by z
  • (?<!z)aaa matches "aaa" that is not preceded by z
  • aaa(?=z) matches "aaa" followed by z
  • aaa(?!z) matches "aaa" not followed by z
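The four examples above can be tried side by side on one made-up input string (the input and the object name LookaroundExamples are hypothetical); the helper returns the start index of each match:

```scala
object LookaroundExamples {
  val input = "zaaa baaa aaaz aaab"

  // Start index of every non-overlapping match of `pattern` in `input`.
  def starts(pattern: String): List[Int] =
    pattern.r.findAllMatchIn(input).map(_.start).toList

  def main(args: Array[String]): Unit = {
    println(starts("(?<=z)aaa")) // List(1)
    println(starts("(?<!z)aaa")) // List(6, 10, 15)
    println(starts("aaa(?=z)"))  // List(10)
    println(starts("aaa(?!z)"))  // List(1, 6, 15)
  }
}
```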

P.S. To make your life easier, Scala has triple-quoted strings ("""..."""), in which backslashes are not treated as escape characters, so instead of:

"(?<!\\S)\\{\\{(?!\\S)".r

you can just write:

"""(?<!\S)\{\{(?!\S)""".r
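A short sketch showing that both literals produce exactly the same pattern text, and therefore the same regex (the object name TripleQuoted is hypothetical):

```scala
object TripleQuoted {
  // Ordinary string literal: every regex backslash must be doubled.
  val escaped = "(?<!\\S)\\{\\{(?!\\S)"
  // Triple-quoted literal: backslashes are kept as-is.
  val raw = """(?<!\S)\{\{(?!\S)"""

  val starts: List[Int] = raw.r.findAllMatchIn("{{ s{{ { {{{ {{ {{x").map(_.start).toList

  def main(args: Array[String]): Unit = {
    println(escaped == raw) // true: identical pattern text
    println(starts)         // List(0, 13)
  }
}
```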