JFlex RE matching lookahead and precedence

For XPath parsing, I need to distinguish the token type of "name" in the cases where it is followed by "::" (possibly after whitespace), followed by "(" (possibly after whitespace), or followed by neither.

In JLex we did this with a routine that read ahead in the yy_* buffer, but that buffer isn't exposed in JFlex, and a lookahead RE should be a cleaner solution than hand-coded lookahead.

Unfortunately, my first attempt isn't working as expected; it's falling into the third category (standalone name) more often than I would have expected.

What I'm trying is the following pattern, where "self" is the name I'm trying to classify. Please pardon the "GONK_" prefix; that's my convention for debugging wrappers. The wrapped newSymbol() is itself a wrapper for new Symbol() which has a side effect; sorry about that.

"self/\s*::"       { return GONK_newSymbol(sym.SELF); }
"self/\s*[(]"      { return GONK_newSymbol(sym.SELF); }
"self"             { return GONK_newSymbol(sym.QNAME,yytext()); }

As I understand it, JFlex's RE rules are "longest match wins, ties broken in favor of first match", so I expected the lookaheads to take precedence. But putting a trace printout in the GONK_ functions tells me that the third case (sym.QNAME) is almost always being taken.

I'm sure my error is obvious and I'm looking right at it, but... What did I miss?

There is 1 answer
keshlam On

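Found it. I shouldn't have quoted the whole rule. Inside double quotes JFlex treats RE operators such as / and * as literal characters, so the first two rules were never lookaheads at all. Only the literals should be quoted (to emphasize that they are literals):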

"child"/\s*"::"    { return GONK_newSymbol(sym.CHILD); }
"child"/\s*[(]     { return GONK_newSymbol(sym.CHILD); }
"child"            { return GONK_newSymbol(sym.QNAME,yytext()); }

Knew I was looking right at it.

Leaving the question here because I didn't find a good example of using JFlex lookahead elsewhere.
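
For anyone who wants to sanity-check the intended classification outside a generated scanner, here is a rough sketch using java.util.regex lookahead. This is a different engine from JFlex with different syntax ((?=...) instead of /), so treat it only as an illustration of the idea: match the name, peek at what follows, consume nothing extra. The class, enum, and method names are made up for this example, and it assumes the three cases from the question get three distinct token kinds.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not generated by or part of JFlex.
class LookaheadSketch {
    // Assumed stand-ins for the three cases described in the question.
    enum Kind { AXIS, FUNCTION, QNAME }

    // "child" followed by optional whitespace and "::"; the lookahead is not consumed.
    private static final Pattern AXIS = Pattern.compile("child(?=\\s*::)");
    // "child" followed by optional whitespace and "(".
    private static final Pattern FUNCTION = Pattern.compile("child(?=\\s*[(])");
    // Plain "child" with no qualifying lookahead.
    private static final Pattern NAME = Pattern.compile("child");

    // Classify the token at the start of the input, trying the rules in order.
    static Kind classify(String input) {
        if (AXIS.matcher(input).lookingAt()) return Kind.AXIS;
        if (FUNCTION.matcher(input).lookingAt()) return Kind.FUNCTION;
        if (NAME.matcher(input).lookingAt()) return Kind.QNAME;
        return null; // input does not start with "child"
    }

    // Analogue of the original mistake: quoting the whole pattern turns
    // the operators into literal text, so the lookahead never fires.
    static boolean quotedPatternMatches(String input) {
        Pattern literal = Pattern.compile(Pattern.quote("child(?=\\s*::)"));
        return literal.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(classify("child::para"));          // AXIS
        System.out.println(classify("child (x)"));            // FUNCTION
        System.out.println(classify("child"));                // QNAME
        System.out.println(quotedPatternMatches("child::x")); // false
    }
}
```

The last line of main mirrors what went wrong in the question: once the operators are quoted, the pattern only matches that exact literal text, so real input falls through to the plain-name rule.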