I am a beginner with regex in C++. I was wondering why this code:
    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>

    int main() {
        std::string s = "? 8==2 : true ! false";
        boost::regex re("\\?\\s+(.*)\\s*:\\s*(.*)\\s*\\!\\s*(.*)");
        boost::sregex_token_iterator p(s.begin(), s.end(), re, -1); // sequence and that reg exp
        boost::sregex_token_iterator end; // Create an end-of-reg-exp marker
        while (p != end)
            std::cout << *p++ << '\n';
    }
Prints an empty string. I put the regex in regexTester and it matches the string correctly, but when I try to iterate over the matches here, it returns nothing.
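I think the tokenizer is actually meant to split text by some delimiter, and the delimiter is not included. Compare with std::regex_token_iterator.
Indeed, you invoke exactly this mode: as per the docs, a submatch index of -1 makes the iterator enumerate the text sequences that did not match the expression (that is, it performs field splitting).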
So, just fix that: pass the indices of the capture groups you want (here 1, 2 and 3) instead of -1.
Other Observations
All the greedy Kleene stars are a recipe for trouble. You won't ever find a second match, because the first one's .* at the end will by definition gobble up all remaining input.
Instead, make them non-greedy (.*?) and/or much more precise (like isolating some character set, or mandating non-space characters?).
BONUS: Parser Expressions
Instead of abusing regexen to do parsing, you could generate a parser, e.g. using Boost Spirit.
This is much more extensible, will easily support recursive grammars and will be able to synthesize a typed representation of your syntax tree, instead of just leaving you with scattered bits of string.