I need to parse something like (yaml):
- from: src
  to:
    - target1
    - target2
- from: src2
  to:
    - target3
    - target4
I tried something like this (simplified pseudocode):
identifierRule = +alnum;
fromToRule = lit("-") >>
    (
        "from:" >> identifierRule >> qi::eol >>
        (
            ("to: " >> qi::eol >> +(qi::repeat(indention)[qi::blank] >> "-" >> identifierRule >> qi::eol))
        )
    );
But with this approach the second 'from' entry is parsed as an additional item of the first 'to' list, not as a new, separate entry. Is there any way to retrieve the current indentation level and pass it to a rule as additional information?
Of course you should be using a YAML library (e.g. yaml-cpp), because YAML is much more versatile and ... riddled with parser idiosyncrasies. Don't roll your own.
However, assuming you're trying to learn Spirit Qi, there's merit to the question.
It's not at all trivial though, and a lot of it depends on what you want to be parsing into. Focusing only on the input shown, I'd imagine an AST like:
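A minimal sketch of what such an AST could look like, written with std::variant so it compiles without Boost. The names Ast::Value, Raw, List, Dict and Key are illustrative assumptions, and Dict is modeled here as an ordered vector of key/value pairs; a std::map or a boost::variant-based layout would serve equally well.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

namespace Ast {
    struct Value;                                     // forward declaration: the AST is recursive
    using Key  = std::string;
    using Raw  = std::string;                         // anything that is neither a list nor a dict
    using List = std::vector<Value>;                  // e.g. "- a\n- b"
    using Dict = std::vector<std::pair<Key, Value>>;  // e.g. "a: b\nc: d", insertion order kept

    // A value is exactly one of: raw text, a list of values, or a dict of values.
    struct Value {
        std::variant<Raw, List, Dict> data;
    };
}
```

With these types, the first entry of the sample input would be a Dict holding a Raw under "from" and a List of two Raw values under "to".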
So, "- a\n- b" would be a list, "a: b\nc: d" would be a dict, and anything else is a raw value.
To be able to nest, let's create rules that are parameterized by a level number:
Only key and rawvalue never contain a newline, so they don't need the parameter. linebreak_ doesn't expose attributes, but is made a rule so we could enable debug output for it.
Now, leaning on a lot of experience, I might write the rules as follows:
First things first, so we can keep it "readable". Right away, some of the helpers:
We help ourselves with shorthands, so we don't have to repeat ourselves. However, note the subtle presence of qi::copy (which is proto::deep_copy, see e.g. "Assigning parsers to auto variables").
Now, we can have the rules pretty much "naively":
The one vague spot here is that blank space at the beginning of raw values is silently omitted. Now, let's continue top-down for the level-aware productions:
We start with list, because it's most recognizable by its "- " prefix:
Remember, nested is just the Phoenix expression for level + 1.
Dicts keep the same level for all entries.
Note we tolerate insignificant blank space around the ":".
Everything rolled together:
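To make the level-passing idea concrete without depending on Boost, here is a hand-written recursive-descent sketch. It is not the Spirit grammar, only an illustration of the same technique: every production receives the level it must match. It assumes two spaces per indentation level, "\n" line endings and alphanumeric keys; sketch::Parser, kWidth and the AST names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

namespace sketch {
struct Value;
using List = std::vector<Value>;
using Dict = std::vector<std::pair<std::string, Value>>;
struct Value { std::variant<std::string, List, Dict> data; };

class Parser {  // hypothetical hand-written parser, not part of Spirit
  public:
    explicit Parser(std::string_view text) : in_(text) {}
    std::optional<Value> parse() {
        auto v = block(0);
        while (eol()) {}  // tolerate trailing line breaks
        return pos_ == in_.size() ? v : std::nullopt;  // fail on leftover input
    }
  private:
    static constexpr std::size_t kWidth = 2;  // ASSUMPTION: two spaces per level
    std::string_view in_;
    std::size_t pos_ = 0;
    bool lit(std::string_view s) {
        if (in_.substr(pos_, s.size()) != s) return false;
        pos_ += s.size();
        return true;
    }
    bool eol() { return lit("\n"); }
    void blanks() { while (lit(" ") || lit("\t")) {} }
    // Matches exactly `level` indentation units; deeper lines do not match.
    bool indent(std::size_t level) {
        const std::string pad(level * kWidth, ' ');
        return in_.substr(pos_, pad.size() + 1) != pad + ' ' && lit(pad);
    }
    // key := +alnum *blank ':'
    std::optional<std::string> key() {
        const std::size_t start = pos_;
        while (pos_ < in_.size() && std::isalnum(static_cast<unsigned char>(in_[pos_]))) ++pos_;
        std::string name(in_.substr(start, pos_ - start));
        blanks();
        if (name.empty() || !lit(":")) { pos_ = start; return std::nullopt; }
        return name;
    }
    // rawvalue := rest of the line, leading blanks skipped
    std::string raw() {
        blanks();
        const std::size_t start = pos_;
        while (pos_ < in_.size() && in_[pos_] != '\n') ++pos_;
        return std::string(in_.substr(start, pos_ - start));
    }
    // value := eol block(level) | dict starting mid-line | rawvalue
    Value value(std::size_t level) {
        const std::size_t save = pos_;
        blanks();
        if (eol()) { if (auto b = block(level)) return *b; }
        pos_ = save;  // backtrack: no nested block followed
        blanks();
        if (auto d = dict(level, true)) return Value{*d};
        return Value{raw()};
    }
    std::optional<Value> block(std::size_t level) {
        if (auto l = list(level)) return Value{*l};
        if (auto d = dict(level, false)) return Value{*d};
        return std::nullopt;
    }
    // list := item(level) % eol, every item at the same level
    std::optional<List> list(std::size_t level) {
        List items;
        for (;;) {
            const std::size_t save = pos_;
            if (!items.empty() && !eol()) break;
            if (!indent(level) || !lit("- ")) { pos_ = save; break; }
            items.push_back(value(level + 1));  // contents live one level deeper
        }
        return items.empty() ? std::nullopt : std::optional<List>(std::move(items));
    }
    // dict := entry(level) % eol; the first entry may start mid-line (after "- ")
    std::optional<Dict> dict(std::size_t level, bool first_inline) {
        Dict entries;
        for (;;) {
            const std::size_t save = pos_;
            const bool mid_line = first_inline && entries.empty();
            if (!mid_line && !entries.empty() && !eol()) break;
            if (!mid_line && !indent(level)) { pos_ = save; break; }
            auto k = key();
            if (!k) { pos_ = save; break; }
            entries.emplace_back(*k, value(level + 1));
        }
        return entries.empty() ? std::nullopt : std::optional<Dict>(std::move(entries));
    }
};
}  // namespace sketch
```

Because indent(level) rejects both shallower and deeper lines, a nested list ends as soon as a line at an outer level appears, so a second top-level "- from:" line starts a new entry instead of being absorbed into the preceding "to" list. A caller would run it as sketch::Parser(text).parse() and inspect the returned optional.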
Adding minimal code to print the resulting AST: Live On Compiler Explorer
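One possible way to print such an AST, sketched with std::visit over a std::variant-based value type. The names demo::Value and demo::show, as well as the JSON-like notation, are assumptions made for illustration and are not the code behind the linked demo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

namespace demo {
struct Value;
using List = std::vector<Value>;
using Dict = std::vector<std::pair<std::string, Value>>;
struct Value { std::variant<std::string, List, Dict> data; };

// Renders a Value recursively: raw values are quoted, lists use [...],
// dicts use {key: value, ...}.
std::string show(const Value& v) {
    struct Visitor {
        std::string operator()(const std::string& raw) const {
            return '"' + raw + '"';
        }
        std::string operator()(const List& list) const {
            std::string out = "[";
            for (std::size_t i = 0; i < list.size(); ++i)
                out += (i ? ", " : "") + show(list[i]);
            return out + "]";
        }
        std::string operator()(const Dict& dict) const {
            std::string out = "{";
            for (std::size_t i = 0; i < dict.size(); ++i)
                out += (i ? ", " : "") + dict[i].first + ": " + show(dict[i].second);
            return out + "}";
        }
    };
    return std::visit(Visitor{}, v.data);  // dispatch on the active alternative
}
}  // namespace demo
```

The visitor has one overload per alternative, so adding a new kind of value to the variant turns a forgotten case into a compile error rather than a silent omission.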
Prints
And with BOOST_SPIRIT_DEBUG enabled: