I wanted to write a parser based on John Hughes' paper Generalizing Monads to Arrows. When reading through and trying to reimplement his code I realized there were some things that didn't quite make sense. In one section he lays out a parser implementation based on Swierstra and Duponchel's paper Deterministic, error-correcting combinator parsers using Arrows. The parser type he describes looks like this:
data StaticParser ch = SP Bool [ch]
data DynamicParser ch a b = DP (a, [ch]) -> (b, [ch])
data Parser ch a b = P (StaticParser ch) (DynamicParser ch a b)
with the composition operator looking something like this:
(.) :: Parser ch b c -> Parser ch a b -> Parser ch a c
P (SP e2 st2) (DP f2) . P (SP e1 st1) (DP f1) =
P (SP (e1 && e2) (st1 `union` if e1 then st2 else []))
(DP $ f2 . f1)
The issue is that the composition of parsers q . p 'forgets' q's starting symbols. One possible interpretation I thought of is that Hughes' expects all our DynamicParsers to be total such that a symbol parser's type signature would be symbol :: ch -> Parser ch a (Maybe ch) instead of symbol :: ch -> Parser ch a ch. This still seems awkward though since we have to duplicate information putting starting symbol information in both the StaticParser and DynamicParser. Another issue is that almost all parsers will have the potential to throw which means we will have to spend a lot of time inside Maybe or Either creating what is essentially the "monads do not compose problem." This could be remedied by rewriting DynamicParser itself to handle failure or as an Arrow transformer, but this is straying quite a bit from the paper. None of these issues are addressed in the paper, and the Parser is presented as if it obviously works, so I feel like I must me missing something basic. If someone can catch what I missed that would be super helpful.
I think the deterministic parsers described by Swierstra and Duponcheel are a bit different from traditional parsers: they do not handle failure at all, only choice.
See also the
invokeDetfunction in the S&D paper:This function clearly assumes it will always be able to find a valid parse.
With the arrow version of the parsers described by Hughes you can write a examples like this:
Which will print the expected:
However, if you write a "failing" parse:
It will still print:
To make this behavior a bit more sensible, Swierstra and Duponcheel also introduce error-correction. The output
'c'is expected if we assume the erroneous characterdhas been corrected to be acin the input. This requires an extra mechanism which presumably was too complicated to include in Hughes' paper.I have uploaded the implementation I used to get these results here: https://gist.github.com/noughtmare/eced4441332784cc8212e9c0adb68b35
For more information about a more practical parser in the same style (but no longer deterministic and no longer limited to LL(1)) I really like the "Combinator Parsing: A Short Tutorial" by Swierstra. An interesting excerpt from section 9.3:
The full parser implementation by Swierstra can be found in the
uu-parsinglibpackage, although I do not know how many of the extensions are implemented there.