I'm writing a parser for a specific file format using FParsec as a firstish foaray into learning fsharp. Part of the file has the following format
{ 123 456 789 333 }
Where the numbers in the brackets are pairs of values and there can be an arbitrary number of spaces to separate them. So these would also be valid things to parse:
{ 22 456 7 333 }
And of course the content of the brackets might be empty, i.e. {}
In addition I want the parser to be able to handle the case where the content is a bit malformed, eg. { some descriptive text } or maybe more likely { 12 3 4} (invalid since the 4 wouldn't be paired with anything). In this case I just want the contents saved to be processed separately.
I have this so far:
type DimNummer = int
type ObjektNummer = int
type DimObjektPair = DimNummer * ObjektNummer
type ObjektListResult = Result<DimObjektPair list, string>
let sieObjektLista =
let pnum = numberLiteral NumberLiteralOptions.None "dimOrObj"
let ws = spaces
let pobj = pnum .>> ws |>> fun x ->
let on: ObjektNummer = int x.String
on
let pdim = pnum |>> fun x ->
let dim: DimNummer = int x.String
dim
let pdimObj = (pdim .>> spaces1) .>>. pobj |>> DimObjektPair
let toObjektLista(objList:list<DimObjektPair>) =
let res: ObjektListResult = Result.Ok objList
res
let pdimObjs = sepBy pdimObj spaces1
let validList = pdimObjs |>> toObjektLista
let toInvalid(str:string) =
let res: ObjektListResult =
match str.Trim(' ') with
| "" -> Result.Ok []
| _ -> Result.Error str
res
let invalidList = manyChars anyChar |>> toInvalid
let pres = between (pchar '{') (pchar '}') (ws >>. (validList <|> invalidList) .>> ws)
pres
let parseSieObjektLista = run sieObjektLista
However running this on a valid sample I get an error:
{ 53735 7785 86231 36732 }
^
Expecting: whitespace or '}'
You're trying to consume too many spaces.
Look:
pdimObjis apdim, followed by some spaces, followed bypobj, which is itself apnumfollowed by some spaces. So if you look at the first part of the input:One can clearly see from here that
pdimObjconsumes everything up to86231, including the space just before it. And therefore, whensepByinsidepdimObjslooks for the next separator (which isspaces1), it can't find any. So it fails.The smallest way to fix this is to make
pdimObjsusemanyinstead ofsepBy: sincepobjalready consumes trailing spaces, there is no need to also consume them insepBy:But a cleaner way, in my opinion, would be to remove
wsfrompobj, because, intuitively, trailing spaces aren't part of the number representing your object (whatever that is), and instead handle possible trailing spaces inpdimObjsviasepEndBy: