Why is my simple SQL grammar failing to parse in Brag?


I am trying to create a parser for a simple subset of SQL using a grammar written with BNF in Brag. My Brag code looks like this:

#lang brag
statement : "select" fields "from" source joins* filters*
fields    : field ("," field)*
field     : WORD
source    : WORD
joins     : join* 
join      : "join" source "on" "(" condition ")"
filters   : "where" condition ("and" | "or" condition)*
condition : field "=" field

But when I attempt to use that grammar to parse a basic SQL statement, I run into the following error:

> (parse-to-datum "select * from table")
Encountered unexpected token of type "s" (value "s") while parsing 'unknown [line=#f, column=#f, offset=#f]

I'm a total beginner to grammars and brag. Any ideas what I'm doing incorrectly?


There is 1 answer

Answer by Sorawee Porncharoenwase (best answer)

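You need to lex/tokenize the string first. The input to parse/parse-to-datum should be a list of tokens, not a raw string. Judging by the error message, the string was consumed one character at a time, which is why the parser complains about a token of type "s". Also, brag is case sensitive, meaning that the input should be select rather than SELECT. After you do that, it should work: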

> (parse-to-datum (list "select" 
                        (token 'WORD "foo") 
                        "from " 
                        (token 'WORD "bar") 
                        " " 
                        " "))
'(statement "select" (fields (field "foo")) "from " (source "bar") " " " ")

As for the case sensitivity issue, this is in fact not a problem, as you can normalize case during the tokenization phase.

Your grammar looks weird, however. You probably should not deal with whitespace in the grammar at all. Instead, whitespace should likewise be handled (usually just skipped) in the tokenization phase.

See https://beautifulracket.com/bf/the-tokenizer-and-reader.html for more information about tokenization.
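
To make the tokenization step concrete, here is a minimal sketch of a hand-written tokenizer that lowercases keywords and drops whitespace. It assumes the grammar literals are whitespace-free, as in the grammar shown in the question ("select", "from", and so on). The names keywords and tokenize are made up for illustration, and the only brag function used is token from brag/support.

```racket
#lang racket
(require brag/support) ; provides `token`

;; Keywords that appear as string literals in the grammar.
(define keywords '("select" "from" "join" "on" "where" "and" "or"))

;; tokenize : string -> list of tokens
;; Splits the input into identifiers and punctuation. Whitespace matches
;; neither alternative of the regexp, so it is skipped. Note that any other
;; unrecognized character is silently skipped too; a real tokenizer should
;; report an error instead.
(define (tokenize str)
  (for/list ([piece (in-list (regexp-match* #px"[A-Za-z_][A-Za-z0-9_]*|[(),=]" str))])
    (define lowered (string-downcase piece))
    (cond
      ;; Keywords are normalized to lowercase so SELECT and select both work.
      [(member lowered keywords) lowered]
      ;; Any other identifier becomes a WORD token carrying its original text.
      [(regexp-match? #px"^[A-Za-z_]" piece) (token 'WORD piece)]
      ;; Punctuation is passed through as a plain string token.
      [else piece])))

;; Usage, assuming the grammar lives in a hypothetical module "sql-grammar.rkt":
;; (require "sql-grammar.rkt")
;; (parse-to-datum (tokenize "SELECT foo FROM bar"))
```

With this in place the grammar never sees whitespace or uppercase keywords. For anything beyond a toy, the lexer form described in the linked tutorial is probably the better tool, since it can also track source locations.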


An alternative is to use a different parsing library. https://docs.racket-lang.org/megaparsack/index.html, for instance, can parse a string to a datum (or syntax datum) right away, without a separate tokenization step. It relies on some advanced functional programming concepts, though, so in a way it might be more difficult to use.
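
For comparison, here is a rough sketch of what the megaparsack route might look like for just the select ... from ... part of the statement. It assumes the megaparsack and functional-lib packages are installed (the latter provides data/monad and data/applicative). The parser names word/p, spaces/p and select/p are invented for this example, and the shape of the resulting datum is an arbitrary choice, not something brag or megaparsack prescribes.

```racket
#lang racket
(require megaparsack megaparsack/text
         data/monad data/applicative)

;; One or more letters, collected into a string.
(define word/p
  (do [chars <- (many+/p letter/p)]
      (pure (list->string chars))))

;; One or more whitespace characters; the result is ignored by callers.
(define spaces/p (many+/p space/p))

;; Parses exactly: select <word> from <word>
;; Joins, filters, and multiple fields are left out to keep the sketch short.
(define select/p
  (do (string/p "select")
      spaces/p
      [field <- word/p]
      spaces/p
      (string/p "from")
      spaces/p
      [source <- word/p]
      eof/p
      (pure `(statement (fields ,field) (source ,source)))))

;; parse-string works on the raw string; parse-result! unwraps a successful
;; result or raises an exception on a parse failure.
(define (parse-select str)
  (parse-result! (parse-string select/p str)))
```

Here whitespace is handled explicitly inside the parser, which is the trade-off for skipping the tokenizer: there is less setup, but every rule has to say where spaces are allowed.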