Trying to make a lexer using Beautiful Racket


I am new to Racket and I am trying to write a tokenizer for a small language using the Beautiful Racket library. I have defined the grammar in a separate file, and it seems to be fine. I have also created a parser that uses the 'parse-to-datum' procedure from Beautiful Racket, and that also seems to work. However, I get an error once the tokens reach the parser. When the parser encounters an ID such as 'A', it produces this error message:

Encountered unexpected token of type 'ID (value "A") while parsing 'unknown [line=1, column=#f, offset=9]

I assume this error has to do with the way I am tokenizing IDs. Can you help me adjust my tokenizer to handle IDs correctly? Here is the program I am trying to parse:

10 read A 
20 read B 
30 gosub 400 
40 if C = 400 then write C 
50 if C = 0 then goto 1000 
400 C = A + B : return 
$$

Here is my grammar:

program -> linelist $$ 
linelist -> line linelist | epsilon 
line -> idx stmt linetail* [EOL]
idx -> nonzero_digit digit* 
linetail -> :stmt | epsilon 
stmt -> id = expr | if expr then stmt | read id | write expr | goto idx | gosub idx | return
expr -> id etail | num etail | (expr)
etail -> + expr | - expr | = expr | epsilon
id -> [a-zA-Z]+
num -> numsign digit digit*
numsign -> + | - | epsilon 
nonzero_digit -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digit -> 0 | nonzero_digit

Here is my tokenizer:

#lang br/quicklang
(require brag/support)

(define (make-tokenizer port)
  (port-count-lines! port) ; get line data
  (define (next-token)
  
    (define-lex-abbrevs
      (lower-letter (:/ "a" "z"))
      (upper-letter (:/ #\A #\Z))
      (digit (:/ "0" "9")))
      
    (define odai-lexer
      (lexer
       
       [whitespace (token 'WS lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start) #:skip? #t)]
       
       ["{" (token 'PROG-START lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["}" (token 'PROG-STOP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["$" (token 'DOLLAR lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["read" (token 'READ lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["write" (token 'WRITE lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [";" (token 'DELIMIT lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["if" (token 'IF lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["then" (token 'THEN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["=" (token 'ASSIGN-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["+" (token 'ADD-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["-" (token 'SUB-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["(" (token 'OPEN-PAREN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [")" (token 'CLOSE-PAREN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["goto" (token 'GOTO lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["gosub" (token 'GOSUB lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["return" (token 'RETURN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       [(:+ (:or lower-letter upper-letter)) (token 'ID lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [(:+ digit) (token 'DIGIT (string->number lexeme) #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
      
       
       [any-char (token 'MISC lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]))
    (odai-lexer port))
  next-token)

(provide make-tokenizer)

I have tried adjusting the way I define the "ID" token in my tokenizer, and I have also tried writing the grammar rule for id in many different ways. Currently, the rule in my grammar file is simply:

id : LETTER+

There is 1 answer

soegaard

The lexer produces an ID token, not LETTER tokens, so a rule written in terms of LETTER never matches. Try this rule in the parser:

   id -> ID
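
In a brag grammar file, rules are written with `:` rather than `->`, and uppercase names refer to token types produced by the tokenizer. A minimal sketch, assuming the token names READ and ID emitted by the tokenizer in the question; the two rules are a hypothetical excerpt for illustration, not the full grammar:

```racket
#lang brag
; Hypothetical excerpt: only enough rules to parse a statement like "read A".
; READ and ID must match the token types the tokenizer emits.
stmt : READ id
id   : ID
```

The same reasoning likely applies to numbers: the tokenizer emits one DIGIT token for a whole run of digits, so grammar rules that expect individual digit characters would presumably need to refer to DIGIT instead.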