I'm trying to build a Smalltalk REPL based on LanguageKit, which uses a lemon gramma. Currently the parser only supports parsing complete class definitions, but not statements outside of the method syntax.
For example this gets parsed:
methodName [
NSObject description.
NSObject debugDescription.
]
But it will fail if I try to parse only the statements:
NSObject description.
NSObject debugDescription.
The following won't accept multiple statements (e.g. Transcript show: 'hello'. Transcript show: 'world'.):
file ::= statement_list(M).
{
[p setAST:M];
}
Here the minimal gramma:
%include {
#include <assert.h>
#import <Foundation/Foundation.h>
#import <LanguageKit/LanguageKit.h>
#import <LanguageKit/LKToken.h>
#import "SmalltalkParser.h"
}
%name SmalltalkParse
%token_prefix TOKEN_
%token_type {id}
%extra_argument {SmalltalkParser *p}
%left PLUS MINUS STAR SLASH EQ LT GT COMMA.
%left WORD.
file ::= method(M).
{
[p setAST:M];
}
file ::= statement_list(M).
{
[p setAST:M];
}
file ::= statement(M).
{
[p setAST:M];
}
file ::= .
method(M) ::= signature(S) LSQBRACK statement_list(E) RSQBRACK.
{
M = [LKInstanceMethod methodWithSignature:S locals:nil statements:E];
}
signature(S) ::= WORD(M).
{
S = [LKMessageSend messageWithSelectorName:M];
}
signature(S) ::= keyword_signature(M).
{
S = M;
}
statement_list(L) ::= statements(T).
{
L = T;
}
statement_list(L) ::= statements(T) statement(S).
{
[T addObject:S];
L = T;
}
statements(L) ::= statements(T) statement(S) STOP.
{
[T addObject:S];
L = T;
}
statements(L) ::= .
{
L = [NSMutableArray array];
}
statement(S) ::= expression(E).
{
S = E;
}
%syntax_error
{
[NSException raise:@"ParserError" format:@"Parsing failed"];
}
message(M) ::= simple_message(S).
{
M = S;
}
simple_message(M) ::= WORD(S).
{
M = [LKMessageSend messageWithSelectorName:S];
}
expression(E) ::= simple_expression(S).
{
E = S;
}
simple_expression(E) ::= WORD(T) simple_message(M).
{
[M setTarget:T];
E = M;
}
The complete gramma can be found here: smalltalk.y. I have been reading other grammas and also searching stackoverflow, but don't see a difference for example with this gramma and don't understand why this isn't working.
Your grammar has parsing conflicts. These must be resolved if you have a hope of getting the grammar to work correctly.
(The grammar also has an undefined non-terminal,
keyword_signature, and an unused non-terminalmessage. To get it to compile without warnings, I simply removed them. I don't think it makes any difference to the analysis below.)Part of the conflict is very simple: you cannot have both
and
In fact, it is not clear to me why you would want to? Isn't a
statementan example of astatement_list?The reason you cannot have both is that you have:
and
Taken together, that means that starting with
statement_list, you can recognise a singlestatement. So your grammar is ambiguous; if the input is a single statement, it could be directly parsed as afileor it could be parsed asfile⇒statement_list⇒statements statement⇒statement, with a different set of actions.You might not care; indeed, you might believe that the sequence of actions is identical. You might even be right about that. But the parser cannot know that, and it will not believe it. It sees the two parses as necessarily distinct. So it will report a conflict.
In short, get rid of
file ::= statement .. Then you can start working on the other parsing conflicts.The more fundamental problem is also based on the fact that
statementscan derive the empty sequence.Let's take a look at the grammar (simplified by removing all semantics):
If
statement_listis not empty, whatever it matches must start with an emptystatementsfollowed by astatement.statement, in turn, must start withWORD, sostatement_listmust match input which starts withWORD. But before it can shift theWORDin order to continue the parse, it needs to first insert the emptystatements. So it needs to do a reduction using the last rule quoted above before it can handle aWORD. (If this paragraph is not completely clear, please try rereading it, and if you still have questions, ask. It's important to understand this part.)None of that would be a problem if it were not for the fact that
filecould also be amethod, andmethodalso starts with aWORD. But, unlikestatement_list, it really starts withWORD. It does not start with an emptystatements, so if the parser creates an emptystatementsand the input is actually amethod, then the parse will fail.As it happens, this particular conflict doesn't occur if you have
file ::= statementinstead offile ::= statement_list, becausestatementdoesn't start with an emptystatementseither. That means that when the parser sees theWORDat the beginning of the input, it does not yet have to decide whether it is about to see astatementor amethod. In both cases, the parse action is to shift theWORDand see what comes next.To solve this problem, we can observe that
statement_listmust contain at least onestatement, and that all thestatements inside astatement_list(except possibly the last one) must be terminated with aSTOP(that is, .). If we start with that idea, it's pretty easy to produce an alternative grammar which does not require an empty list at the beginning:This differs from your grammar in that it considers a
statement_listto be a non-empty list of dot-separatedstatements optionally terminated with a dot, while your grammar considers a statement_list to be a possibly empty list of dot-terminatedstatements followed by a singlestatement.Since I have now tested the grammar, I add the complete testable code as an illustration of what we ask for when we ask for a Minimal Complete Verifiable Example. (I used C and flex rather than Objective C but I don't think that makes any difference.)
File parser.y:
File main.l:
Build procedure:
Tests: