Jonathan Andersson :
I have an ANTLR 4 grammar that ends up in an infinite loop when trying to parse an expression.
I'm running ANTLR 4.7 on Java 1.8.
The expression looks like this:
monkey=Å
But it works if the right-hand value is a string:
monkey="Å"
Or if it looks like this:
monkey=A
The last message Antlr prints before it gets stuck is:
line 1:5 mismatched input '' expecting {NUMBER, STRING, BOOLEAN, 'EMPTY', 'NULL'}
Sadly, I'm not an expert at ANTLR. I've tried to read up on it but cannot figure this one out.
Here is my grammar file:
grammar MyObjectFilter;
/*
* Lexer rules
*/
fragment DIGIT : [0-9] ;
NUMBER : DIGIT+ ([.,] DIGIT+)?;
// Non-greedy String expression that also removes the quotes from the string
STRING : '"' ( '\\"' | . )*? '"' {setText(getText().substring(1, getText().length()-1));} ;
BOOLEAN : 'true' | 'false';
EMPTY : 'EMPTY';
NULL : 'NULL';
// Identifier: an ASCII letter followed by ASCII letters, digits, '.', '_' or '-'
IDENTIFIER : [a-zA-Z][a-zA-Z0-9._-]* ;
VALUE : [0-9]*;
AND : '&&' ;
OR : '||' ;
NOT : '!' ;
NEQ : '!=' ;
GT : '>' ;
GE : '>=' ;
LT : '<' ;
LE : '<=' ;
EQ : '=' ;
LPAREN : '(' ;
RPAREN : ')' ;
WS : [ \r\t\u000C\n]+ -> skip;
/*
* Parser rules
*/
parse
: expression EOF
;
expression
: LPAREN expression RPAREN #parenExpression
| NOT expression #notExpression
| left=identifier op=comparator right=value #comparatorExpression
| left=expression op=binary right=expression #binaryExpression
;
identifier
: IDENTIFIER
;
value
: STRING | NUMBER | BOOLEAN | EMPTY | NULL
;
comparator
: GT | GE | LT | LE | EQ | NEQ
;
binary
: AND | OR
;
I initialize it like this:
InputStream stream = new ByteArrayInputStream(definition.getBytes(StandardCharsets.UTF_8));
MyObjectFilterLexer lexer = new MyObjectFilterLexer(CharStreams.fromStream(stream, StandardCharsets.UTF_8));
MyObjectFilterParser parser = new MyObjectFilterParser(new CommonTokenStream(lexer));
// This is where it gets stuck.
ExpressionContext expr = parser.expression();
My best guess is that it cannot determine the EOF of the expression.
Bart Kiers :
There's a lexer rule that can match zero characters, so it produces zero-width tokens (and there are infinitely many of those at any one position):
VALUE : [0-9]*;
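To make the zero-width problem concrete, here is a sketch of a toy longest-match lexer built on java.util.regex. It is not ANTLR; it only models the "longest match wins, ties go to the rule declared first" behaviour, covers just three of the grammar's rules, and its names (ZeroWidthDemo, sampleRules, lex, maxTokens) are hypothetical. The maxTokens cap stops the demo instead of letting it spin forever.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of a longest-match lexer using java.util.regex. This is NOT ANTLR;
// it only mimics "longest match wins, ties go to the rule declared first".
public class ZeroWidthDemo {

    // Hypothetical subset of the grammar's lexer rules, in declaration order.
    static Map<String, Pattern> sampleRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("IDENTIFIER", Pattern.compile("[a-zA-Z][a-zA-Z0-9._-]*"));
        rules.put("VALUE", Pattern.compile("[0-9]*")); // can match zero characters
        rules.put("EQ", Pattern.compile("="));
        return rules;
    }

    // Tokenizes input, giving up after maxTokens so a zero-width loop cannot run forever.
    static List<String> lex(String input, Map<String, Pattern> rules, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            String bestRule = null;
            int bestLen = -1;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region;
                // strict '>' means an earlier rule keeps a tie
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) {
                // No rule applies: record an error and skip one character
                tokens.add("ERROR[" + input.charAt(pos) + "]");
                pos++;
            } else {
                tokens.add(bestRule + "[" + input.substring(pos, pos + bestLen) + "]");
                pos += bestLen; // a zero-length match leaves pos where it was
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The final A is an IDENTIFIER, so lexing reaches the end of the input
        System.out.println(lex("monkey=A", sampleRules(), 10));
        // At U+00C5 only VALUE matches, with length 0, so pos never moves
        System.out.println(lex("monkey=\u00C5", sampleRules(), 10));
    }
}
```

For monkey=A the tokens end with IDENTIFIER[A] and the loop finishes; for monkey=Å every token after EQ[=] is VALUE[] and only the cap ends the loop.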
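When the lexer reaches Å, no other rule matches it (IDENTIFIER only starts with [a-zA-Z]), but VALUE matches the empty string there. So the lexer hands out an empty VALUE token without advancing (that is the '' in your error message), the next token is requested at the same position, yields the same empty token, and EOF is never reached. Changing the rule into the following stops the loop; Å will then produce a token recognition error instead of an endless stream of empty tokens: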
VALUE : [0-9]+;
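A quick way to spot this kind of rule is to ask whether its pattern accepts the empty string. The sketch below does that with java.util.regex; the class and method names (EmptyMatchCheck, matchesEmpty) are hypothetical, and Java regex syntax only coincides with ANTLR lexer syntax for simple character-class rules like the two VALUE variants here.

```java
import java.util.regex.Pattern;

// Hypothetical helper: does a simple rule, written as a java.util.regex pattern,
// accept the empty string? Only an approximation of ANTLR lexer syntax.
public class EmptyMatchCheck {

    static boolean matchesEmpty(String regex) {
        // matches() requires the whole (empty) input to match the pattern
        return Pattern.compile(regex).matcher("").matches();
    }

    public static void main(String[] args) {
        System.out.println("[0-9]* matches empty: " + matchesEmpty("[0-9]*")); // true
        System.out.println("[0-9]+ matches empty: " + matchesEmpty("[0-9]+")); // false
    }
}
```

Note also that NUMBER is declared before VALUE and always matches at least as many digits, so even the corrected VALUE rule can never win (ANTLR gives ties to the rule declared first). Since no parser rule uses VALUE, removing it altogether should work as well.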