Writing a Compiler from Scratch (2): Background Knowledge for Syntax Analysis

Foreword

With lexical analysis finished, we now have a token stream. The next step is to implement a parser that turns the input token stream into an abstract syntax tree (AST). Unlike the lexer, which could simply be written by hand, the parser requires some background knowledge first.

This background knowledge has already been touched on in earlier posts:

Index of earlier posts

The complete code for the project is in C2j-Compiler

What is parsing?

If we think of lexical analysis as combining characters into words and outputting a stream of words, then parsing can be seen as checking whether those words conform to the grammar. Lexical analysis validates words with regular expressions or hand-written matching, whereas syntax analysis relies on a context-free grammar (CFG).

A formal grammar G = (N, Σ, P, S) is context-free if every production rule has the form V -> w, where V ∈ N and w ∈ (N ∪ Σ)*. The name "context-free" comes from the fact that V can always be replaced by the string w, regardless of the context in which V appears. A formal language is context-free if it is generated by a context-free grammar.
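As a rough illustration of the definition, the four components of G can be written down as plain data and the "V -> w" condition checked mechanically. This is only a sketch: the names Grammar, is_context_free, EXAMPLE and NOT_CONTEXT_FREE are made up for this post and are not part of C2j-Compiler. The example grammar is S -> AB, A -> aA | ε, B -> b | bB.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # Σ
    productions: tuple       # P: pairs of (left-hand side, right-hand side) tuples
    start: str               # S


def is_context_free(g):
    """Check that every production has the form V -> w with V in N, w in (N ∪ Σ)*."""
    symbols = g.nonterminals | g.terminals
    for lhs, rhs in g.productions:
        # The left-hand side must be exactly one nonterminal.
        if len(lhs) != 1 or lhs[0] not in g.nonterminals:
            return False
        # The right-hand side may be empty (ε) or any string of known symbols.
        if any(sym not in symbols for sym in rhs):
            return False
    return g.start in g.nonterminals


EXAMPLE = Grammar(
    nonterminals=frozenset({"S", "A", "B"}),
    terminals=frozenset({"a", "b"}),
    productions=(
        (("S",), ("A", "B")),
        (("A",), ("a", "A")),
        (("A",), ()),          # A -> ε
        (("B",), ("b",)),
        (("B",), ("b", "B")),
    ),
    start="S",
)

# Adding a rule whose left-hand side is "a B" makes the rewrite depend on
# the symbol next to B, so the grammar is no longer context-free.
NOT_CONTEXT_FREE = replace(
    EXAMPLE,
    productions=EXAMPLE.productions + ((("a", "B"), ("a", "b")),),
)

print(is_context_free(EXAMPLE))
print(is_context_free(NOT_CONTEXT_FREE))
```

The check only looks at the shape of each rule, which is all the definition asks for.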

Backus-Naur Form (BNF)

Backus-Naur Form (BNF, also known as Backus Normal Form) is a notation for describing context-free grammars.

Look at an example:

S –> AB
A –> aA | ε
B –> b | bB

Here S, A and B are nonterminals, meaning they can produce new symbols through derivation; these are the nonterminals defined earlier in the Token class. a, b and ε are terminal symbols, which cannot be derived any further; ε denotes the empty string.

Each line above is a production rule (also called a derivation rule), describing how a nonterminal can be rewritten.

S is the start symbol.

A string consisting only of terminal symbols is called a sentence.

For example, with these three productions we can conclude that bbb conforms to the grammar: S => AB => B => bB => bbB => bbb.
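To make this concrete, here is a minimal sketch of a recognizer for the three productions above, with one function per nonterminal. The function names (match_A, match_B, is_sentence) are invented for illustration and input is taken as a plain string of characters rather than a token stream.

```python
def match_A(text, pos):
    """A -> aA | ε : consume zero or more 'a' characters, return the new position."""
    if pos < len(text) and text[pos] == "a":
        return match_A(text, pos + 1)  # A -> aA
    return pos                         # A -> ε


def match_B(text, pos):
    """B -> b | bB : consume one or more 'b' characters, or return None on failure."""
    if pos < len(text) and text[pos] == "b":
        rest = match_B(text, pos + 1)                  # try B -> bB first
        return rest if rest is not None else pos + 1   # otherwise B -> b
    return None


def is_sentence(text):
    """S -> AB : text is a sentence only if A then B consume all of it."""
    after_a = match_A(text, 0)
    end = match_B(text, after_a)
    return end == len(text)


print(is_sentence("bbb"))
print(is_sentence("aab"))
print(is_sentence("ba"))
```

Here bbb and aab are accepted, while ba is rejected because nothing in the grammar allows an a after a b.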

Parsing methods

As mentioned before, parsing methods fall into two kinds: top-down and bottom-up.

I covered these methods in earlier study notes, so I won't repeat them here:

Recursive descent and LL(1) parsing
Bottom-up parsing

Here I will briefly introduce the parsing method used in this project, LALR(1), which is a bottom-up parsing algorithm.

Bottom-up syntax analysis

A bottom-up parse corresponds to building a parse tree for the input string: it starts from the leaf nodes and works its way up to the root through shift and reduce operations.

Bottom-up parsing needs a stack to hold the symbols being parsed. Take the following grammar as an example:

0.  statement -> expr
1.  expr -> expr + factor
2.           | factor
3.  factor ->  ( expr )
4.           | NUM

Parsing 1 + 2:

stack            input     action
(empty)          1 + 2
NUM              + 2       shift: read a character, convert it to its token, and push the token onto the stack
factor           + 2       reduce: by the production factor -> NUM, pop NUM and push factor
expr             + 2       reduce again (expr -> factor); since more than one production could apply next, the parser must look ahead to decide whether to shift or reduce, which is the LA in LALR
expr +           2         shift
expr + NUM       (empty)   shift
expr + factor    (empty)   reduce by factor -> NUM
expr             (empty)   reduce by expr -> expr + factor
statement        (empty)   reduce by statement -> expr

At this point the stack has been reduced to the start symbol and the input is empty, which means parsing succeeded.
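The trace above can be reproduced with a small sketch. It assumes the simplest possible strategy: compare the top of the stack against every production, reduce if one matches, and shift otherwise, using one token of lookahead only to decide when to reduce to the start symbol. This is not the LALR(1) algorithm and not the code from C2j-Compiler; PRODUCTIONS, tokenize, try_reduce and parse are names made up for this illustration.

```python
# Grammar from the text as (left-hand side, right-hand side) pairs.
# Longer right-hand sides come first so "expr + factor" wins over "factor".
PRODUCTIONS = [
    ("expr", ["expr", "+", "factor"]),
    ("factor", ["(", "expr", ")"]),
    ("expr", ["factor"]),
    ("factor", ["NUM"]),
    ("statement", ["expr"]),
]


def tokenize(text):
    """Turn e.g. "1 + 2" into ["NUM", "+", "NUM"]."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append("NUM")
        elif ch in "+()":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def try_reduce(stack, lookahead):
    """Apply the first production whose right-hand side is on top of the stack."""
    for lhs, rhs in PRODUCTIONS:
        if stack[-len(rhs):] != rhs:
            continue
        # Lookahead: only reduce to the start symbol once the input is
        # exhausted and expr is the only symbol left on the stack.
        if lhs == "statement" and (lookahead is not None or len(stack) != 1):
            continue
        del stack[-len(rhs):]
        stack.append(lhs)
        return f"reduce {lhs} -> {' '.join(rhs)}"
    return None


def parse(text):
    """Return (accepted, trace), where trace is a list of (stack, action) pairs."""
    tokens = tokenize(text)
    stack, trace, pos = [], [], 0
    while True:
        lookahead = tokens[pos] if pos < len(tokens) else None
        action = try_reduce(stack, lookahead)
        if action is None:
            if lookahead is None:
                break  # nothing left to shift or reduce
            stack.append(lookahead)
            pos += 1
            action = "shift"
        trace.append((" ".join(stack), action))
    return stack == ["statement"], trace


accepted, trace = parse("1 + 2")
for stack_text, action in trace:
    print(f"{stack_text:<16}{action}")
print("accepted:", accepted)
```

Greedy matching happens to work for this tiny grammar, but it has to rescan the productions on every step and breaks down as soon as two reductions compete in a less obvious way, which is why a real parser drives these decisions from a table instead.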

So the key to implementing bottom-up parsing is deciding, for the current stack, whether to shift or to reduce. There are two ways to do this:

  • Brute force: compare the symbols on the stack against every production in the grammar, looking for a match
  • Build a state machine that, from the state of the stack after each push or pop, determines whether to reduce

So the next task is, naturally, to build a finite-state automaton that can guide the parser's operations.

Summary

In short, the background knowledge here amounts to understanding what parsing does and roughly how it is done.

Syntax analysis is the process of checking whether the input token stream conforms to the grammar. The algorithm used for this step here is bottom-up: it derives from the leaf nodes upward until it reaches the top of the tree.

Also posted on my GitHub blog: https://dejavudwh.cn/

Origin www.cnblogs.com/secoding/p/11367521.html