Table of contents
The function of lexical analysis
The output form of the lexical analyzer
The structure of the lexical analyzer
Construction of State Transition Diagrams
The general form of a symbol table
Commonly used symbol table structure
Why Separate Lexical and Syntax Analysis
Foreword
These review notes are based on Mr. Zhang's lecture slides. They are meant for my own final-exam review and as a reference for my classmates.
We now enter the most important part. The outline of the remaining material is as follows.
Highlights
Lexical Analysis Overview
The function of lexical analysis
Scan the character stream of the source program, recognize word symbols (tokens) according to the lexical rules and output them, and report any errors found during recognition (a line number can be attached to each error message).
The relationship between lexical analyzers and syntax analyzers
① The lexical analyzer can run as a separate pass
② The lexical analyzer can be called as a subroutine of the syntax analyzer (parser)
The output form of the lexical analyzer
Kinds of words
Word output form
Two-tuple: (word class, attribute value of the word)
Classification of words: each basic word (reserved word) gets its own class code; identifiers (alphanumeric strings beginning with a letter) all share a single class; constants are classified by type (integer, real, Boolean, character, ...)
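The classification above can be illustrated with a minimal sketch. The class codes, the keyword list, and the names `Token` and `classify` are all assumptions made for illustration; they do not come from the lecture slides.

```python
from typing import NamedTuple

# Hypothetical class codes; a real compiler defines its own numbering.
KEYWORD_CODES = {"if": 1, "then": 2, "else": 3, "while": 4}  # one code per reserved word
IDENTIFIER = 10   # all identifiers share a single class
INT_CONST = 11    # constants are classified by type
REAL_CONST = 12


class Token(NamedTuple):
    kind: int       # word class code
    value: object   # attribute value: spelling, numeric value, or None


def classify(lexeme: str) -> Token:
    """Map an already isolated, non-empty lexeme to a (class, value) pair."""
    if lexeme in KEYWORD_CODES:
        return Token(KEYWORD_CODES[lexeme], None)
    if lexeme[0].isalpha() and lexeme.isalnum():
        return Token(IDENTIFIER, lexeme)
    if lexeme.isdigit():
        return Token(INT_CONST, int(lexeme))
    try:
        return Token(REAL_CONST, float(lexeme))
    except ValueError:
        raise ValueError(f"unrecognized word: {lexeme!r}") from None
```

Reserved words carry no attribute value here because their class code already identifies them completely, whereas identifiers and constants need the second component of the pair.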
The structure of the lexical analyzer
- Input buffer: stores the source program
- Preprocessing routine: removes comments and strips useless blanks, tabs, line feeds, carriage returns, etc.
- Scanning buffer (what lexical analysis actually works on): a fixed-length string is read from the input buffer into this second buffer, and the lexical analyzer recognizes word symbols directly in it
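As a rough sketch of the preprocessing and buffering steps, the code below assumes `//` line comments and a source language without string literals. Both are simplifying assumptions, and the function names `preprocess` and `scan_buffers` are hypothetical.

```python
def preprocess(source: str) -> str:
    """Remove comments and collapse every run of whitespace into one space.

    Assumes '//' starts a line comment and that the language has no
    string literals (so '//' never appears inside a word).
    """
    kept = []
    for line in source.splitlines():
        code = line.split("//", 1)[0]   # drop everything after the comment marker
        kept.extend(code.split())       # split() discards blanks, tabs, CR
    return " ".join(kept)


def scan_buffers(text: str, size: int):
    """Yield fixed-length pieces, as if refilling the scanning buffer."""
    for start in range(0, len(text), size):
        yield text[start:start + size]
```

Note that a word can straddle two consecutive buffer fills. Practical lexers deal with this with a paired-buffer scheme, which this sketch leaves out.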
Lexical analysis technique - lookahead: to determine the category of a word symbol, it is sometimes necessary to scan one or more characters past its end
State transition diagram
Definition: a finite directed graph. Circles are the nodes and represent states; directed edges (arcs) connect the nodes, and the characters marked on an arc are those that may be read or recognized in the state the arc leaves. The graph has one unique initial state and several final states.
A final state marked with * indicates that the last character read does not belong to the recognized word, so that character must be returned to the input
Recognize word symbols with a state transition diagram:
1) Start from the initial state;
2) Read a character from the input string;
3) Check which arc leaving the current state is labeled with the character just read, and move to the state that arc points to;
4) Repeat 2) and 3); recognition fails when no arc matches, and a word symbol is recognized when a final state is reached.
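The recognition procedure can be sketched as a small table-driven loop. The diagram encoded below (identifiers and unsigned integers only), the state numbers, and the names `ARCS`, `FINAL`, and `recognize` are assumptions chosen for illustration, not the diagram from the slides.

```python
def char_class(ch: str) -> str:
    """Group characters so that each arc can be labeled with a class."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Arcs of a hypothetical diagram: (state, label) -> next state.
# An "other" arc matches any character that has no more specific arc.
ARCS = {
    (0, "letter"): 1,
    (0, "digit"): 3,
    (1, "letter"): 1,
    (1, "digit"): 1,
    (1, "other"): 2,
    (3, "digit"): 3,
    (3, "other"): 4,
}
# Both final states are starred: the last character read is not part of the word.
FINAL = {2: "identifier", 4: "integer"}
EOF = "\0"  # sentinel fed to the diagram when the input is exhausted


def recognize(text: str, pos: int = 0):
    """Return (kind, lexeme, next_pos), or raise ValueError when no arc matches."""
    state = 0                                        # 1) start from the initial state
    i = pos
    while state not in FINAL:
        ch = text[i] if i < len(text) else EOF       # 2) read a character
        nxt = ARCS.get((state, char_class(ch)),      # 3) follow the matching arc
                       ARCS.get((state, "other")))
        if nxt is None:
            raise ValueError(f"no arc from state {state} on {ch!r}")
        state = nxt
        i += 1                                       # 4) repeat until a final state
    i -= 1  # starred final state: return the lookahead character to the input
    return FINAL[state], text[pos:i], i
```

For example, `recognize("abc1+2")` stops in state 2 after reading `+`, gives that character back, and reports the identifier `abc1` with the next scan position at index 4.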
- How are basic words/reserved words, which also fit the pattern of identifiers, told apart from identifiers?
- Either pre-load the reserved words into the symbol table and flag them as not being identifiers, or create separate state transition diagrams for the reserved words
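Pre-loading the reserved words into the symbol table could look roughly like the sketch below. The reserved-word list, the `reserved` flag, and the function name `lookup_or_insert` are hypothetical, and the sketch assumes a case-sensitive language.

```python
# Hypothetical reserved-word list, installed before scanning starts.
RESERVED_WORDS = ("begin", "end", "if", "then", "while", "do")

# Symbol table keyed by spelling; the flag marks entries that are not identifiers.
symbol_table = {word: {"reserved": True} for word in RESERVED_WORDS}


def lookup_or_insert(lexeme: str):
    """Classify a lexeme that the identifier diagram has just matched."""
    # setdefault returns the existing entry, or installs a new identifier entry.
    entry = symbol_table.setdefault(lexeme, {"reserved": False})
    kind = "reserved" if entry["reserved"] else "identifier"
    return kind, lexeme
```

With this arrangement a single identifier diagram serves both cases, and the distinction costs one table lookup per matched lexeme.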
Construction of State Transition Diagrams
Lexical Analyzer Design
Basic structure
Content
- Words
- Word list
- State transition diagram
- Matching algorithm
Symbol table
Purpose
In a program, the user defines many names (identifiers) that stand for different data objects; the compiler saves these names in the symbol table.
Composition
In addition to the name itself, the symbol table records the attribute information associated with each name.
Role in lexical analysis
- Create the symbol table, then look up and fill in its entries
- Fill the symbol table with the attributes of each distinct identifier, numeric constant, and character constant
- Write the symbol-table entry address of a variable/constant into its word (token)
The general form of a symbol table
Each name corresponds to one entry, and an entry consists of a name field and an information field
The information field has several subfields and flag bits, whose contents depend on the name
Commonly used symbol table structure
Linear table
Use N arrays to store the N subfields of the symbol table, one array per subfield
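A minimal sketch of the linear-table layout follows, assuming three subfields (name, type, address). The subfield choice and the class name `LinearSymbolTable` are assumptions for illustration only.

```python
class LinearSymbolTable:
    """Linear symbol table stored as parallel arrays, one per subfield."""

    def __init__(self):
        # Index i across all three arrays forms entry i.
        self.names = []
        self.types = []
        self.addresses = []

    def lookup(self, name):
        """Linear search; return the entry index, or -1 if the name is absent."""
        for i, existing in enumerate(self.names):
            if existing == name:
                return i
        return -1

    def insert(self, name, type_=None, address=None):
        """Return the index of the entry for name, creating it if needed."""
        i = self.lookup(name)
        if i != -1:
            return i
        self.names.append(name)
        self.types.append(type_)
        self.addresses.append(address)
        return len(self.names) - 1
```

The returned index plays the role of the entry address that the lexical analyzer writes into the token. Lookup cost grows linearly with the number of names, which is the usual motivation for moving to a hash table.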
Hash table
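A hash-table symbol table can be sketched with separate chaining, as below. The hash function (sum of character codes), the bucket count, and the class name `HashSymbolTable` are illustrative assumptions; production compilers use stronger hash functions.

```python
class HashSymbolTable:
    """Symbol table using a hash table with separate chaining."""

    def __init__(self, buckets: int = 8):
        self.buckets = [[] for _ in range(buckets)]

    def _index(self, name: str) -> int:
        # Deliberately simple hash: sum of character codes modulo bucket count.
        return sum(map(ord, name)) % len(self.buckets)

    def lookup(self, name: str):
        """Return the entry for name, or None if it is not in the table."""
        for entry in self.buckets[self._index(name)]:
            if entry["name"] == name:
                return entry
        return None

    def insert(self, name: str, **info):
        """Return the existing entry for name, or create one with the given info."""
        entry = self.lookup(name)
        if entry is None:
            entry = {"name": name, "info": info}   # name field + information field
            self.buckets[self._index(name)].append(entry)
        return entry
```

Names that hash to the same bucket, such as `ab` and `ba` with this hash function, are kept in the same chain and distinguished by comparing the name field.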
Summary and Supplement
Why Separate Lexical and Syntax Analysis
- Simplify the design of the compiler
- Improve compiler efficiency
- Enhanced compiler portability
chapter summary