Compiler Theory: Design and Implementation of a Lexical Analyzer

First, the program requirements (using Python as an example).

 

1. Lexical analyzer requirements:

- Scan the character stream of the source program from left to right

- Recognize the words that carry lexical meaning (lexemes)

- Return a record for each word (class code, the word itself)

- Filter out whitespace

- Skip comments

- Report any lexical errors found
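
The filtering and error-reporting requirements can be sketched as a small helper that advances past everything the scanner should ignore. This is only an illustration under stated assumptions: the source does not define a comment syntax, so the Pascal-style { ... } comments, the LexicalError class, and the skip_ignored function are all hypothetical.

```python
class LexicalError(Exception):
    """Raised when the scanner meets input it cannot process (hypothetical)."""


def skip_ignored(text, pos):
    """Return the index of the next character that is neither whitespace
    nor part of a comment, starting the search at pos."""
    while pos < len(text):
        if text[pos].isspace():
            pos += 1                      # filter whitespace
        elif text[pos] == '{':            # assumed comment opener
            end = text.find('}', pos)
            if end == -1:
                raise LexicalError(f"unterminated comment at position {pos}")
            pos = end + 1                 # skip the whole comment
        else:
            break                         # a significant character
    return pos
```

A scanner would call skip_ignored before reading each word, so that blanks and comments never reach the output stream.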

 

2. Program structure:

Input: character stream (how is it read, and in what data structure is it stored?)

Processing:

- Traversal (which kind of traversal?)

- Lexical rules

Output: word stream (in what form?)

- Tuples
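
One possible answer to these questions, as a rough sketch rather than the program's actual design: keep the input in a Python str, traverse it once from left to right with an integer index, and yield each word as a tuple. The function name scan and the labels 'word' and 'symbol' are hypothetical, and no lexical rules are applied yet.

```python
def scan(text):
    """Yield (word, kind) tuples in one left-to-right pass over text."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1                      # blanks only separate words
            continue
        start = pos
        if ch.isalnum():
            # Extend the word while letters or digits follow.
            while pos < len(text) and text[pos].isalnum():
                pos += 1
            kind = 'word'
        else:
            pos += 1                      # any other character stands alone
            kind = 'symbol'
        yield (text[start:pos], kind)


print(list(scan("x := 1;")))
```

Note that := comes out as two separate symbols here, which is exactly the gap the lexical rules have to close.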

 

3. Word types:

1. Identifier (10)

2. Unsigned number (11)

3. Reserved words (one code per word)

4. Operators (one code per word)

5. Delimiters (one code per word)

 

Word symbol    Class code    Word symbol    Class code
begin          1             :              17
if             2             :=             18
then           3             <              20
while          4             <=             21
do             5             <>             22
end            6             >              23
l(l|d)*        10            >=             24
dd*            11            =              25
+              13            ;              26
-              14            (              27
*              15            )              28
/              16            #              0

 

Second, the code implementation (using Python as an example).

1. Lexical analysis program.

 

import re

# Sample source text; '#' marks the end of the input.
strs = "if sum >= 1000 then x : x - 1;#"

# Class codes of the reserved words, operators, and delimiters.
# Identifiers l(l|d)* (10) and unsigned numbers dd* (11) are recognized
# with a regular expression in the loop below.
types = {'begin': 1,
         'if': 2,
         'then': 3,
         'while': 4,
         'do': 5,
         'end': 6,
         '+': 13,
         '-': 14,
         '*': 15,
         '/': 16,
         ':': 17,
         ':=': 18,
         '<': 20,
         '<=': 21,
         '<>': 22,
         '>': 23,
         '>=': 24,
         '=': 25,
         ';': 26,
         '(': 27,
         ')': 28,
         '#': 0
        }

# Operators and delimiters, longest first, so that ':=' wins over ':'.
symbols = sorted((k for k in types if not k.isalpha()), key=len, reverse=True)

if __name__ == '__main__':
    # strs = input('Enter program code: ')

    index = 0
    while index < len(strs):
        if strs[index].isspace():
            index += 1  # filter whitespace
            continue
        word = re.match('[a-zA-Z0-9_]+', strs[index:])
        if word:
            ss = word.group()
            if ss in types:
                print((ss, types[ss]))  # reserved word
            elif ss[0].isalpha():
                print((ss, 'identifier'))
            elif ss.isdigit():
                print((ss, 'number'))
            else:
                print((ss, 'other'))
        else:
            # Take the first (longest) symbol that matches at this position,
            # or fall back to the single unrecognized character.
            ss = next((s for s in symbols if strs.startswith(s, index)), strs[index])
            print((ss, types.get(ss, 'other')))
        index += len(ss)
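
The same table of class codes can also drive a scanner built on one compiled regular expression. The following is a sketch under assumptions, not the program above: the names KEYWORDS, SYMBOLS, TOKEN, and tokenize are hypothetical, identifiers follow l(l|d)* strictly (no underscore), and each record is returned as (class code, word) as listed in the requirements.

```python
import re

KEYWORDS = {'begin': 1, 'if': 2, 'then': 3, 'while': 4, 'do': 5, 'end': 6}
SYMBOLS = {'+': 13, '-': 14, '*': 15, '/': 16, ':': 17, ':=': 18,
           '<': 20, '<=': 21, '<>': 22, '>': 23, '>=': 24, '=': 25,
           ';': 26, '(': 27, ')': 28, '#': 0}
IDENTIFIER, NUMBER = 10, 11

# Longer symbols come first so that '<=' is tried before '<'.
symbol_pattern = '|'.join(
    re.escape(s) for s in sorted(SYMBOLS, key=len, reverse=True))
TOKEN = re.compile(
    rf'(?P<word>[A-Za-z][A-Za-z0-9]*)|(?P<number>[0-9]+)|(?P<symbol>{symbol_pattern})')


def tokenize(text):
    """Return a list of (class code, word) tuples for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1                      # filter whitespace
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}: {text[pos]!r}")
        lexeme = m.group()
        if m.lastgroup == 'word':
            # Reserved words share the identifier pattern; look them up first.
            code = KEYWORDS.get(lexeme, IDENTIFIER)
        elif m.lastgroup == 'number':
            code = NUMBER
        else:
            code = SYMBOLS[lexeme]
        tokens.append((code, lexeme))
        pos = m.end()
    return tokens


print(tokenize("if sum >= 1000 then x := x - 1;#"))
```

Trying the longer symbols first is what lets := and >= come out as single words instead of two.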

 

 

 

2. Run results.

  

 


Origin www.cnblogs.com/Rakers1024/p/11640718.html