Writing a compiler from scratch (VI): table-driven parsing

The complete code for this project is in C2j-Compiler

Foreword

The previous articles finished building the finite state automaton and working out in which states it can reduce. The next task is to build the parsing table from that automaton and then implement parsing driven by the table.

Reduce information

Before the parsing table can be completed, one last task remains: describing the reduce information, which tells the parser when it should perform a reduce operation.

The reduce information is produced inside each ProductionsStateNode. The node iterates over its productions; if the dot "." is at the end of a production, the node can derive the reduce information from that production and its lookahead set.

The reduce information is represented as a map: each key is a symbol on which a reduce can be performed (i.e. a symbol in the lookahead set), and each value is the number of the production to reduce by.

public HashMap<Integer, Integer> makeReduce() {
      // key: lookahead symbol, value: number of the production to reduce by
      HashMap<Integer, Integer> map = new HashMap<>();
      reduce(map, this.productions);
      reduce(map, this.mergedProduction);

      return map;
  }

  private void reduce(HashMap<Integer, Integer> map, ArrayList<Production> productions) {
      for (Production production : productions) {
          // only a production whose dot is at the end can be reduced
          if (production.canBeReduce()) {
              for (Integer symbol : production.getLookAheadSet()) {
                  map.put(symbol, production.getProductionNum());
              }
          }
      }
  }
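
To make the shape of the reduce map concrete, here is a minimal, self-contained sketch. The Item class is a hypothetical stand-in for the project's Production class, and the symbol numbers are made up for illustration; only the idea (dot at the end, one map entry per lookahead symbol) comes from the text above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

class ReduceInfoSketch {
    // Hypothetical stand-in for the article's Production: one LR(1) item.
    static final class Item {
        final int productionNum; // number of the production
        final int rightSize;     // number of symbols on the right-hand side
        final int dotPos;        // how many symbols are to the left of the dot
        final int[] lookAhead;   // lookahead symbols, as plain ints

        Item(int productionNum, int rightSize, int dotPos, int... lookAhead) {
            this.productionNum = productionNum;
            this.rightSize = rightSize;
            this.dotPos = dotPos;
            this.lookAhead = lookAhead;
        }

        // The item can be reduced once the dot has reached the end.
        boolean canBeReduce() {
            return dotPos >= rightSize;
        }
    }

    // Same idea as makeReduce(): lookahead symbol -> production number.
    static HashMap<Integer, Integer> makeReduce(List<Item> items) {
        HashMap<Integer, Integer> map = new HashMap<>();
        for (Item item : items) {
            if (item.canBeReduce()) {
                for (int symbol : item.lookAhead) {
                    map.put(symbol, item.productionNum);
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        final int SEMI = 10, RP = 11; // made-up symbol numbers
        List<Item> state = Arrays.asList(
                new Item(3, 2, 2, SEMI, RP), // dot at the end: contributes two entries
                new Item(4, 3, 1, SEMI));    // dot in the middle: contributes nothing
        System.out.println(makeReduce(state));
    }
}
```

In this sketch only production 3 ends up in the map, once for each of its two lookahead symbols.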

Construction of parsing table

The parsing table is built mainly in the StateNodeManager class. You can ignore the logic of loadTable and storageTableToFile; that part only stores the table in a file so that it can be reused across runs.

The main logic iterates over all nodes. For each node, it takes the transition information from the transition map, i.e. the transition symbol and the destination node, and copies the symbol (in essence the ordinal of a Token enum value) together with the destination node's state number into a new map. It then gets the reduce information; for each symbol in the lookahead set, the value for that symbol is overwritten with the negated number of the production to reduce by. The value is written as a negative number to distinguish a reduce from a shift.

So the data structure HashMap<Integer, HashMap<Integer, Integer>> serves as the parsing table:

  1. The first Integer is the number of the current node (state)
  2. The second Integer is the input symbol
  3. The third Integer is the action: greater than 0 means shift to the state with that number; less than 0 means reduce by the production whose number is the absolute value; 0 is treated as accept by the parse method

public HashMap<Integer, HashMap<Integer, Integer>> getLrStateTable() {
      File table = new File("lrStateTable.sb");
      if (table.exists()) {
          return loadTable();
      }

      Iterator it;
      if (isTransitionTableCompressed) {
          it = compressedStateList.iterator();
      } else {
          it = stateList.iterator();
      }

      while (it.hasNext()) {
          ProductionsStateNode state = (ProductionsStateNode) it.next();
          HashMap<Integer, ProductionsStateNode> map = transitionMap.get(state);
          HashMap<Integer, Integer> jump = new HashMap<>();

          if (map != null) {
              for (Map.Entry<Integer, ProductionsStateNode> item : map.entrySet()) {
                  jump.put(item.getKey(), item.getValue().stateNum);
              }
          }

          HashMap<Integer, Integer> reduceMap = state.makeReduce();
          if (reduceMap.size() > 0) {
              for (Map.Entry<Integer, Integer> item : reduceMap.entrySet()) {

                  jump.put(item.getKey(), -(item.getValue()));
              }
          }

          lrStateTable.put(state.stateNum, jump);
      }

      storageTableToFile(lrStateTable);

      return lrStateTable;
  }
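
As an illustration of how an entry in such a table can be read back, here is a small sketch. The class name, the describe helper and the state and symbol numbers are all hypothetical; only the sign convention (positive for shift, negative for reduce, 0 for accept, missing for error) is taken from this article.

```java
import java.util.HashMap;

class ActionDecodeSketch {
    // Looks up table[state][symbol] and explains what the entry means.
    static String describe(HashMap<Integer, HashMap<Integer, Integer>> table,
                           int state, int symbol) {
        HashMap<Integer, Integer> row = table.get(state);
        Integer action = (row == null) ? null : row.get(symbol);
        if (action == null) {
            return "error";                       // no entry: input is rejected
        }
        if (action > 0) {
            return "shift to state " + action;    // positive: shift
        }
        if (action == 0) {
            return "accept";                      // zero: accept
        }
        return "reduce by production " + (-action); // negative: reduce
    }

    public static void main(String[] args) {
        // A made-up two-row table, only to exercise the four cases.
        HashMap<Integer, HashMap<Integer, Integer>> table = new HashMap<>();
        table.put(0, new HashMap<>());
        table.put(2, new HashMap<>());
        table.get(0).put(5, 2);   // state 0, symbol 5: shift to state 2
        table.get(2).put(6, -3);  // state 2, symbol 6: reduce by production 3
        table.get(2).put(7, 0);   // state 2, symbol 7: accept

        System.out.println(describe(table, 0, 5));
        System.out.println(describe(table, 2, 6));
        System.out.println(describe(table, 2, 7));
        System.out.println(describe(table, 0, 9));
    }
}
```

One consequence of this encoding is that a reduce by production 0 is stored as -0, i.e. 0, so it coincides with accept, and state 0 cannot appear as a shift target.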

Table-driven parsing

The main parsing process lives in the LRStateTableParser class and is started by the parse method.

As described in the second article of this series, parsing needs an input (symbol) stack and a node (state) stack; the other members are not needed for now. On initialization, the start node is pushed onto the state stack, the current input symbol is set to EXT_DEF_LIST, and the parsing table is obtained.

public LRStateTableParser(Lexer lexer) {
    this.lexer = lexer;
    statusStack.push(0);
    valueStack.push(null);
    lexer.advance();
    lexerInput = Token.EXT_DEF_LIST.ordinal();
    lrStateTable = StateNodeManager.getInstance().getLrStateTable();
}

The parsing steps are:

  • Look up the action for the current input symbol in the current state: action > 0 is a shift, action < 0 is a reduce, and action == 0 means the input is accepted
  • If action > 0, i.e. a shift
    1. The state to shift to and the input symbol are pushed onto their stacks
    2. If the current symbol is a terminal, the next symbol can be read from the lexer directly
    3. If it is a non-terminal, it has only been used to jump to the next state, so the lexer does not advance and the lookahead token becomes the current input again. One point to note: the non-terminal still has to be placed on the symbol stack alongside the corresponding state, so that the symbols popped during a later reduce are correct
  • If action < 0, i.e. a reduce
    1. Get the corresponding production
    2. Pop one entry from the stacks for each symbol on the right-hand side of the production
    3. Complete the reduce by making the production's left-hand symbol the current input and pushing it onto the symbol stack

public void parse() {
      while (true) {
          Integer action = getAction(statusStack.peek(), lexerInput);

          if (action == null) {
              ConsoleDebugColor.outlnPurple("Shift for input: " + Token.values()[lexerInput].toString());
              System.err.println("The input is denied");
              return;
          }

          if (action > 0) {
              statusStack.push(action);
              text = lexer.text;

              // if (lexerInput == Token.RELOP.ordinal()) {
              //     relOperatorText = text;
              // }

              parseStack.push(lexerInput);

              if (Token.isTerminal(lexerInput)) {
                  ConsoleDebugColor.outlnPurple("Shift for input: " + Token.values()[lexerInput].toString() + "   text: " + text);

                  // Object obj = takeActionForShift(lexerInput);

                  lexer.advance();
                  lexerInput = lexer.lookAhead;
                  // valueStack.push(obj);
              } else {
                  lexerInput = lexer.lookAhead;
              }
          } else {
              if (action == 0) {
                  ConsoleDebugColor.outlnPurple("The input can be accepted");
                  return;
              }

              int reduceProduction = -action;
              Production product = ProductionManager.getInstance().getProductionByIndex(reduceProduction);
              ConsoleDebugColor.outlnPurple("reduce by product: ");
              product.debugPrint();

              // takeActionForReduce(reduceProduction);

              int rightSize = product.getRight().size();
              while (rightSize > 0) {
                  parseStack.pop();
                  // valueStack.pop();
                  statusStack.pop();
                  rightSize--;
              }

              lexerInput = product.getLeft();
              parseStack.push(lexerInput);
              // valueStack.push(attributeForParentNode);
          }
      }
  }

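  private Integer getAction(Integer currentState, Integer currentInput) {
      HashMap<Integer, Integer> jump = lrStateTable.get(currentState);
      // a missing row means there is no action; parse() treats null as rejected input
      return jump == null ? null : jump.get(currentInput);
  }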
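
The whole loop can be tried out on a toy grammar. The sketch below is not the project's parser: the grammar (S' -> E, E -> E + n, E -> n), the symbol numbers and the hand-written table are assumptions made for illustration, and the symbol stack here receives each symbol only once, at shift time. It follows the same conventions as above: a non-terminal produced by a reduce becomes the next input, positive actions shift, negative actions reduce, and 0 accepts.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;

class TinyTableParser {
    // Made-up symbol numbers: terminals first, then the non-terminal E.
    static final int N = 0, PLUS = 1, EOF = 2, E = 3;

    // Productions: 0: S' -> E    1: E -> E + n    2: E -> n
    static final int[] LEFT = {-1, E, E};      // LEFT[0] is never used (accept)
    static final int[] RIGHT_SIZE = {1, 3, 1}; // right-hand side lengths

    static final HashMap<Integer, HashMap<Integer, Integer>> TABLE = new HashMap<>();

    static void put(int state, int symbol, int action) {
        TABLE.computeIfAbsent(state, k -> new HashMap<>()).put(symbol, action);
    }

    static {
        // Hand-built table for the toy grammar.
        put(0, N, 2);     put(0, E, 1);
        put(1, PLUS, 3);  put(1, EOF, 0);   // reduce by production 0 == accept
        put(2, PLUS, -2); put(2, EOF, -2);  // E -> n .
        put(3, N, 4);
        put(4, PLUS, -1); put(4, EOF, -1);  // E -> E + n .
    }

    static boolean isTerminal(int symbol) {
        return symbol <= EOF;
    }

    // The lexer stand-in: past the end of the tokens, the lookahead is EOF.
    static int lookAhead(int[] tokens, int pos) {
        return pos < tokens.length ? tokens[pos] : EOF;
    }

    static boolean parse(int... tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        Deque<Integer> symbols = new ArrayDeque<>();
        states.push(0);
        int pos = 0;
        int input = lookAhead(tokens, pos);

        while (true) {
            HashMap<Integer, Integer> row = TABLE.get(states.peek());
            Integer action = (row == null) ? null : row.get(input);
            if (action == null) return false; // no entry: reject
            if (action == 0) return true;     // accept

            if (action > 0) {                 // shift
                states.push(action);
                symbols.push(input);
                if (isTerminal(input)) pos++; // consume only real tokens
                input = lookAhead(tokens, pos);
            } else {                          // reduce
                int production = -action;
                for (int i = 0; i < RIGHT_SIZE[production]; i++) {
                    states.pop();
                    symbols.pop();
                }
                input = LEFT[production];     // the non-terminal is the next input
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(N, PLUS, N)); // true
        System.out.println(parse(N, N));       // false
    }
}
```

Feeding the reduced non-terminal back in as the next input means the same table lookup serves for both shifts on terminals and jumps on non-terminals, which is why no separate goto table appears.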

Ambiguous grammar

That completes everything about parsing; semantic analysis comes next. Before that, one thing needs to be said: the finite state automaton we have built is an LALR(1) automaton. LALR(1) is powerful enough for most purposes, but there are still grammars it cannot handle. If the given productions are not LALR(1), the automaton cannot parse the input correctly. The grammar given earlier, however, is LALR(1).

Summary

This article mainly covered:

  • Building the parsing table from the finite state automaton and the reduce information
  • Implementing table-driven parsing with the parsing table

Origin www.cnblogs.com/secoding/p/11371502.html