SQL parsing: how to easily implement new statements

KaiwuDB supports many types of SQL statements, such as CREATE and INSERT. This article walks through the process of adding a new statement to the KaiwuDB SQL parser (hereinafter "the parser"). We'll see how to use the goyacc tool to update the parser, and how the executor and query planner work together to execute the new statement.

1. Grammar and keywords

Adding a new SQL statement starts with adding the necessary syntax to the SQL parser. The parser is generated with goyacc, the Go version of the popular yacc parser generator. The syntax definition is located in the pkg/sql/parser/sql.y file. The output of the parser is an abstract syntax tree (AST), whose node types are defined in various files in the pkg/sql/sem/tree directory.

Adding new statements to the SQL parser consists of three main components: adding new keywords, adding syntax to the statement parser, and adding new syntax node types.

2. The FROBNICATE statement

As an example, this article adds a new statement to KaiwuDB: FROBNICATE. This statement randomly modifies database settings. It has three forms: FROBNICATE CLUSTER, for manipulating cluster settings; FROBNICATE SESSION, for manipulating session settings; and FROBNICATE ALL, for handling both at the same time.

Let's start by checking whether all the keywords are defined. Open the pkg/sql/parser/sql.y file and search for "Ordinary key words". You'll see an alphabetical list of token definitions. Since other statements already use the SESSION, CLUSTER and ALL keywords, we don't need to add them, but we do need to create a token for FROBNICATE. It should look like this:

%token <str> FROBNICATE

This tells the lexer to recognize the keyword, but we still need to add it to one of the keyword category lists. If a keyword can appear in an identifier position, it must be reserved (reserved_keyword, which means that other uses of it, such as column names, must be quoted). Since our new keyword begins a SQL statement, it cannot be mistaken for an identifier, so we can safely add it to the list of unreserved keywords. Search for unreserved_keyword: in the pkg/sql/parser/sql.y file and add | FROBNICATE as follows:

unreserved_keyword:
...
| FROBNICATE
...
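
To make the lexer's role concrete, here is a minimal sketch of a keyword lookup in Go. It is not the KaiwuDB lexer: the token constants, their numeric values, and the lookupToken function are all made up for illustration. It only shows the idea that a word is matched case-insensitively against a keyword table and falls back to a plain identifier otherwise.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical token IDs; a generated parser assigns its own numbers.
const (
	IDENT = iota + 1
	ALL
	CLUSTER
	FROBNICATE
	SESSION
)

// keywords maps lower-cased keyword text to its token ID.
var keywords = map[string]int{
	"all":        ALL,
	"cluster":    CLUSTER,
	"frobnicate": FROBNICATE,
	"session":    SESSION,
}

// lookupToken returns the keyword token for word, or IDENT if the word
// is not a keyword. Matching is case-insensitive, as in SQL.
func lookupToken(word string) int {
	if id, ok := keywords[strings.ToLower(word)]; ok {
		return id
	}
	return IDENT
}

func main() {
	for _, w := range []string{"FROBNICATE", "cluster", "my_table"} {
		fmt.Printf("%-10s -> token %d\n", w, lookupToken(w))
	}
}
```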

Now that the lexer knows all of our keywords, we need to teach the parser how to handle our new statement. There are three places where we need to add references: the statement type list, the statement case list, and the parsing clause.

Search for <tree.Statement> in the syntax file (pkg/sql/parser/sql.y) and you will find a list of types. Add a line about our new statement type, something like:

%type <tree.Statement> frobnicate_stmt

This adds a type declaration for the new statement rule, frobnicate_stmt. Note that frobnicate_stmt is just an example name; you can choose a different one to suit your situation.

Next, we need to add the new statement type to the list of statement cases. Continue searching the grammar file for the rule that starts with stmt:, which lists one alternative per statement type. Add the following case to this rule:

stmt:
...
| frobnicate_stmt // EXTEND WITH HELP: FROBNICATE
...

Then we need to add a production rule for our statement. Add the following rule in the pkg/sql/parser/sql.y file:

frobnicate_stmt:
  FROBNICATE CLUSTER { return unimplemented(sqllex, "frobnicate cluster") }
| FROBNICATE SESSION { return unimplemented(sqllex, "frobnicate session") }
| FROBNICATE ALL { return unimplemented(sqllex, "frobnicate all") }

The three expressions we allow are listed here, separated by pipe characters. Each production also has an implementation enclosed in curly braces (which temporarily throws an error with a "not implemented" error message).
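
For readers less familiar with yacc notation, the rule can be read as the following hand-written Go sketch. This is only an analogy under simplifying assumptions: it splits the input on whitespace instead of using a real lexer, and parseFrobnicate, errUnimplemented and isUnimplemented are hypothetical names, not part of KaiwuDB.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnimplemented stands in for the parser's "unimplemented" helper
// (a hypothetical simplification).
var errUnimplemented = errors.New("unimplemented")

// isUnimplemented reports whether err wraps errUnimplemented.
func isUnimplemented(err error) bool { return errors.Is(err, errUnimplemented) }

// parseFrobnicate accepts FROBNICATE followed by CLUSTER, SESSION or ALL,
// mirroring the three alternatives of the grammar rule. Each accepted
// form currently ends in an "unimplemented" error, like the rule actions.
func parseFrobnicate(input string) error {
	words := strings.Fields(strings.ToUpper(input))
	if len(words) != 2 || words[0] != "FROBNICATE" {
		return fmt.Errorf("syntax error at or near %q", input)
	}
	switch words[1] {
	case "CLUSTER", "SESSION", "ALL":
		return fmt.Errorf("%w: frobnicate %s", errUnimplemented, strings.ToLower(words[1]))
	default:
		return fmt.Errorf("syntax error at or near %q", words[1])
	}
}

func main() {
	for _, q := range []string{"frobnicate cluster", "frobnicate everything"} {
		fmt.Printf("%-24s => %v\n", q, parseFrobnicate(q))
	}
}
```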

Finally, add help documentation for our statement. Above the production rule we just added, add the following comment:

// %Help: FROBNICATE - twiddle the various settings
// %Category: Misc
// %Text: FROBNICATE { CLUSTER | SESSION | ALL }

Now our parser can recognize the new statement type and generate syntax help text for it to assist the user. After recompiling the code and trying to execute the statement, we get the following result:

$ kwbase sql --insecure -e "frobnicate cluster"
ERROR: at or near "cluster": syntax error: unimplemented: this syntax
SQLSTATE: 0A000
DETAIL: source SQL:
frobnicate cluster
          ^

HINT: You have attempted to use a feature that is not yet implemented.

Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.

If you would rather not post publicly, please contact us directly
using the support form.

We appreciate your feedback.
Failed running "sql"

This means that our newly added grammar was successfully parsed, but because it has not been implemented yet, no operations can be performed.

3. Add abstract syntax tree

With the syntax layer added, now we need to give the new statements appropriate semantics. We need an AST to pass the structure of the statement from the parser to the runtime. As mentioned above, our statement is %type <tree.Statement>, which means that it needs to implement the tree.Statement interface, which can be found in pkg/sql/sem/tree/stmt.go.

We need to write five functions: three for the Statement interface itself (StatementReturnType, StatementType and StatementTag), one for NodeFormatter (Format), and one for the standard fmt.Stringer (String).

Please create a new file for our statement type: pkg/sql/sem/tree/frobnicate.go. In it, put the format and definition of our AST node.

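package tree

// Frobnicate represents a FROBNICATE statement.
type Frobnicate struct {
  Mode FrobnicateMode
}

var _ Statement = &Frobnicate{}

// FrobnicateMode selects which settings a FROBNICATE statement modifies.
type FrobnicateMode int

const (
  // FrobnicateModeAll covers both cluster and session settings.
  FrobnicateModeAll FrobnicateMode = iota
  // FrobnicateModeCluster covers cluster settings only.
  FrobnicateModeCluster
  // FrobnicateModeSession covers session settings only.
  FrobnicateModeSession
)

// Format implements the NodeFormatter interface.
func (node *Frobnicate) Format(ctx *FmtCtx) {
  ctx.WriteString("FROBNICATE ")
  switch node.Mode {
  case FrobnicateModeAll:
    ctx.WriteString("ALL")
  case FrobnicateModeCluster:
    ctx.WriteString("CLUSTER")
  case FrobnicateModeSession:
    ctx.WriteString("SESSION")
  }
}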

To add the statement methods and the string representation of our AST node, open the pkg/sql/sem/tree/stmt.go file and search for // StatementReturnType implements the Statement interface. You will see a list of implementations for the different AST node types. Insert the following into it in alphabetical order:

// StatementReturnType implements the Statement interface.
func (node *Frobnicate) StatementReturnType() StatementReturnType { return Ack }

// StatementType implements the Statement interface.
func (node *Frobnicate) StatementType() StatementType { return TypeDCL }

// StatementTag returns a short string identifying the type of statement.
func (node *Frobnicate) StatementTag() string { return "FROBNICATE" }

Next, in the same file, add the following to the list of String() implementations, again in alphabetical order:

func (n *Frobnicate) String() string { return AsString(n) }
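
To see how Format and String fit together, here is a self-contained sketch that can be run outside the KaiwuDB tree. FmtCtx, NodeFormatter and AsString below are heavily simplified stand-ins for the types of the same name in pkg/sql/sem/tree (assumed here to be little more than a string buffer), so the sketch only illustrates the pattern, not the real implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// FmtCtx is a simplified stand-in for tree.FmtCtx: just a string buffer.
type FmtCtx struct{ strings.Builder }

// NodeFormatter is a reduced version of the interface AST nodes implement.
type NodeFormatter interface{ Format(ctx *FmtCtx) }

// AsString renders a node by formatting it into a fresh FmtCtx.
func AsString(n NodeFormatter) string {
	var ctx FmtCtx
	n.Format(&ctx)
	return ctx.String()
}

type FrobnicateMode int

const (
	FrobnicateModeAll FrobnicateMode = iota
	FrobnicateModeCluster
	FrobnicateModeSession
)

type Frobnicate struct{ Mode FrobnicateMode }

// Format writes the SQL text of the statement into ctx.
func (node *Frobnicate) Format(ctx *FmtCtx) {
	ctx.WriteString("FROBNICATE ")
	switch node.Mode {
	case FrobnicateModeAll:
		ctx.WriteString("ALL")
	case FrobnicateModeCluster:
		ctx.WriteString("CLUSTER")
	case FrobnicateModeSession:
		ctx.WriteString("SESSION")
	}
}

// String implements fmt.Stringer by delegating to Format.
func (node *Frobnicate) String() string { return AsString(node) }

func main() {
	fmt.Println(&Frobnicate{Mode: FrobnicateModeSession}) // prints "FROBNICATE SESSION"
}
```

Because String delegates to Format, the node has a single source of truth for its textual form.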

Now we need to update the parser to return a Frobnicate AST node with the appropriate mode when it encounters our syntax. Return to the pkg/sql/parser/sql.y file, search for // %Help: FROBNICATE, and replace the production rule below it with the following:

frobnicate_stmt:
  FROBNICATE CLUSTER { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeCluster} }
| FROBNICATE SESSION { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeSession} }
| FROBNICATE ALL { $$.val = &tree.Frobnicate{Mode: tree.FrobnicateModeAll} }

The special symbol $$.val represents the node value generated by this rule. There are some other $ symbols that can be used in yacc. One of the more useful forms is to refer to the node value of the subproduction (for example, in these three statements, $1 would be the token FROBNICATE).
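
The relationship between $$ and $1, $2, ... can be modeled with the short Go sketch below. It is a simplified mental model, not the code goyacc actually generates: symType, the dollar slice and frobnicateClusterAction are hypothetical names chosen for illustration, and the node stores its mode as a plain string to keep the example short.

```go
package main

import "fmt"

// symType is a simplified model of a yacc semantic value: tokens carry
// their text in str, and rules store the node they build in val.
type symType struct {
	str string
	val interface{}
}

// Frobnicate is a reduced AST node for this sketch only.
type Frobnicate struct{ Mode string }

// frobnicateClusterAction models the action of "FROBNICATE CLUSTER".
// dollar[1] and dollar[2] play the roles of $1 and $2, and the returned
// value plays the role of $$.
func frobnicateClusterAction(dollar []symType) symType {
	var result symType // $$
	result.val = &Frobnicate{Mode: dollar[2].str}
	return result
}

func main() {
	// Index 0 is unused so that dollar[1] lines up with $1.
	dollar := []symType{{}, {str: "FROBNICATE"}, {str: "CLUSTER"}}
	out := frobnicateClusterAction(dollar)
	fmt.Printf("$1=%s $2=%s $$=%+v\n", dollar[1].str, dollar[2].str, out.val)
}
```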

Next, recompile KaiwuDB and re-enter the new syntax to get the following results:

$ kwbase sql --insecure -e "frobnicate cluster"
Error: pq: unknown statement type: *tree.Frobnicate
Failed running "sql"

Now we see a different error than before. This one comes from the SQL planner, which doesn't know what to do when it encounters the new statement type, so we need to teach it what the statement means. Although our statement will not take part in any query plan, we implement it by adding a method to the planner, because that is where statement dispatch is centralized.

The error we're currently seeing is raised at the end of a long type switch in the pkg/sql/opaque.go file. Let's add a case to it:

case *tree.Frobnicate:
    return p.Frobnicate(ctx, n)

Similarly, add the following entry to the list of statements in the init() function of the same file, pkg/sql/opaque.go:

&tree.Frobnicate{},

The new case calls a method on the planner that does not exist yet. Let's implement it in the pkg/sql/frobnicate.go file:

package sql

import (
    "context"

    "github.com/kwbasedb/kwbase/pkg/sql/sem/tree"
    "github.com/kwbasedb/errors"
)

// Frobnicate handles a FROBNICATE statement. For now it only returns an error.
func (p *planner) Frobnicate(ctx context.Context, stmt *tree.Frobnicate) (planNode, error) {
    return nil, errors.AssertionFailedf("We're not quite frobnicating yet...")
}
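
The dispatch pattern used here can be tried in isolation with the following sketch. It is a toy model, assuming nothing about the real planner beyond what is described above: Statement, planner and dispatch are reduced, hypothetical versions, and the Unknown type exists only to trigger the default branch.

```go
package main

import (
	"errors"
	"fmt"
)

// Statement is a reduced version of the AST statement interface.
type Statement interface{ StatementTag() string }

type Frobnicate struct{}

func (*Frobnicate) StatementTag() string { return "FROBNICATE" }

// Unknown is a statement type the dispatcher has no case for.
type Unknown struct{}

func (*Unknown) StatementTag() string { return "UNKNOWN" }

type planner struct{}

// Frobnicate is the stub handler, mirroring the planner method above.
func (p *planner) Frobnicate(stmt *Frobnicate) error {
	return errors.New("We're not quite frobnicating yet...")
}

// dispatch routes a statement to its handler with a type switch.
// Types without a case fall through to the "unknown statement type" error.
func (p *planner) dispatch(stmt Statement) error {
	switch n := stmt.(type) {
	case *Frobnicate:
		return p.Frobnicate(n)
	default:
		return fmt.Errorf("unknown statement type: %T", stmt)
	}
}

func main() {
	p := &planner{}
	fmt.Println(p.dispatch(&Frobnicate{}))
	fmt.Println(p.dispatch(&Unknown{}))
}
```

Adding a case to the switch is what changes the error from "unknown statement type" to the message returned by the handler.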

Now recompile KaiwuDB and execute the statement again:

$ kwbase sql --insecure -e "frobnicate cluster"
Error: pq: We're not quite frobnicating yet...
Failed running "sql"

At this point, we can pass an error from the planner to the SQL client. All that remains is to add the functional code to the method above so that the statement takes effect.
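
As a rough idea of what that functional code could do, the sketch below picks one setting at random and assigns it a random value. It is purely illustrative and does not touch KaiwuDB's settings machinery: the frobnicate and newRNG functions, the in-memory maps, and the setting names are all hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// newRNG returns a random source with a fixed seed so runs are repeatable.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// frobnicate picks one setting at random from candidates and stores a
// randomly chosen value for it in settings. It returns the name of the
// modified setting, or false if there is nothing to choose from.
func frobnicate(rng *rand.Rand, settings map[string]string, candidates map[string][]string) (string, bool) {
	names := make([]string, 0, len(candidates))
	for name, values := range candidates {
		if len(values) > 0 {
			names = append(names, name)
		}
	}
	if len(names) == 0 {
		return "", false
	}
	// Map iteration order varies between runs; sort so that a fixed seed
	// always selects the same setting.
	sort.Strings(names)
	name := names[rng.Intn(len(names))]
	values := candidates[name]
	settings[name] = values[rng.Intn(len(values))]
	return name, true
}

func main() {
	// Hypothetical setting names and values, for illustration only.
	candidates := map[string][]string{
		"example.cluster_setting": {"on", "off"},
		"example.session_setting": {"1", "2", "3"},
	}
	settings := map[string]string{}
	if name, ok := frobnicate(newRNG(42), settings, candidates); ok {
		fmt.Printf("frobnicated %s = %s\n", name, settings[name])
	}
}
```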


Origin my.oschina.net/u/5148943/blog/10452131