The preface to this series explained the origin of the source code interpretation series. In this Nebula Graph overview chapter, we walk through the architecture of Nebula Graph, the layout of its code repositories, its code structure, and its module planning.
1. Architecture
Nebula Graph is an open source distributed graph database. Nebula adopts a design that separates storage from computing, so the two are decoupled. In addition to the database core, we also provide many peripheral tools for data import, monitoring, deployment, visualization, graph computing, and so on.
For Nebula design, please refer to "A Survey of Graph Databases and the Practice of Nebula in Graph Database Design".
The overall architecture design is shown in the following figure:
The query engine is stateless, so it can easily be scaled horizontally. Its main parts include syntax analysis, semantic analysis, the optimizer, and the execution engine.
For detailed design, please refer to "Query Engine Design of Graph Database" and "Introduction to Nebula Graph 2.0 Query Engine".
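To make the stages of the query engine concrete, the following is a minimal sketch of a query passing through syntax analysis, semantic analysis, planning, optimization, and execution. It is not Nebula Graph code: the `FETCH <tag> LIMIT <n>` grammar is a toy invented for this example (it is not nGQL), and every name here (`PlanNode`, `parse`, `validate`, `plan`, `optimize`, `execute`, `run_query`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One operator in a toy execution plan."""
    kind: str
    args: dict = field(default_factory=dict)


def parse(query: str) -> dict:
    """Syntax analysis: turn 'FETCH <tag> LIMIT <n>' into a tiny AST (toy grammar)."""
    tokens = query.split()
    if len(tokens) != 4 or tokens[0].upper() != "FETCH" or tokens[2].upper() != "LIMIT":
        raise SyntaxError(f"cannot parse: {query!r}")
    return {"stmt": "fetch", "tag": tokens[1], "limit": tokens[3]}


def validate(ast: dict, schema: set) -> dict:
    """Semantic analysis: check that the tag exists and the limit is a number."""
    if ast["tag"] not in schema:
        raise ValueError(f"unknown tag: {ast['tag']}")
    if not ast["limit"].isdigit():
        raise ValueError("LIMIT must be a non-negative integer")
    return {**ast, "limit": int(ast["limit"])}


def plan(ast: dict) -> list:
    """Planner: translate the validated AST into a list of operators."""
    return [PlanNode("Scan", {"tag": ast["tag"]}), PlanNode("Limit", {"n": ast["limit"]})]


def optimize(nodes: list) -> list:
    """Optimizer: a single hand-written rule that pushes Limit into the Scan before it."""
    if len(nodes) == 2 and nodes[0].kind == "Scan" and nodes[1].kind == "Limit":
        return [PlanNode("Scan", {**nodes[0].args, "limit": nodes[1].args["n"]})]
    return nodes


def execute(nodes: list, data: dict) -> list:
    """Execution engine: run the operators one after another."""
    rows = []
    for node in nodes:
        if node.kind == "Scan":
            rows = list(data[node.args["tag"]])
            if "limit" in node.args:
                rows = rows[: node.args["limit"]]
        elif node.kind == "Limit":
            rows = rows[: node.args["n"]]
    return rows


def run_query(query: str, data: dict) -> list:
    """Chain all stages; no state is kept between calls."""
    ast = validate(parse(query), schema=set(data))
    return execute(optimize(plan(ast)), data)
```

Because `run_query` keeps nothing between calls, any number of copies of it could serve requests side by side, which is the property that makes a stateless query layer easy to scale out.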
The query engine architecture design is shown in the following figure:
Storage consists of two parts: metadata storage, which we call Meta Service, and data storage, which we call Storage Service.
Storage Service has three layers: the bottom layer is the Store Engine; above it is our Consensus layer, which implements Multi Group Raft; the top layer is our Storage interface, which defines a series of graph-related APIs.
For detailed design, please refer to "Storage Design of Graph Database".
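The three layers can be pictured with a small sketch. This is only an illustration under strong simplifications: a dict stands in for the store engine, and the "consensus" class is a majority check with no log, terms, or leader election, so it is not Raft. All class and method names (`StoreEngine`, `ConsensusGroup`, `GraphStorage`, `add_vertex`, `get_vertex`) are hypothetical and are not Nebula's APIs.

```python
class StoreEngine:
    """Bottom layer: a local key-value store (a dict stands in for the real engine)."""

    def __init__(self):
        self._kv = {}

    def put(self, key, value):
        self._kv[key] = value

    def get(self, key):
        return self._kv.get(key)


class ConsensusGroup:
    """Middle layer: one replication group serving one partition.

    Toy stand-in for a Raft group: a write is applied only when a majority
    of replicas is reachable. Replicas that were down never catch up.
    """

    def __init__(self, replicas):
        self.replicas = replicas
        self.down = set()  # indexes of replicas treated as unreachable

    def replicate(self, key, value):
        alive = [r for i, r in enumerate(self.replicas) if i not in self.down]
        if len(alive) * 2 <= len(self.replicas):
            return False  # no majority, reject the write
        for replica in alive:
            replica.put(key, value)
        return True

    def read(self, key):
        for i, replica in enumerate(self.replicas):
            if i not in self.down:
                return replica.get(key)
        return None


class GraphStorage:
    """Top layer: a graph-shaped API that maps vertices onto partitions."""

    def __init__(self, num_parts=2, replicas_per_part=3):
        # One consensus group per partition: the "multi group" idea.
        self.groups = [
            ConsensusGroup([StoreEngine() for _ in range(replicas_per_part)])
            for _ in range(num_parts)
        ]

    def _group(self, vid: int):
        return self.groups[vid % len(self.groups)]

    def add_vertex(self, vid: int, props: dict) -> bool:
        return self._group(vid).replicate(("v", vid), props)

    def get_vertex(self, vid: int):
        return self._group(vid).read(("v", vid))
```

Callers only see `add_vertex` and `get_vertex`; which partition a vertex lands in and how the write is replicated stay hidden in the lower layers.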
The storage engine architecture design is shown in the following figure:
2. Code repository overview
Welcome to the vesoft code repository (vesoft is the developer of the graph database Nebula Graph).
The current Nebula product architecture includes the graph database kernel, clients, tools, a testing framework, compilation tooling, visualization, monitoring, etc.
The main purpose of this article is to briefly introduce the code structure of Nebula Graph's main repositories and explain the basic functions of each module. More detailed design notes will follow. I hope this helps community readers better understand Nebula Graph and contribute to the Nebula community, for example by submitting features, fixing bugs, or contributing documentation.
The following lists most of the code repositories in the vesoft-inc organization:
- nebula: Kernel code for Nebula 1.0
- nebula-graph: Nebula 2.0 query engine
- nebula-storage: Nebula 2.0 storage engine
- nebula-common: Nebula 2.0 kernel toolkit
- Nebula Clients
  - nebula-java: Java client
  - nebula-cpp: C++ client
  - nebula-go: Go client
  - nebula-python: Python client
- Nebula Tools
  - nebula-importer: a high-performance data import tool built on the Go client
  - nebula-spark-utils: includes the Spark Connector, Exchange, and Algorithm tools
  - nebula-br: backup and restore tool
  - nebula-ansible, nebula-operator: deployment tools
- Nebula Test
  - nebula-bench: stress and performance testing project
  - nebula-chaos: chaos testing project
- Compiling
  - nebula-third-party: third-party packages that the Nebula Graph kernel depends on
  - nebula-gears: Nebula Graph kernel toolchain
- nebula-graph-studio: Nebula Graph visualization tool
3. Code structure and module description
3.1 Nebula Graph
├── cmake
├── conf
├── LICENSES
├── package
├── resources
├── scripts
├── src
│   ├── context
│   ├── daemons
│   ├── executor
│   ├── optimizer
│   ├── parser
│   ├── planner
│   ├── scheduler
│   ├── service
│   ├── session
│   ├── stats
│   ├── util
│   ├── validator
│   └── visitor
└── tests
    ├── admin
    ├── bench
    ├── common
    ├── data
    ├── job
    ├── maintain
    ├── mutate
    ├── query
    └── tck
- conf/: query engine configuration file directory
- package/: graph packaging script
- resources/: resource file
- scripts/: startup scripts
- src/: query engine source code directory
- src/context/: query context information, including the AST (Abstract Syntax Tree), the execution plan, execution results, and other computing-related resources
- src/daemons/: query engine main process
- src/executor/: executor, implementation of each operator
- src/optimizer/: RBO (rule-based optimization) implementation and optimization rules
- src/parser/: lexical parsing, syntax parsing, and AST structure definition
- src/planner/: operators and execution plan generation
- src/scheduler/: the scheduler that runs the execution plan
- src/service/: query engine service layer, providing the authentication and query execution interfaces
- src/session/: session management
- src/stats/: execution statistics, such as P99, slow query statistics, etc.
- src/util/: utility functions
- src/validator/: semantic analysis implementation, used to check for semantic errors and perform some simple rewrite optimizations
- src/visitor/: expression visitors, used to extract information from expressions or to optimize them
- tests/: BDD-based integration testing framework to test all functions provided by Nebula Graph
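As an illustration of what the expression visitors in src/visitor/ are described as doing (extracting information from an expression, or rewriting it), here is a minimal visitor-style sketch. The expression classes (`Constant`, `Property`, `Add`) and the two visitors are hypothetical and much simpler than real nGQL expressions; for brevity the sketch dispatches with `isinstance` rather than `accept` methods.

```python
from dataclasses import dataclass


@dataclass
class Constant:
    value: int


@dataclass
class Property:
    name: str


@dataclass
class Add:
    left: object
    right: object


class PropertyCollector:
    """Extracts the names of all properties an expression refers to."""

    def __init__(self):
        self.names = []

    def visit(self, expr):
        if isinstance(expr, Property):
            self.names.append(expr.name)
        elif isinstance(expr, Add):
            self.visit(expr.left)
            self.visit(expr.right)


class ConstantFolder:
    """Rewrites Add(Constant, Constant) into a single Constant."""

    def visit(self, expr):
        if isinstance(expr, Add):
            left, right = self.visit(expr.left), self.visit(expr.right)
            if isinstance(left, Constant) and isinstance(right, Constant):
                return Constant(left.value + right.value)
            return Add(left, right)
        return expr  # constants and properties are returned unchanged
```

For the expression `Add(Property("age"), Add(Constant(1), Constant(2)))`, the collector gathers the single property name `age`, and the folder returns the same tree with the inner addition replaced by `Constant(3)`.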
3.2 Nebula Storage
├── cmake
├── conf
├── docker
├── docs
├── LICENSES
├── package
├── scripts
└── src
    ├── codec
    ├── daemons
    ├── kvstore
    ├── meta
    ├── mock
    ├── storage
    ├── tools
    ├── utils
    └── version
- conf/: Storage engine configuration file directory
- package/: storage packaging script
- scripts/: startup scripts
- src/: Storage engine source code directory
- src/codec/: serialization and deserialization tools
- src/daemons/: storage engine and metadata engine main processes
- src/kvstore/: Raft-based distributed KV storage implementation
- src/meta/: KVStore-based metadata management service implementation, used for metadata management, cluster management, long-running job management, etc.
- src/storage/: KVStore-based graph data storage engine implementation
- src/tools/: small utility tools
- src/utils/: utility functions
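To show what a serialization and deserialization tool such as the one in src/codec/ has to do, the sketch below encodes a property row into bytes according to a schema and decodes it back. The byte layout (little-endian 64-bit integers, length-prefixed UTF-8 strings) and the names `SCHEMA`, `encode_row`, and `decode_row` are assumptions made for this example; they are not Nebula's actual row format or API.

```python
import struct

# Hypothetical schema: (field name, type) pairs; only "int" and "string" are supported.
SCHEMA = [("name", "string"), ("age", "int")]


def encode_row(schema, row: dict) -> bytes:
    """Serialize the row's fields in schema order."""
    out = bytearray()
    for name, ftype in schema:
        value = row[name]
        if ftype == "int":
            out += struct.pack("<q", value)  # 8-byte signed integer
        elif ftype == "string":
            data = value.encode("utf-8")
            out += struct.pack("<I", len(data)) + data  # 4-byte length, then bytes
        else:
            raise TypeError(f"unsupported type: {ftype}")
    return bytes(out)


def decode_row(schema, buf: bytes) -> dict:
    """Deserialize bytes produced by encode_row with the same schema."""
    row, offset = {}, 0
    for name, ftype in schema:
        if ftype == "int":
            (row[name],) = struct.unpack_from("<q", buf, offset)
            offset += 8
        elif ftype == "string":
            (length,) = struct.unpack_from("<I", buf, offset)
            offset += 4
            row[name] = buf[offset:offset + length].decode("utf-8")
            offset += length
        else:
            raise TypeError(f"unsupported type: {ftype}")
    return row
```

Field names are not stored in the encoded bytes, so the reader needs the same schema as the writer, which is why a codec like this is usually tied to schema metadata.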
3.3 Nebula Common
├── cmake
│   └── nebula
├── LICENSES
├── src
│   └── common
│       ├── algorithm
│       ├── base
│       ├── charset
│       ├── clients
│       ├── concurrent
│       ├── conf
│       ├── context
│       ├── cpp
│       ├── datatypes
│       ├── encryption
│       ├── expression
│       ├── fs
│       ├── function
│       ├── graph
│       ├── hdfs
│       ├── http
│       ├── interface
│       ├── meta
│       ├── network
│       ├── plugin
│       ├── process
│       ├── session
│       ├── stats
│       ├── test
│       ├── thread
│       ├── thrift
│       ├── time
│       ├── version
│       └── webservice
└── third-party
The Nebula Common repository is the toolkit for the Nebula kernel code and provides efficient implementations of common utilities. Most engineers will already be familiar with the general-purpose ones, so only the directories closely related to the graph database are described here.
- src/common/clients/: C++ implementations of the meta and storage clients
- src/common/datatypes/: definitions of the data types in Nebula Graph and their computations, such as string, int, bool, float, Vertex, Edge, etc.
- src/common/expression/: definitions of expressions in nGQL
- src/common/function/: Definition of functions in nGQL
- src/common/interface/: interface definitions for graph, meta, storage services
That concludes this overview.