【Introduction to Vitess】

Vitess is a database clustering system for horizontal scaling of MySQL through generalized sharding.

 

By encapsulating shard-routing logic, Vitess allows application code and database queries to remain agnostic to the distribution of data onto multiple shards. With Vitess, you can even split and merge shards as your needs grow, with an atomic cutover step that takes only a few seconds.

 

 

Scalability

Vitess combines many important MySQL features with the scalability of a NoSQL database. Its built-in sharding features let you grow your database without adding sharding logic to your application.

 

Performance

Vitess automatically rewrites queries that hurt database performance. It also uses caching mechanisms to mediate queries and prevent duplicate queries from simultaneously reaching your database.
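
The idea of keeping duplicate queries away from the database can be sketched in a few lines of Python. This is a minimal illustration, not Vitess code: the class name `QueryConsolidator`, its `shared` counter, and the `execute` callback are hypothetical, and it assumes two queries count as identical only when their SQL text matches exactly.

```python
import threading
from concurrent.futures import Future


class QueryConsolidator:
    """Share the result of an in-flight query among identical concurrent requests."""

    def __init__(self, execute):
        self._execute = execute      # hypothetical callable that really runs the SQL
        self._lock = threading.Lock()
        self._in_flight = {}         # SQL text -> Future of the running query
        self.shared = 0              # how many requests reused an in-flight query

    def run(self, sql):
        with self._lock:
            future = self._in_flight.get(sql)
            leader = future is None
            if leader:
                # First request for this SQL: register it as in flight.
                future = Future()
                self._in_flight[sql] = future
            else:
                self.shared += 1
        if not leader:
            # Identical query already running: wait for its result instead.
            return future.result()
        try:
            future.set_result(self._execute(sql))
        except Exception as exc:
            future.set_exception(exc)
        finally:
            with self._lock:
                del self._in_flight[sql]
        return future.result()
```

Only the first caller for a given SQL string reaches `execute`; callers that arrive while it is still running block on the same `Future` and receive the same rows (or the same exception). Once the query finishes, the entry is dropped, so later requests run against the database again.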

 

Manageability

Vitess automatically handles functions like master failovers and backups. It uses a lock server to track and administer servers, letting your application be blissfully ignorant of database topology.

 

Connection pooling

Vitess eliminates the high memory overhead of MySQL connections. Its gRPC-based protocol lets Vitess servers easily handle thousands of connections at once.
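
As a rough sketch of what connection pooling means here, the following Python class lets any number of callers share a small, fixed set of backend connections. It is an assumption-laden illustration rather than the Vitess implementation: `ConnectionPool`, the `connect` factory, and the `size` parameter are hypothetical names.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: many callers share a few backend connections."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())    # open all backend connections up front

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)         # hand the connection to the next caller
```

A caller writes `with pool.connection() as conn: ...` and never opens a MySQL connection of its own, so the number of connections the database sees stays at `size` however many front-end requests arrive.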

 

Shard Management

MySQL doesn't natively support sharding, but you will likely need it as your database grows. Vitess saves you from having to add sharding logic to your app and also enables live resharding with minimal read-only downtime.

 

 

Workflow

Vitess keeps track of all of the metadata about your cluster configuration so that the cluster view is always up-to-date and consistent for different clients.



 

Features

  • Performance

    • Connection pooling – Scale front-end connections while optimizing MySQL performance.
    • Query de-duping – Reuse the results of an in-flight query for any identical requests received while that query is still executing.
    • Transaction manager – Limit the number of concurrent transactions and manage deadlines to optimize overall throughput.
  • Protection

    • Query rewriting and sanitation – Add limits and avoid non-deterministic updates.
    • Query blacklisting – Customize rules to prevent potentially problematic queries from hitting your database.
    • Query killer – Terminate queries that take too long to return data.
    • Table ACLs – Specify access control lists (ACLs) for tables based on the connected user.
  • Monitoring

    • Performance analysis – Tools let you monitor, diagnose, and analyze your database performance.
    • Query streaming – Use a list of incoming queries to serve OLAP workloads.
    • Update stream – A server streams the list of rows changing in the database, which can be used as a mechanism to propagate changes to other data stores.
  • Topology Management Tools

    • Master management tools (handles reparenting)
    • Web-based management GUI
    • Designed to work in multiple data centers / regions
  • Sharding

    • Virtually seamless dynamic re-sharding
    • Vertical and horizontal sharding support
    • Built-in range-based or application-defined sharding support

Architecture

The Vitess platform consists of a number of server processes, command-line utilities, and web-based utilities, backed by a consistent metadata store.

Depending on the current state of your application, you could arrive at a full Vitess implementation through a number of different process flows. For example, if you're building a service from scratch, your first step with Vitess would be to define your database topology. However, if you need to scale your existing database, you'd likely start by deploying a connection proxy.

Vitess tools and servers are designed to help you whether you start with a complete fleet of databases or start small and scale over time. For smaller implementations, vttablet features like connection pooling and query rewriting help you get more from your existing hardware. Vitess' automation tools then provide additional benefits for larger implementations.

The diagram below illustrates Vitess' components:

[Figure: diagram showing a Vitess implementation]

Topology

The Topology Service is a metadata store that contains information about running servers, the sharding scheme, and the replication graph. The topology is backed by a consistent data store. You can explore the topology using vtctl (command-line) and vtctld (web).

In Kubernetes, the data store is etcd. Vitess source code also ships with Apache ZooKeeper support.

vtgate

vtgate is a light proxy server that routes traffic to the correct vttablet(s) and returns consolidated results back to the client. It is the server to which applications send queries. Thus, the client can be very simple since it only needs to be able to find a vtgate instance.

To route queries, vtgate considers the sharding scheme, required latency, and the availability of the tablets and their underlying MySQL instances.
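
The sharding-scheme part of that routing decision can be sketched as follows, assuming range-based sharding in which each shard is named after the hexadecimal key range it covers (for example `-80` and `80-`). The function names are hypothetical, and `md5` is only a stand-in for whatever function maps a sharding key to a keyspace ID; latency and tablet availability are ignored.

```python
import hashlib


def keyspace_id(sharding_key):
    """Map a sharding key to 8 bytes (md5 is a stand-in, not Vitess's hash)."""
    return hashlib.md5(str(sharding_key).encode()).digest()[:8]


def shard_contains(shard, ksid):
    """Shard names are hex key ranges 'start-end'; an empty side is unbounded."""
    start, end = shard.split("-")
    lo = bytes.fromhex(start)
    hi = bytes.fromhex(end)
    # Bytes compare lexicographically; the range is [lo, hi).
    return lo <= ksid and (not hi or ksid < hi)


def route(shards, sharding_key):
    """Return the name of the shard whose key range covers this sharding key."""
    ksid = keyspace_id(sharding_key)
    for shard in shards:
        if shard_contains(shard, ksid):
            return shard
    raise LookupError("no shard covers this keyspace ID")
```

For example, `route(["-40", "40-80", "80-c0", "c0-"], 12345)` returns exactly one of the four names. Because the application passes only the sharding key, the shard list can change during resharding without any change to application code.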

vttablet

vttablet is a proxy server that sits in front of a MySQL database. A Vitess implementation has one vttablet for each MySQL instance.

vttablet performs tasks that attempt to maximize throughput as well as protect MySQL from harmful queries. Its features include connection pooling, query rewriting, and query de-duping. In addition, vttablet executes management tasks that vtctl initiates, and it provides streaming services that are used for filtered replication and data exports.

A lightweight Vitess implementation uses vttablet as a smart connection proxy that serves queries for a single MySQL database. By running vttablet in front of your MySQL database and changing your app to use the Vitess client instead of your MySQL driver, your app benefits from vttablet's connection pooling, query rewriting, and query de-duping features.
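
To make query rewriting concrete, here is a deliberately naive sketch of one such rewrite: capping an unbounded SELECT with a LIMIT. It is not how vttablet does it; a real proxy would parse the SQL rather than use a regular expression. The function name `add_row_limit` and the choice to ask for `max_rows + 1` rows are assumptions made for this example.

```python
import re

# Naive check: a LIMIT in a subquery or string literal would also match.
_HAS_LIMIT = re.compile(r"\blimit\s+\d+", re.IGNORECASE)


def add_row_limit(sql, max_rows):
    """Append a LIMIT to a SELECT that lacks one; leave other statements alone."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        return stripped
    if _HAS_LIMIT.search(stripped):
        return stripped
    # Ask for one extra row so the caller can tell "exactly max_rows"
    # apart from "more than max_rows".
    return f"{stripped} LIMIT {max_rows + 1}"
```

With `max_rows=100`, the query `SELECT * FROM t;` becomes `SELECT * FROM t LIMIT 101`, while a query that already carries a LIMIT, or any non-SELECT statement, passes through unchanged apart from trimming whitespace and the trailing semicolon.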

vtctl

vtctl is a command-line tool used to administer a Vitess cluster. It allows a human or application to easily interact with a Vitess implementation. Using vtctl, you can identify master and replica databases, create tables, initiate failovers, perform sharding (and resharding) operations, and so forth.

As vtctl performs operations, it updates the lockserver as needed. Other Vitess servers observe those changes and react accordingly. For example, if you use vtctl to fail over to a new master database, vtgate sees the change and directs future write operations to the new master.
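
The update-and-observe pattern described above can be sketched with an in-memory stand-in for the lockserver. Everything here is hypothetical: `TopologyStore`, `Router`, and the key layout `shards/<shard>/master` are invented for illustration, and a real deployment would use a consistent store such as etcd or ZooKeeper instead of a Python dictionary.

```python
class TopologyStore:
    """In-memory stand-in for the lockserver: stores values, notifies watchers."""

    def __init__(self):
        self._data = {}
        self._watchers = {}

    def watch(self, key, callback):
        self._watchers.setdefault(key, []).append(callback)
        if key in self._data:
            callback(self._data[key])    # deliver the current value immediately

    def put(self, key, value):
        self._data[key] = value
        for callback in self._watchers.get(key, []):
            callback(value)


class Router:
    """Plays the vtgate role: follows the current master of one shard."""

    def __init__(self, topo, shard):
        self.master = None
        topo.watch(f"shards/{shard}/master", self._on_master_change)

    def _on_master_change(self, tablet):
        self.master = tablet

    def route_write(self, sql):
        # Writes always go to whichever tablet is currently recorded as master.
        return (self.master, sql)
```

In this sketch the administrative tool calls `topo.put("shards/-80/master", "tablet-101")` after a failover, and every `Router` watching that key starts sending writes to `tablet-101` without being told directly.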

vtctld

vtctld is an HTTP server that lets you browse the information stored in the lockserver. It is useful for troubleshooting or for getting a high-level overview of the servers and their current states.

vtworker

vtworker hosts long-running processes. It supports a plugin architecture and offers libraries so that you can easily choose tablets to use. Plugins are available for the following types of jobs:

  • resharding differ jobs check data integrity during shard splits and joins
  • vertical split differ jobs check data integrity during vertical splits and joins

vtworker also lets you easily add other validation procedures. You could do in-tablet integrity checks to verify foreign-key-like relationships or cross-shard integrity checks if, for example, an index table in one keyspace references data in another keyspace.
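
A data-integrity check of this kind boils down to comparing the rows on the source with the rows on the destination. The sketch below shows one simple way to do that by primary key and row digest. It is not the vtworker plugin API: `row_digest`, `diff_tables`, and the assumption that the first column is the primary key are all choices made for this example.

```python
import hashlib


def row_digest(row):
    """Digest of a row's printed form; assumes both sides use the same types."""
    return hashlib.sha256(repr(row).encode()).hexdigest()


def diff_tables(source_rows, destination_rows, key=lambda row: row[0]):
    """Compare two row sets by key; return (missing, extra, mismatched) keys."""
    src = {key(r): row_digest(r) for r in source_rows}
    dst = {key(r): row_digest(r) for r in destination_rows}
    missing = sorted(src.keys() - dst.keys())      # on source only
    extra = sorted(dst.keys() - src.keys())        # on destination only
    mismatched = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, extra, mismatched
```

For a shard split, `destination_rows` would be the combined rows of the new shards; three empty lists mean the two sides agree.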

Other support tools

Vitess also includes the following tools:

  • mysqlctl: Manage MySQL instances
  • zk: Command-line ZooKeeper client and explorer
  • zkctl: Manage ZooKeeper instances

Reposted from gaojingsong.iteye.com/blog/2384552