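Apache Accumulo is a reliable, scalable, high-performance, sorted, distributed key-value storage solution that offers cell-based access control and customizable server-side processing. It follows the design of Google BigTable and is built on Apache Hadoop, ZooKeeper, and Thrift.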
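LevelDB is a highly efficient key-value database developed by Google. It supports data volumes on the order of billions of records and still delivers high performance at that scale, largely thanks to its design, in particular the LSM (log-structured merge) algorithm. LevelDB is supported as a storage engine by Riak and Kyoto Tycoon, and in China, Taobao's open-source key-value store Tair has also adopted LevelDB as its persistent storage engine and runs it in production.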
Apache Accumulo is based on the design of Google's BigTable and is powered by Apache Hadoop, Apache Zookeeper, and Apache Thrift.
Accumulo has several novel features such as cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.
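To make cell-based access control concrete, here is a minimal sketch that evaluates a visibility label attached to a cell against the set of authorizations a user holds. It assumes a simplified label syntax in the spirit of Accumulo's (labels combined with `&`, `|` and parentheses, with mixed operators requiring parentheses). The class `SimpleVisibility` and its method `canSee` are hypothetical; the real client library has its own `ColumnVisibility` and `Authorizations` classes, which this code does not reproduce.

```java
import java.util.Set;

/** Hypothetical, simplified evaluator for labels such as "admin|(audit&finance)". Not Accumulo's API. */
final class SimpleVisibility {
    private final String expr;
    private final Set<String> auths;
    private int pos;

    private SimpleVisibility(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    /** True if the authorizations satisfy the label; an empty label is visible to everyone. */
    static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) return true;
        SimpleVisibility p = new SimpleVisibility(expr, auths);
        boolean result = p.parseExpr();
        if (p.pos != expr.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return result;
    }

    // expr := term (op term)*, where every op at one nesting level must be the same
    private boolean parseExpr() {
        boolean value = parseTerm();
        char op = 0; // operator seen at this nesting level, 0 if none yet
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && next != op) throw new IllegalArgumentException("mixing & and | needs parentheses");
            op = next;
            boolean rhs = parseTerm(); // always parse the right side, even if the result is already known
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // term := '(' expr ')' | label
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean value = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') throw new IllegalArgumentException("missing ')'");
            pos++;
            return value;
        }
        int start = pos;
        while (pos < expr.length() && isLabelChar(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a label at " + start);
        return auths.contains(expr.substring(start, pos));
    }

    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }
}
```

For example, `SimpleVisibility.canSee("admin|(audit&finance)", Set.of("audit", "finance"))` returns `true`, while a label such as `a|b&c` is rejected because it mixes operators without parentheses.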
Accumulo is a distributed data storage and retrieval system and as such consists of several architectural
components, some of which run on many individual servers. Much of the work Accumulo
does involves maintaining certain properties of the data, such as organization, availability, and
integrity, across many commodity-class machines.
2. Component Overview: Accumulo Components
An instance of Accumulo includes many TabletServers, one active Garbage Collector process, one
active Master server, and many Clients; Tracer and Monitor processes, described below, can also be run.
2.3.1 Tablet Server
The TabletServer manages some subset of all the tablets (partitions of tables). This includes
receiving writes from clients, persisting writes to a write-ahead log, sorting new key-value pairs
in memory, periodically flushing sorted key-value pairs to new files in HDFS, and responding to
reads from clients, forming a merge-sorted view of all keys and values from all the files it has
created and the sorted in-memory store.
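The write and read path described above (log first, sort in memory, flush to immutable sorted files, then serve reads from a merge-sorted view) can be sketched in memory as follows. This is a hypothetical model, not Accumulo code: a `List<String>` stands in for the write-ahead log, sorted maps stand in for HDFS files, the flush threshold is an assumed parameter, and "newest source wins" replaces Accumulo's real timestamp and delete handling.

```java
import java.util.*;

/** Hypothetical in-memory model of one tablet's write and read path (not Accumulo code). */
final class TabletSketch {
    final List<String> writeAheadLog = new ArrayList<>();                // stands in for the WAL
    private TreeMap<String, String> memStore = new TreeMap<>();           // sorted in-memory store
    private final List<SortedMap<String, String>> files = new ArrayList<>(); // flushed "files", oldest first
    private final int flushThreshold;                                     // assumed flush trigger

    TabletSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void write(String key, String value) {
        writeAheadLog.add(key + "=" + value); // record the write in the log first
        memStore.put(key, value);             // then insert it into the sorted in-memory store
        if (memStore.size() >= flushThreshold) flush();
    }

    /** Turns the in-memory store into a new immutable sorted "file". */
    void flush() {
        if (memStore.isEmpty()) return;
        files.add(Collections.unmodifiableSortedMap(memStore));
        memStore = new TreeMap<>();
    }

    int fileCount() { return files.size(); }

    /** Merge-sorted view over all files plus the in-memory store; the newest source wins on equal keys. */
    List<Map.Entry<String, String>> scan() {
        List<SortedMap<String, String>> sources = new ArrayList<>(files);
        sources.add(memStore); // newest source has the highest index
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                Comparator.comparing((Cursor c) -> c.current.getKey()).thenComparingInt(c -> -c.source));
        for (int i = 0; i < sources.size(); i++) {
            Iterator<Map.Entry<String, String>> it = sources.get(i).entrySet().iterator();
            if (it.hasNext()) heap.add(new Cursor(i, it));
        }
        List<Map.Entry<String, String>> out = new ArrayList<>();
        String lastKey = null;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            if (!c.current.getKey().equals(lastKey)) { // first copy of a key comes from the newest source
                out.add(Map.entry(c.current.getKey(), c.current.getValue()));
                lastKey = c.current.getKey();
            }
            if (c.advance()) heap.add(c);
        }
        return out;
    }

    /** Position inside one sorted source during the k-way merge. */
    private static final class Cursor {
        final int source;
        final Iterator<Map.Entry<String, String>> it;
        Map.Entry<String, String> current;

        Cursor(int source, Iterator<Map.Entry<String, String>> it) {
            this.source = source;
            this.it = it;
            this.current = it.next();
        }

        boolean advance() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
    }
}
```

With a threshold of 2, writing `b=1, a=1, b=2, c=3, a=9` produces two flushed files and a merged view of `a=9, b=2, c=3`. A heap-based k-way merge is used so that a scan never has to load every file fully into memory before returning sorted results.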
TabletServers also perform recovery of a tablet that was previously on a server that failed,
reapplying any writes found in the write-ahead log to the tablet.
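Recovery can be pictured as replaying the log into a fresh in-memory store. The sketch below assumes that each logged write carries an increasing sequence number and that the tablet knows the last sequence number already flushed to a file; both the `LogEntry` type and the `flushedThroughSeq` parameter are hypothetical simplifications, not Accumulo's log format.

```java
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of write-ahead-log replay during tablet recovery (not Accumulo code). */
final class WalRecovery {
    /** One logged write; seq grows with every write the failed server accepted. */
    static final class LogEntry {
        final long seq;
        final String key;
        final String value;

        LogEntry(long seq, String key, String value) {
            this.seq = seq;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Rebuilds a tablet's in-memory store by reapplying, in log order, every write that
     * had not been flushed to a file before the server failed. Assumes the log is ordered by seq.
     */
    static TreeMap<String, String> replay(List<LogEntry> log, long flushedThroughSeq) {
        TreeMap<String, String> memStore = new TreeMap<>();
        for (LogEntry e : log) {
            if (e.seq > flushedThroughSeq) {   // older writes are already persisted in files
                memStore.put(e.key, e.value);  // later entries overwrite earlier ones for the same key
            }
        }
        return memStore;
    }
}
```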
2.3.2 Garbage Collector
Accumulo processes will share files stored in HDFS. Periodically, the Garbage Collector will
identify files that are no longer needed by any process, and delete them. Multiple garbage
collectors can be run to provide hot-standby support. They will perform leader election among
themselves to choose a single active instance.
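The core decision the Garbage Collector makes is a set difference: a file can be deleted only when no tablet still references it. In this hypothetical sketch both inputs are simply passed in; how a real collector lists HDFS and gathers the references from the cluster's metadata is outside its scope.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the garbage collector's "is this file still needed?" check (not Accumulo code). */
final class FileGcSketch {
    /**
     * Returns the files that may be deleted.
     * @param filesInStorage every file currently stored (for example, as listed from HDFS)
     * @param referencedByTablets every file some tablet still reads, gathered across all tablets
     */
    static Set<String> deletable(Collection<String> filesInStorage, Collection<String> referencedByTablets) {
        Set<String> candidates = new TreeSet<>(filesInStorage);
        // a file shared by several tablets survives as long as any one of them references it
        candidates.removeAll(referencedByTablets);
        return candidates;
    }
}
```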
2.3.3 Master
The Accumulo Master is responsible for detecting and responding to TabletServer failure. It tries
to balance the load across TabletServers by assigning tablets carefully and instructing TabletServers
to unload tablets when necessary. The Master ensures all tablets are assigned to one
TabletServer each, and handles table creation, alteration, and deletion requests from clients.
The Master also coordinates startup, graceful shutdown, and recovery of changes in write-ahead
logs when TabletServers fail.
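The Master's assignment duties (each tablet on exactly one server, reassignment after a failure, and load balancing by unloading and reloading tablets) can be illustrated with a toy balancer. The "move from most-loaded to least-loaded until loads differ by at most one" policy is an assumption chosen for simplicity, not Accumulo's actual balancer, and `MasterSketch` is a hypothetical name.

```java
import java.util.*;

/** Hypothetical sketch of a master keeping each tablet on exactly one live server (not Accumulo code). */
final class MasterSketch {
    private static final Comparator<List<String>> BY_LOAD = Comparator.comparingInt(List::size);

    // server name -> tablets it hosts; a TreeMap keeps iteration (and tie-breaking) deterministic
    final Map<String, List<String>> assignments = new TreeMap<>();

    void addServer(String server) { assignments.putIfAbsent(server, new ArrayList<>()); }

    /** Assigns a tablet to the least-loaded live server (requires at least one server). */
    void assign(String tablet) { Collections.min(assignments.values(), BY_LOAD).add(tablet); }

    /** Handles a server failure: every tablet it hosted is reassigned to a survivor. */
    void serverFailed(String server) {
        List<String> orphans = assignments.remove(server);
        if (orphans == null) return;
        for (String tablet : orphans) assign(tablet);
    }

    /** Moves tablets from the most- to the least-loaded server until loads differ by at most one. */
    void balance() {
        while (true) {
            List<String> most = Collections.max(assignments.values(), BY_LOAD);
            List<String> least = Collections.min(assignments.values(), BY_LOAD);
            if (most.size() - least.size() <= 1) return;
            // "unload" a tablet from one server and "load" it on another
            least.add(most.remove(most.size() - 1));
        }
    }
}
```

With three servers and six tablets, failing one server leaves two servers with three tablets each; adding a fresh server and calling `balance()` evens this out to two tablets per server.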
Multiple masters may be run. The masters will choose among themselves a single master, and
the others will become backups if the master should fail.
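The text does not say how the masters choose among themselves. Accumulo coordinates through ZooKeeper, and a common ZooKeeper leader-election recipe gives each candidate an increasing sequence number (an ephemeral sequential node) and lets the lowest surviving number lead. The in-memory simulation below models that recipe under this assumption; `ElectionSketch` is hypothetical and uses no ZooKeeper API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical in-memory simulation of "lowest sequence number wins" leader election. */
final class ElectionSketch {
    private final TreeMap<Long, String> candidates = new TreeMap<>(); // sequence number -> process name
    private long nextSeq = 0;

    /** A process joins the election, similar to creating an ephemeral sequential node. */
    long join(String process) {
        long seq = nextSeq++;
        candidates.put(seq, process);
        return seq;
    }

    /** A process dies or its session expires, so its entry disappears. */
    void leave(long seq) { candidates.remove(seq); }

    /** The active instance is the surviving candidate with the lowest sequence number; others are backups. */
    Optional<String> leader() {
        Map.Entry<Long, String> first = candidates.firstEntry();
        return first == null ? Optional.empty() : Optional.of(first.getValue());
    }
}
```

The same pattern fits the hot-standby Garbage Collectors and Monitors described in this section: only the leader does the work, and a backup takes over as soon as the leader's entry disappears.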
2.3.4 Tracer
The Accumulo Tracer process supports the distributed timing API provided by Accumulo. One
or more of these processes can be run on a cluster; they write the timing information to a
given Accumulo table for future reference. See the section on Tracing for more information
on this support.
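Distributed timing is commonly modeled as a tree of spans: each timed operation records its start and stop time, an id, and the id of the operation that caused it, and all spans of one request share a trace id. The sketch below assumes that model; it is not Accumulo's tracing API, and a plain list stands in for the table the Tracer writes to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical sketch of distributed timing spans (not Accumulo's tracing API). */
final class TraceSketch {
    /** One timed operation; spans that share a traceId belong to the same request. */
    static final class Span {
        final long traceId, spanId, parentId; // parentId 0 marks the root of a trace
        final String description;
        final long startNanos, stopNanos;

        Span(long traceId, long spanId, long parentId, String description, long startNanos, long stopNanos) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.description = description;
            this.startNanos = startNanos;
            this.stopNanos = stopNanos;
        }

        long durationNanos() { return stopNanos - startNanos; }
    }

    // stands in for the Accumulo table that Tracer processes write to
    final List<Span> table = new ArrayList<>();
    private long nextSpanId = 1;

    /** Runs and times one operation, records it as a span, and returns the new span's id. */
    long time(long traceId, long parentId, String description, LongConsumer work) {
        long spanId = nextSpanId++;
        long start = System.nanoTime();
        work.accept(spanId); // the work receives its own span id so it can open child spans
        long stop = System.nanoTime();
        table.add(new Span(traceId, spanId, parentId, description, start, stop));
        return spanId;
    }
}
```

Because a span is recorded only when its work finishes, child spans land in the table before their parent; a viewer rebuilds the tree from the `parentId` links.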
2.3.5 Monitor
The Accumulo Monitor is a web application that provides a wealth of information about the
state of an instance. The Monitor shows graphs and tables which contain information about
read/write rates, cache hit/miss rates, and Accumulo table information such as scan rate and
active/queued compactions. Additionally, the Monitor should always be the first point of entry
when attempting to debug an Accumulo problem as it will show high-level problems in addition
to aggregated errors from all nodes in the cluster. See the section on Monitoring for more
information.
Multiple Monitors can be run to provide hot-standby support in the face of failure. Due to the
forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active
at one time. Leader election will be performed internally to choose the active Monitor.
2.3.6 Client
Accumulo includes a client library that is linked to every application. The client library contains
logic for finding servers managing a particular tablet, and communicating with TabletServers to
write and retrieve key-value pairs.
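Finding the server that manages a particular tablet reduces to a sorted lookup once tablet boundaries are cached. The sketch below assumes each tablet is identified by its inclusive end row, with the last tablet having no end row, so the tablet for a row is the first one whose end row is greater than or equal to it. `TabletLocatorSketch` is a hypothetical client-side cache, not Accumulo's client API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side cache mapping rows to the server hosting their tablet (not Accumulo's API). */
final class TabletLocatorSketch {
    // inclusive end row of each tablet -> server currently hosting it
    private final TreeMap<String, String> byEndRow = new TreeMap<>();
    private String lastTabletServer; // the final tablet has no end row and holds every row after the last split

    void cacheTablet(String endRowOrNull, String server) {
        if (endRowOrNull == null) lastTabletServer = server;
        else byEndRow.put(endRowOrNull, server);
    }

    /** Returns the server for a row: the first tablet whose end row is >= the row. */
    String serverFor(String row) {
        Map.Entry<String, String> e = byEndRow.ceilingEntry(row);
        if (e != null) return e.getValue();
        if (lastTabletServer == null) throw new IllegalStateException("location unknown; look it up again");
        return lastTabletServer;
    }
}
```

With tablets ending at `g` and `p` plus a final open-ended tablet, the row `apple` maps to the first tablet, `h` to the second, and `zebra` to the last.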