【Introduction to Apache Accumulo】

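Apache Accumulo is a reliable, scalable, high-performance, sorted and distributed key-value store with cell-level access control and customizable server-side processing. It follows the design of Google BigTable and is built on Apache Hadoop, ZooKeeper, and Thrift.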



 

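LevelDB is a very efficient key-value database developed by Google. It supports data sets on the order of billions of records and still performs very well at that scale, mainly thanks to its sound design and in particular its LSM (log-structured merge) algorithm. LevelDB is already supported as a storage engine by Riak and Kyoto Tycoon. In China, Taobao's open-source key-value store Tair has also adopted LevelDB as its persistent storage engine and runs it in production.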

Apache Accumulo is based on the design of Google's BigTable and is powered by Apache Hadoop, Apache ZooKeeper, and Apache Thrift.

Accumulo has several novel features such as cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.
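To make cell-based access control more concrete, here is a minimal sketch of how a per-cell visibility expression could be checked against the authorizations a user holds. It is not Accumulo's own implementation: the function name `is_visible`, the simplified grammar, and the left-to-right evaluation of mixed `&` and `|` are assumptions made for illustration.

```python
def is_visible(expression, authorizations):
    """Check a visibility expression such as "admin&(audit|ops)".

    Simplified grammar (an assumption of this sketch):
      expr := term (('&' | '|') term)*
      term := LABEL | '(' expr ')'
    Operators are applied left to right. An empty expression is
    treated as visible to everyone.
    """
    # Split the expression into labels, operators, and parentheses.
    tokens = []
    label = ""
    for ch in expression:
        if ch in "&|()":
            if label:
                tokens.append(label)
                label = ""
            tokens.append(ch)
        else:
            label += ch
    if label:
        tokens.append(label)

    if not tokens:
        return True

    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            pos += 1  # skip the closing ")"
            return value
        # A plain label is satisfied when the user holds it.
        return token in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if op == "&" else (result or right)
        return result

    return parse_expr()
```

With this sketch, a cell labeled `admin&(audit|ops)` is returned to a user holding `{"admin", "ops"}` and filtered out for a user holding only `{"admin"}`.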

Accumulo is a distributed data storage and retrieval system and as such consists of several architectural components, some of which run on many individual servers. Much of the work Accumulo does involves maintaining certain properties of the data, such as organization, availability, and integrity, across many commodity-class machines.



 

2. Component Overview: Accumulo Components

An instance of Accumulo includes many TabletServers, one Garbage Collector process, one Master server and many Clients.

2.3.1 Tablet Server

The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a write-ahead log, sorting new key-value pairs in memory, periodically flushing sorted key-value pairs to new files in HDFS, and responding to reads from clients, forming a merge-sorted view of all keys and values from all the files it has created and the sorted in-memory store.
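The write and read path described above can be sketched in a few lines. This is a toy model, not Accumulo code: the class `TabletSketch`, the `flush_threshold` parameter, and the use of Python lists in place of HDFS files are all assumptions, and real Accumulo keys carry more fields (such as timestamps) than the plain string keys used here.

```python
import heapq


class TabletSketch:
    """Toy model of one tablet: write-ahead log, in-memory store, sorted files."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # write-ahead log of writes not yet flushed
        self.memtable = {}   # in-memory key-value pairs
        self.files = []      # each "file" is an immutable sorted list of pairs
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.wal.append((key, value))   # persist to the log first
        self.memtable[key] = value      # then apply in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the in-memory pairs out as a new sorted file."""
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.wal = []  # flushed writes no longer need the log

    def scan(self):
        """Merge-sorted view over the in-memory store and all files."""
        # Order sources newest first so the newest value of a key wins.
        sources = [sorted(self.memtable.items())] + list(reversed(self.files))
        tagged = [
            [(key, age, value) for key, value in source]
            for age, source in enumerate(sources)
        ]
        result = []
        last_key = object()  # sentinel that equals no real key
        for key, _age, value in heapq.merge(*tagged):
            if key != last_key:
                result.append((key, value))
                last_key = key
        return result
```

For example, after writing `b`, `a`, and then `b` again with `flush_threshold=2`, the first two writes land in one sorted file, the third stays in memory, and `scan()` returns `a` followed by the newer value of `b`.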

TabletServers also perform recovery of a tablet that was previously on a server that failed, reapplying any writes found in the write-ahead log to the tablet.
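Recovery by log replay can be illustrated with a very small sketch. The function name `recover_memtable` and the representation of the log as a list of `(key, value)` pairs are assumptions for illustration; the point is only that replaying the logged writes in order rebuilds the state that was in memory when the server failed.

```python
def recover_memtable(wal_entries):
    """Rebuild a tablet's in-memory store by replaying its write-ahead log.

    wal_entries: (key, value) pairs in the order they were written.
    Returns the recovered pairs sorted by key.
    """
    memtable = {}
    for key, value in wal_entries:
        # Replaying in write order means a later write to the
        # same key overwrites the earlier one, as it did originally.
        memtable[key] = value
    return sorted(memtable.items())
```

Replaying `[("k2", "x"), ("k1", "y"), ("k2", "z")]` yields `k1` with value `y` and `k2` with its latest value `z`.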

2.3.2 Garbage Collector

Accumulo processes will share files stored in HDFS. Periodically, the Garbage Collector will identify files that are no longer needed by any process, and delete them. Multiple garbage collectors can be run to provide hot-standby support. They will perform leader election among themselves to choose a single active instance.
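The core of the collection step is a set difference between the files that exist and the files that are still referenced. The sketch below assumes hypothetical inputs (a list of file paths and a mapping from tablet to the files it uses); how Accumulo actually tracks references is not covered by the text above.

```python
def find_garbage(files_in_hdfs, files_in_use_by_tablet):
    """Return the files that no tablet references any more.

    files_in_hdfs: iterable of file paths that currently exist.
    files_in_use_by_tablet: dict mapping a tablet id to the file
        paths that tablet still needs (hypothetical structure).
    """
    referenced = set()
    for files in files_in_use_by_tablet.values():
        referenced.update(files)
    # Anything that exists but is not referenced is a candidate for deletion.
    return sorted(set(files_in_hdfs) - referenced)
```

A file shared by two tablets stays alive until neither references it, which is why the check has to look across all tablets rather than one at a time.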

2.3.3 Master

The Accumulo Master is responsible for detecting and responding to TabletServer failure. It tries to balance the load across TabletServers by assigning tablets carefully and instructing TabletServers to unload tablets when necessary. The Master ensures all tablets are assigned to one TabletServer each, and handles table creation, alteration, and deletion requests from clients.
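As an illustration of the assignment invariant (every tablet on exactly one live TabletServer, with load kept roughly even), here is a simplified sketch. The function `assign_tablets` and its least-loaded strategy are assumptions for this example, not the balancer Accumulo ships.

```python
def assign_tablets(tablets, live_servers, current=None):
    """Assign every tablet to exactly one live server.

    Tablets whose current server is still alive stay where they are;
    the rest (new tablets, or tablets from a failed server) go to the
    least-loaded live server.
    """
    if not live_servers:
        raise ValueError("at least one live TabletServer is required")
    current = current or {}
    load = {server: 0 for server in live_servers}
    assignment = {}

    # First pass: keep assignments that point at a live server.
    for tablet in tablets:
        owner = current.get(tablet)
        if owner in load:
            assignment[tablet] = owner
            load[owner] += 1

    # Second pass: place unassigned tablets on the least-loaded server.
    for tablet in tablets:
        if tablet not in assignment:
            target = min(live_servers, key=lambda s: (load[s], s))
            assignment[tablet] = target
            load[target] += 1

    return assignment
```

If `ts3` has failed, a tablet previously hosted there is moved to whichever live server currently holds the fewest tablets, while tablets on healthy servers are left alone.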

The Master also coordinates startup, graceful shutdown and recovery of changes in write-ahead logs when TabletServers fail.

Multiple masters may be run. The masters will choose a single active master among themselves, and the others will act as backups that take over if the active master fails.
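The text does not say how the masters choose among themselves. One common pattern, shown here purely as an assumption, is that each candidate registers with a coordination service and receives an increasing sequence number, and the candidate with the lowest number is the active one. The function name `elect_leader` and the dictionary of sequence numbers are hypothetical.

```python
def elect_leader(candidates):
    """Pick the active process among running candidates.

    candidates: dict mapping a process name to the sequence number it
    received when it registered (hypothetical). The lowest number wins.
    Returns None when no candidate is running.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda name: candidates[name])


# Usage: the backup takes over once the active master disappears.
masters = {"master-a": 3, "master-b": 1, "master-c": 2}
active = elect_leader(masters)            # master-b holds the lowest number
del masters[active]                       # simulate failure of the active master
new_active = elect_leader(masters)        # master-c is next in line
```

The same pattern would apply to any component described here as running one active instance with hot standbys.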

2.3.4 Tracer

The Accumulo Tracer process supports the distributed timing API provided by Accumulo. One or more of these processes can be run on a cluster, and they will write the timing information to a given Accumulo table for future reference. See the section on Tracing for more information on this support.

2.3.5 Monitor

The Accumulo Monitor is a web application that provides a wealth of information about the state of an instance. The Monitor shows graphs and tables which contain information about read/write rates, cache hit/miss rates, and Accumulo table information such as scan rate and active/queued compactions. Additionally, the Monitor should always be the first point of entry when attempting to debug an Accumulo problem as it will show high-level problems in addition to aggregated errors from all nodes in the cluster. See the section on Monitoring for more information.

Multiple Monitors can be run to provide hot-standby support in the face of failure. Due to the forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active at one time. Leader election will be performed internally to choose the active Monitor.

2.3.6 Client

Accumulo includes a client library that is linked to every application. The client library contains logic for finding servers managing a particular tablet, and communicating with TabletServers to write and retrieve key-value pairs.
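Because tablets partition a table by sorted row ranges, finding the server for a row can be reduced to a binary search over tablet boundaries. The sketch below assumes a hypothetical in-memory list of `(end_row, server)` pairs, with each tablet covering rows up to and including its end row and the last tablet covering everything after; how the real client library obtains and caches this information is not described here.

```python
import bisect


def locate_tablet(row, tablets):
    """Return the server hosting the tablet that contains `row`.

    tablets: list of (end_row, server) pairs sorted by end_row. Each
    tablet covers rows up to and including its end_row; the last entry
    uses None as its end_row and covers all remaining rows.
    """
    end_rows = [end_row for end_row, _server in tablets[:-1]]
    # bisect_left finds the first tablet whose end_row is >= row.
    index = bisect.bisect_left(end_rows, row)
    return tablets[index][1]
```

With tablets ending at `g`, `p`, and an open-ended last tablet, row `apple` maps to the first server, `h` to the second, and `zebra` to the third.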


Reposted from gaojingsong.iteye.com/blog/2344067