Introduction to EC2 (Part 2)

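    Dynamo, Amazon's distributed key-value storage system, underpins a number of Amazon's core services, so it is well worth understanding how it works.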
1. Challenges of a Distributed Storage System
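    Designing a distributed storage system raises challenges on many fronts (see Note 1). At the highest level, consistency, availability, and partition tolerance cannot all be guaranteed at once (the CAP theorem), so any design has to trade one off against the others. Dynamo chooses to give up some consistency, settling for eventual consistency in exchange for availability. Many other distributed stores built for massive data volumes make the same choice, because enforcing strict consistency under very heavy traffic degrades performance sharply.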
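    Dynamo was originally designed as the storage layer for Amazon's internal services, so it makes the following assumptions based on the characteristics of those workloads: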
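•         Data is stored as key-value pairs, and there are only two operations: read (get) and write (put).

•         Data items are independent of one another: relational operations such as joins are not supported, and there is no schema.

•         Each data item is relatively small (under 1 MB).

•         Full ACID guarantees hurt performance, so full ACID support is not provided (see Note 2).

•         To keep costs down, storage nodes are commodity machines, and the system has to run well on them.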
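
    The first two assumptions amount to a very small interface. The following is a minimal in-memory sketch of that interface, not Dynamo's actual API: the class name `KeyValueStore`, the `MAX_ITEM_BYTES` constant, and the example key are all hypothetical, chosen only to mirror the list above.

```python
class KeyValueStore:
    """Hypothetical in-memory stand-in for a store that offers only get/put."""

    # Assumed cap, mirroring the "under 1 MB" premise in the list above.
    MAX_ITEM_BYTES = 1024 * 1024

    def __init__(self):
        # key -> opaque bytes; no schema and no links between items.
        self._items = {}

    def put(self, key: str, value: bytes) -> None:
        if len(value) >= self.MAX_ITEM_BYTES:
            raise ValueError("item too large for this sketch")
        self._items[key] = value

    def get(self, key: str) -> bytes:
        return self._items[key]


store = KeyValueStore()
store.put("cart:42", b'{"items": ["book"]}')
cart = store.get("cart:42")
```

    Because every operation touches exactly one key, there is nothing here that would need a join, a schema, or a multi-item transaction.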

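    The first two assumptions line up with the NoSQL movement that is currently gaining momentum, which suggests that NoSQL is not only well suited to massive-scale storage but is also where the field is heading. Google, another Internet giant, takes a broadly similar approach; great minds think alike.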
2. Dynamo's Design Approach
    With these assumptions in mind, Dynamo's design follows a few principles (see Note 3):
•         Incremental scalability

•         Symmetry: all nodes are peers

•         Decentralization

•         Support for heterogeneous hardware

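Following these principles, Dynamo offers a solution to each of the major problems a distributed storage system has to solve:
•         Even data distribution: consistent hashing

•         Handling conflicting versions: vector clocks

•         Handling temporary failures: hinted handoff

•         Recovering from permanent failures: Merkle hash trees

•         Membership detection: gossip protocol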
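
    To make the first item concrete, here is a rough sketch of consistent hashing with virtual nodes. It is an illustration under stated assumptions rather than Dynamo's implementation: the `HashRing` class, the node names, and the choice of 100 virtual nodes per machine are all hypothetical, and MD5 is used only as a convenient uniform hash.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a position on the ring (MD5 used only as a uniform hash)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; each node appears `vnodes` times."""

    def __init__(self, nodes, vnodes=100):
        # Place every physical node on the ring at `vnodes` pseudo-random spots.
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[idx % len(self._ring)][1]


keys = [f"key-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Keys whose owner changed when node-d joined the ring.
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

    The point of the design is visible in `moved`: adding `node-d` only reassigns keys to the new node, while every other key stays where it was. Giving a more powerful machine more virtual nodes is one way to make its share of the work proportional to its capacity.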

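    Vector clocks can be sketched in a few lines as well. The version below is a simplified illustration, not Dynamo's code: a clock is assumed to be a plain dict from node name to counter, and the function names `increment`, `descends`, and `compare`, as well as the node names, are hypothetical.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if version `a` has seen everything `b` has (a >= b on every counter)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a supersedes b"
    if b_ge_a:
        return "b supersedes a"
    # Neither version has seen the other: the writes were concurrent.
    return "conflict"


v1 = increment({}, "node-x")   # first write, handled by node-x
v2 = increment(v1, "node-y")   # update on top of v1, handled by node-y
v3 = increment(v1, "node-z")   # another update on top of v1, handled by node-z

print(compare(v2, v1))  # a supersedes b
print(compare(v2, v3))  # conflict
```

    When one version supersedes the other, the older one can simply be discarded. In the conflict case the store cannot pick a winner on its own, so both versions have to be kept until something reconciles them.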

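    For the Merkle tree item, the idea is that two replicas compare hashes top-down and only exchange the data under branches that differ. The sketch below is a simplified illustration under stated assumptions: both replicas hold the same, non-empty set of keys, SHA-256 is an arbitrary choice of hash, and the names `build_levels` and `differing_leaves` are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(items: dict) -> list:
    """Return the tree as a list of levels, leaves first, root last."""
    # One leaf per key/value pair, in sorted key order.
    level = [h(k.encode("utf-8") + b"\x00" + v) for k, v in sorted(items.items())]
    levels = [level]
    while len(level) > 1:
        # Pair up neighbours; an odd node out is paired with itself.
        level = [
            h(level[i] + level[min(i + 1, len(level) - 1)])
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def differing_leaves(a: list, b: list) -> list:
    """Descend from the root, following only subtrees whose hashes differ."""
    suspects = [0]  # index of the root on the top level
    for depth in range(len(a) - 1, 0, -1):
        suspects = [
            child
            for parent in suspects
            if a[depth][parent] != b[depth][parent]
            for child in (2 * parent, 2 * parent + 1)
            if child < len(a[depth - 1])
        ]
    return [i for i in suspects if a[0][i] != b[0][i]]


replica_a = {f"k{i}": f"v{i}".encode("utf-8") for i in range(1, 6)}
replica_b = dict(replica_a, k4=b"stale")  # one key has diverged

tree_a, tree_b = build_levels(replica_a), build_levels(replica_b)
out_of_sync = [sorted(replica_a)[i] for i in differing_leaves(tree_a, tree_b)]
print(out_of_sync)  # ['k4']
```

    If the two root hashes match, the replicas are already in sync and nothing else needs to be compared; otherwise only the branches leading to the divergent keys are visited.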

(To be continued)


Note 1:
A distributed system has to deal with the following concerns: load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request marshalling, request routing, system monitoring and alarming, and configuration management.
Note 2:
ACID Properties: ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability. This has been widely acknowledged by both the industry and academia.

Dynamo targets applications that operate with weaker consistency (the “C” in ACID) if this results in high availability. Dynamo does not provide any isolation guarantees and permits only single key updates. (Quoted from Amazon's official documentation)

Note 3:

Incremental scalability: Dynamo should be able to scale out one storage host (henceforth, referred to as “node”) at a time, with minimal impact on both operators of the system and the system itself.
Symmetry: Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take special roles or extra set of responsibilities. In our experience, symmetry simplifies the process of system provisioning and maintenance.
Decentralization: An extension of symmetry, the design should favor decentralized peer-to-peer techniques over centralized control. In the past, centralized control has resulted in outages and the goal is to avoid it as much as possible. This leads to a simpler, more scalable, and more available system.
Heterogeneity: The system needs to be able to exploit heterogeneity in the infrastructure it runs on. e.g. the work distribution must be proportional to the capabilities of the individual servers. This is essential in adding new nodes with higher capacity without having to upgrade all hosts at once.
(Quoted from Amazon's official documentation)

Reposted from deeravenger.iteye.com/blog/1109553