The Rise of NoSQL Databases

Foreword

In recent years NoSQL databases have risen rapidly, with one new product appearing after another. This post studies the basic theory behind NoSQL and gives a general overview of NoSQL databases.

1. Reasons for the rise of NoSQL databases

With the development of technology, the arrival of the Web 2.0 era and the rise of big data, traditional relational databases can no longer meet today's data-storage needs.

The unmet needs fall mainly into three areas:

  • Storing and managing massive amounts of data (which traditional relational databases cannot support).
  • High concurrency over large volumes of data (the strict transaction mechanisms of traditional relational databases lock large ranges of data during bulk operations, which reduces concurrency).
  • High availability and high scalability (users care most about whether a service is available, and huge data volumes require the database to keep scaling, which vertical scaling of a single server can no longer achieve).

At the same time, several strengths of relational databases are no longer essential for many companies. There are three main ones:

  • Strict transactions (for Internet companies such as WeChat and Sina Weibo, losing an occasional message is acceptable, so whether ACID is fully implemented matters little).
  • Strict real-time reads and writes (similarly, whether other users see a message immediately after the server writes it is not critical).
  • Complex queries (to save storage space and reduce redundancy on expensive hardware, traditional relational databases split information across many tables; now that hardware is powerful enough to store complete records together, complex multi-table queries are no longer required).

To handle large data volumes, traditional relational databases developed a variety of techniques, but in the end NoSQL databases proved to be the most suitable choice. Scaling solutions for traditional relational databases went through the following phases:

  1. Master-slave replication with read/write splitting. There is one master server and several slave servers. The master handles write operations and replicates changes to the slaves in real time; the slaves handle read operations. (This still cannot relieve the pressure of write requests.)
  2. Database sharding, which splits the request load across databases. Sharding can be vertical or horizontal. Vertical sharding splits databases by business domain, so each service queries its own database. Horizontal sharding distributes rows across different databases according to a rule, for example by hash or by creation time. (However, data in different databases can no longer be queried together directly, and sharding still cannot keep up with ever-larger data volumes.)
  3. Table splitting, similar to database sharding: a single table is sliced horizontally (by rows) or vertically (by columns).
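The first two phases can be sketched as a request router. This is a minimal sketch under stated assumptions: ShardedRouter is a hypothetical class, server names such as db0-master are placeholders for real connections, and rows are placed with a CRC32 hash of the key, one of the rules mentioned above (a time-based rule would work similarly).

```python
import itertools
import zlib


class ShardedRouter:
    """Hypothetical router combining horizontal sharding with read/write splitting.

    Each shard is a dict with one "master" and a list of "replicas";
    the values are just names standing in for database connections.
    """

    def __init__(self, shards):
        self.shards = shards
        # One round-robin iterator over the replicas of each shard.
        self._replica_cycles = [itertools.cycle(s["replicas"]) for s in shards]

    def shard_index(self, key):
        # Horizontal sharding: a stable hash of the row key picks the shard.
        # zlib.crc32 is used because Python's built-in hash() of str is
        # randomized per process.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.shards)

    def route(self, key, is_write):
        i = self.shard_index(key)
        if is_write:
            # Read/write splitting: every write goes to the shard's master.
            return self.shards[i]["master"]
        replicas = self.shards[i]["replicas"]
        if not replicas:
            # No replicas configured: fall back to reading from the master.
            return self.shards[i]["master"]
        # Reads are spread round-robin across that shard's replicas.
        return next(self._replica_cycles[i])


router = ShardedRouter([
    {"master": "db0-master", "replicas": ["db0-r1", "db0-r2"]},
    {"master": "db1-master", "replicas": ["db1-r1", "db1-r2"]},
])
print(router.route("user:42", is_write=True))
print(router.route("user:42", is_write=False))
```

Because the shard is chosen as the key hash modulo the number of shards, adding a shard remaps most keys, which is one reason this approach becomes hard to manage as data keeps growing. Replicas also receive writes with some delay, so a read routed to a replica right after a write may return stale data.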

2. The four types of NoSQL databases

They are:

  • Key-value databases
  • Column-family databases
  • Document databases
  • Graph databases

2.1 General characteristics (advantages)

  • Flexible data model. (Traditional relational databases impose strict requirements on fields, which makes later modifications complex.)
  • Scalability. (They are easy to scale out, with built-in support for distribution and low complexity, whereas scaling a traditional relational database is very complex.)
  • Support for highly concurrent operations.

2.2 Characteristics of each type

  1. Key-value databases, which store data as key-value pairs.
      • Advantages: well suited to large volumes of write operations.
      • Disadvantages: the data is not stored in a structured form, so complex queries are inefficient.
      • Applications: commonly used for caching.
      • Representative products: Redis, Memcached
  2. Column-family databases, which store data organized into column families. (A lookup uses the row key to find the column family, so this type can be seen as a variant of the key-value database.)
      • Advantages: fast queries and excellent horizontal scalability; well suited to distributed systems because they hide the complexity of distribution.
      • Disadvantages: limited functionality; most do not support transactional consistency (HBase, from the Hadoop ecosystem, does).
      • Applications: distributed data storage.
      • Representative products: Cassandra, HBase
  3. Document databases, which store documents under keys. (This type can also be seen as a variant of the key-value database.)
      • Advantages: the semi-structured data format, such as JSON or XML, is self-describing, so the data structure is flexible; these databases also handle high concurrency.
      • Disadvantages: no unified query syntax.
      • Applications: storing document-type and semi-structured data.
      • Representative products: MongoDB, CouchDB
  4. Graph databases, which store data as a graph structure.
      • Advantages: support complex relationship graphs and graph algorithms.
      • Disadvantages: suited only to relationship-centric applications; performance is poor in other areas.
      • Applications: complex graph structures such as social networks and relationship graphs.
      • Representative products: Neo4j, InfoGrid
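To compare the four data models listed above, here is a toy illustration that stores a small user record in plain Python dicts shaped like each model. The variable names, the helper friends_of_friends and the sample data are hypothetical; the sketch shows only the shape of the data and the access path, not how Redis, HBase, MongoDB or Neo4j store data internally.

```python
import json

# 1. Key-value: an opaque value looked up only by its key (e.g. a cache entry).
kv_store = {
    "session:abc123": "user:1",
}

# 2. Column family: row key -> column family -> column -> value.
column_store = {
    "user:1": {
        "profile": {"name": "Alice", "city": "Beijing"},
        "stats": {"followers": "2"},
    },
}

# 3. Document: key -> self-describing, semi-structured document (JSON here).
doc_store = {
    "user:1": json.dumps({
        "name": "Alice",
        "tags": ["nosql", "redis"],
        "address": {"city": "Beijing"},
    }),
}

# 4. Graph: nodes with outgoing "follows" edges, kept as an adjacency list.
graph = {
    "user:1": ["user:2"],
    "user:2": ["user:3", "user:1"],
    "user:3": [],
}


def friends_of_friends(g, start):
    """Two-hop traversal: the kind of relationship query graph databases target."""
    direct = set(g.get(start, []))
    two_hops = set()
    for friend in direct:
        two_hops.update(g.get(friend, []))
    # Exclude people already followed directly, and the start node itself.
    return two_hops - direct - {start}


print(kv_store["session:abc123"])                           # user:1
print(column_store["user:1"]["profile"]["city"])             # Beijing
print(json.loads(doc_store["user:1"])["address"]["city"])    # Beijing
print(friends_of_friends(graph, "user:1"))                   # {'user:3'}
```

The key-value lookup is the cheapest but knows nothing about the value's structure; the column-family and document models expose structure inside a record; only the graph model makes multi-hop relationship queries natural.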

3. The three cornerstones of NoSQL databases

3.1 The three properties of the CAP theorem

  • C: consistency (every read returns the result of the most recently completed write).
  • A: availability (every request receives a response within a bounded time, i.e. the system is always available).
  • P: partition tolerance (the system as a whole keeps working even when a network partition occurs).

It has been proven that a distributed system cannot satisfy all three properties at the same time; at most two can be satisfied.

Traditional relational databases satisfy C and A and give up P, which makes them difficult to scale out. Most Internet systems today are distributed systems, so they cannot give up P.

Here is an intuitive explanation of why only two properties can be satisfied:

Suppose a system tries to satisfy A, C and P at once. P means the system consists of multiple network nodes. To guarantee C, a node must synchronize each update with the other nodes; but if a network failure partitions the system (i.e. nodes cannot communicate with each other), the synchronization cannot complete immediately, so A can no longer be guaranteed.

At this point, one property must be dropped:

  1. Drop P, keep CA. With no network communication problems, the data synchronization required for C completes quickly, so A is guaranteed as well.
  2. Drop A, keep CP. Consistency C does not have to be reached immediately; even during a network partition P, the system can simply wait until synchronization completes.
  3. Drop C, keep AP. Data consistency is not guaranteed; even if the network is partitioned, each node keeps running on its own so that the system stays available to users (at the cost of the nodes' data temporarily diverging).
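To make the CP and AP choices concrete, here is a toy sketch of two replicas holding one value. The class TwoNodeCluster, its Unavailable exception and the partitioned flag are hypothetical, invented for illustration; CA is not modeled because it assumes a partition never happens.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses a request during a partition."""


class TwoNodeCluster:
    """Toy model of one value replicated on two nodes, "n1" and "n2".

    mode="CP": a write must reach both nodes; during a partition it is
    rejected (give up A to keep C).
    mode="AP": during a partition a write is applied to the local node only,
    so the nodes can diverge (give up C to keep A).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"n1": 0, "n2": 0}
        self.partitioned = False

    def write(self, node, value):
        other = "n2" if node == "n1" else "n1"
        if not self.partitioned:
            # Synchronous replication succeeds: both nodes agree.
            self.values[node] = value
            self.values[other] = value
        elif self.mode == "CP":
            raise Unavailable("cannot replicate during a partition")
        else:
            # AP: accept the write locally; the other node is now stale.
            self.values[node] = value

    def read(self, node):
        # In CP mode the nodes never diverge, so reads stay consistent.
        return self.values[node]


cp = TwoNodeCluster("CP")
cp.partitioned = True
try:
    cp.write("n1", 1)
except Unavailable:
    print("CP: write rejected")          # prints "CP: write rejected"
print(cp.read("n1"), cp.read("n2"))      # 0 0: consistent, but writes unavailable

ap = TwoNodeCluster("AP")
ap.partitioned = True
ap.write("n1", 1)
print(ap.read("n1"), ap.read("n2"))      # 1 0: available, but inconsistent
```

In the AP case the two nodes still disagree after the partition heals; bringing them back into agreement is exactly the problem that eventual consistency addresses.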

3.2 BASE theory

The BASE model is the opposite of the ACID model and differs from it completely: it sacrifices strong consistency in exchange for availability.

  • BA: Basically Available. When part of the system suffers a partition problem, the system remains available, mainly by ensuring that core functions keep working. (For example, during large e-commerce promotions, some users may be redirected to a degraded page to cope with the surge in traffic, and the service layer may offer only degraded services. This is a loss of partial availability.)
  • S: Soft state. Consistency requirements are relaxed, so the data is allowed to be inconsistent for a period of time; the opposite is hard state. (Distributed storage usually keeps at least three replicas of each piece of data, and allowing replica synchronization between nodes to lag is one manifestation of soft state. MySQL asynchronous replication is an example.)
  • E: Eventual consistency. A form of weak consistency in which subsequent reads may not see an update immediately; the opposite is strong consistency. Eventual consistency is a special case of weak consistency that only guarantees the data will become consistent eventually.
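Soft state and eventual consistency can be illustrated with a toy primary/replica store that replicates asynchronously, in the spirit of the MySQL asynchronous replication example above. AsyncReplicatedStore and its methods are hypothetical and deliberately simplified (one replica, an in-memory log, no failures); this is not how MySQL implements replication.

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy primary/replica store with asynchronous replication.

    Writes are acknowledged by the primary immediately and queued for the
    replica, so the replica may lag (soft state). replicate() drains the
    queue; once no new writes arrive, the replica converges (eventual
    consistency).
    """

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # replication log not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def read_replica(self, key):
        # May return a stale value (or None) while updates are in flight.
        return self.replica.get(key)

    def replicate(self, max_entries=None):
        """Apply up to max_entries pending updates in write order; return the count."""
        applied = 0
        while self._pending and (max_entries is None or applied < max_entries):
            key, value = self._pending.popleft()
            self.replica[key] = value
            applied += 1
        return applied

    def converged(self):
        return not self._pending and self.primary == self.replica


store = AsyncReplicatedStore()
store.write("stock", 10)
store.write("stock", 9)
print(store.read_replica("stock"))                      # None: replica not caught up
store.replicate(max_entries=1)
print(store.read_replica("stock"))                      # 10: stale intermediate value
store.replicate()
print(store.read_replica("stock"), store.converged())   # 9 True
```

Between the write and the final replicate() call, a reader of the replica sees missing or stale data; that window is the soft state, and the guarantee that it eventually closes is eventual consistency.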

3.3 Eventual consistency

(BASE obviously already includes eventual consistency; it is unclear why the book treats it separately as one of the three cornerstones.)


Source: www.cnblogs.com/taojinxuan/p/11130328.html