A short summary of architecture design after the AWS cable was dug up (Part 1)

Official account: large sub-trotters programmer

Yesterday's hottest news in tech circles was probably "An AWS cable was dug up in China, leaving the services of Samsung, Xiaomi, and other companies unavailable."
A fiber optic cable dug up? Hey, why "again"? Let's look back together:

  • 2019.6.02: An Amazon cable was cut by digging, causing network anomalies in parts of the country
  • 2019.3.23: A fiber construction crew cut a Tencent cable, affecting 100 or so of Tencent's games at a heavy loss
  • 2015.5.27: A fiber cable was cut in the Xiaoshan area, leaving a small number of users temporarily unable to use Alipay

I have listed only a few fiber-cut incidents involving large companies. Others, such as radio and television cables or IESS cables being dug up, are not listed; if you are interested, search Baidu yourself.

So we find that "however big the company, it still fears the construction crew." Can these incidents be blamed on the construction crew? Personally I don't think all the responsibility can be put on them, but that is not the topic here. The question is: as a large company, how do we guard against this in the future?
We can look at Alipay's solution. After all, it is the veteran that went through exactly this misery back in 2015.

On September 20, 2018, the ATEC main forum at the Yunqi Conference in Hangzhou staged an unusual technology demo. Hu Xi, deputy CTO of Ant Financial, simulated on site the cutting of the cables to nearly half of Alipay's servers. Just over 26 seconds later, the simulated Alipay environment was completely back to normal.

The solution behind this is "three sites, five centers", a data center architecture: five data centers are deployed across three cities, and if one or two of them fail, all traffic from the failed city can be switched to the data centers that are still running normally.
Before "three sites, five centers" there were many other architectures, so let's look at their characteristics one by one.

The evolution of disaster recovery

At first our application (a very simple read-only application, say a web page that displays Hello World, with no data storage involved) runs on a single machine. When that machine goes down, the application becomes unavailable.
So we deploy the application on several machines, and the company sets aside a dedicated server room for them, so that one machine going down no longer takes the application with it.
But what if the company has a power cut one day? Now we set up a second data center somewhere else in the same city, so the application is deployed in two data centers in one city (this is called same-city active-active).
But if the city is hit one day by a tsunami, typhoon, earthquake, or other natural disaster, both data centers become unusable. So we build another data center in a different city and deploy the application there too, which raises availability further (this is called remote multi-active).
At this point, whatever happens, our application is basically available (unless the Earth is destroyed...)
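
To put a rough number on "availability gets higher" at each step, here is a minimal sketch. It assumes every site has the same availability and that sites fail independently of each other, which is exactly what placing them in different buildings and cities tries to approximate. The function name and the 0.99 figure are made up for illustration.

```python
def combined_availability(per_site: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up."""
    # All sites are down at once with probability (1 - per_site) ** sites.
    return 1.0 - (1.0 - per_site) ** sites


if __name__ == "__main__":
    # Assumed per-site availability of 99%, for illustration only.
    for n in (1, 2, 3):
        print(n, combined_availability(0.99, n))
```

Each extra site multiplies the chance of a total outage by the per-site failure rate, but only as long as the failures really are independent. Two rooms in the same city share the same typhoon.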

The application considered above is a very simple read-only one, so every deployment can serve requests at the same time. Once the application involves data storage, the deployments cannot all accept writes simultaneously, because data conflicts are likely. For now, suppose the company rules that only one data center (called the primary data center from here on) may serve writes, while the other data center in the same city and the one in the remote city only synchronize data from the primary. The role of these two data centers is called disaster recovery: because the data is synchronized, they can temporarily serve users even if the primary loses power. The architecture now looks like this:

image.png

When the primary data center loses power, user requests go to the Beijing backup data center; when the Beijing backup also loses power, user requests go to the Shanghai backup data center.
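
The failover order just described can be sketched as a simple lookup. This is only an illustration: the data center names and the health map are hypothetical, and a real system would use health checks and traffic switching at the DNS or load balancer level.

```python
from typing import Dict, List, Optional

# Hypothetical names; the order mirrors the failover path described above.
FAILOVER_ORDER: List[str] = ["primary", "beijing-backup", "shanghai-backup"]


def pick_data_center(healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first data center in failover order that is still up."""
    for name in FAILOVER_ORDER:
        if healthy.get(name, False):
            return name
    return None  # every site is down


if __name__ == "__main__":
    status = {"primary": True, "beijing-backup": True, "shanghai-backup": True}
    print(pick_data_center(status))  # primary
    status["primary"] = False
    print(pick_data_center(status))  # beijing-backup
    status["beijing-backup"] = False
    print(pick_data_center(status))  # shanghai-backup
```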
In this architecture, as just described, only the primary data center serves users; the other two exist purely as disaster recovery backups. That means the backups are poorly utilized, since under normal conditions the primary does not lose power all the time. Can their utilization be improved? Certainly. We can let the Beijing backup data center take part of the traffic as well, though only less critical requests such as some reads. The Shanghai backup data center still receives no requests and remains purely a disaster recovery site, because nobody can guarantee that serving requests from a backup will not cause some other unpredictable problem. The three data centers now play somewhat different roles:

image.png

This is called "two sites, three centers".
This architecture is used by many banks and large enterprises, because the state sets disaster recovery requirements for banks: those with assets above a certain threshold must adopt a two-site, three-center architecture to keep the banking system stable.
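
The division of roles among the three data centers can be written down as a small table. This is a sketch under assumptions: the names and the read/write flags are hypothetical, chosen to match the description above (primary takes everything, the Beijing backup takes some reads, the Shanghai backup takes nothing in normal operation).

```python
from typing import Dict, List

# Hypothetical role table for the three data centers.
ROLES: Dict[str, Dict[str, bool]] = {
    "primary": {"reads": True, "writes": True},
    "beijing-backup": {"reads": True, "writes": False},
    "shanghai-backup": {"reads": False, "writes": False},  # disaster recovery only
}


def candidates(request_kind: str) -> List[str]:
    """List data centers allowed to serve a 'read' or 'write' in normal operation."""
    key = {"read": "reads", "write": "writes"}[request_kind]
    return [name for name, role in ROLES.items() if role[key]]


if __name__ == "__main__":
    print(candidates("write"))  # ['primary']
    print(candidates("read"))   # ['primary', 'beijing-backup']
```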

So does this architecture have any drawbacks? Consider first whether its availability is high, where availability means whether the architecture can process user requests quickly enough.
In this architecture, data has to be backed up between the centers, and there are only two ways to do that: asynchronously or synchronously.

  • Maximum performance mode: Backup is asynchronous. For a write request, the production data center returns the result to the user as soon as it has stored the data, and the data is sent to the backup asynchronously. But if the production data center loses power just as it is about to send that data, can the disaster recovery center be exposed to serve users? No, because its data is likely to be stale.
  • Maximum protection mode: Backup is synchronous. A write request must wait not only for the production data center to store the data but also for the disaster recovery centers to report that the backup is complete. If just one disaster recovery center has a problem, the backup cannot complete and the whole architecture stops serving, so availability is very low.
  • Maximum availability mode: This is the commonly used scheme. Under normal conditions it runs in maximum protection mode while the production data center monitors the disaster recovery centers; if one of them is found to have a problem, it switches to maximum performance mode, so that production is not affected by a faulty disaster recovery center.
  • Three writes, two synchronized: This was Alibaba's earlier architecture model, with three centers in one city. Backup happens not at the database level but at the application layer: when the application writes to the database, it writes to all three centers at once and treats the write as successful as long as two of them return successfully. Even if one of the three centers loses power, the availability of the whole architecture is unaffected. The idea differs from the first three modes, and performance is certainly much higher.
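
The four write strategies in the list above can be compared side by side in a toy model. This is a minimal sketch, not how any real database implements them: a "replica" here is just a hypothetical function that stores a value and reports success, and all function names are invented for illustration.

```python
from typing import Callable, List

# A replica takes a value and returns True if it stored it (hypothetical model).
Replica = Callable[[str], bool]


def make_replica(store: List[str], up: bool = True) -> Replica:
    """Build a replica that appends to `store`, or rejects writes when down."""
    def write(value: str) -> bool:
        if not up:
            return False
        store.append(value)
        return True
    return write


def write_async(primary: Replica, pending: List[str], value: str) -> bool:
    """Maximum performance: acknowledge after the primary write alone."""
    if not primary(value):
        return False
    pending.append(value)  # the backup receives this later, so it may lag
    return True


def write_sync(primary: Replica, backup: Replica, value: str) -> bool:
    """Maximum protection: acknowledge only when both sites stored the value."""
    # A real system would also have to undo the primary write if the backup fails.
    return primary(value) and backup(value)


def write_max_availability(primary: Replica, backup: Replica, pending: List[str],
                           backup_healthy: bool, value: str) -> bool:
    """Maximum availability: synchronous while the backup is healthy, else asynchronous."""
    if backup_healthy:
        return write_sync(primary, backup, value)
    return write_async(primary, pending, value)


def write_quorum(centers: List[Replica], value: str, needed: int = 2) -> bool:
    """Application-level write to every center; succeed if at least `needed` acknowledge."""
    acks = sum(1 for center in centers if center(value))
    return acks >= needed


if __name__ == "__main__":
    stores: List[List[str]] = [[], [], []]
    centers = [make_replica(stores[0]), make_replica(stores[1], up=False), make_replica(stores[2])]
    print(write_quorum(centers, "order-1"))  # True: two of three centers acknowledged
    print(write_sync(centers[0], centers[1], "order-2"))  # False: the backup is down
```

The quorum write keeps succeeding with one center down, while the synchronous write fails as soon as its single backup is unavailable, which is the availability gap the list describes.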

That covers two sites, three centers. To sum up its drawbacks:

  1. The disaster recovery centers are poorly utilized
  2. After the production data center stops, the data in the disaster recovery centers is not guaranteed to be 100% identical
  3. The cost is high, yet it cannot truly deliver the expected high availability

To solve these problems, "three sites, five centers" appeared. Its name resembles "two sites, three centers", but what it provides is completely different.
Three sites, five centers means three cities and five centers. The concept is built on unitization, which takes a lot of space to explain, so we will continue in the next part.

I believe nobody enjoys reading big chunks of code on a small phone screen, so my writing leans toward plain text. If you gained a little something from it, please give it a like.

Origin juejin.im/post/5cf4e4086fb9a07ee1691369