36 | Business state and storage middleware

Compared with desktop programs, server programs depend on more than the operating system and the programming language. They also rely on two further categories of basic software:

Load balancing (Load Balance);

Database or other form of storage (DB/Storage).

What is the role of storage in server-side development? Today we will talk about storage middleware.

Business state

Let's start from the beginning.

First, let's think about a question: what are the similarities and differences between desktop programs and server programs? It is an open question, and different people may give very different answers.

Today let’s look at this issue from a data perspective.

We know that a desktop program is essentially driven by a series of "user interaction events". You can think of it as a state machine: suppose that at time i the desktop program is in business state i. After it receives user interaction event i, its state changes to business state i+1. The process is illustrated as follows:

The state transition diagram is expressed as follows:

So, what about the server side?

If you think about it carefully, you will find that server-side programs can be viewed with exactly the same model. The only difference is that they are driven not by "user interaction events" but by "network API requests".

You can also think of a server program as a state machine: suppose that at time i the server program is in business state i. After it receives network API request i, its state changes to business state i+1. The process is illustrated as follows:

The state transition diagram is expressed as follows:
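In code, the model shared by both kinds of program might be sketched roughly as follows. This is only an illustration: the `BusinessState` fields, the `transition` function, and the event dictionaries are all made up for the example. The point is that one transition function takes business state i and input i and yields business state i+1, whether the input is a user interaction event or a network API request.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessState:
    """Hypothetical business state: a balance plus the sources of applied inputs."""
    balance: int = 0
    history: list = field(default_factory=list)


def transition(state: BusinessState, event: dict) -> BusinessState:
    """Return business state i+1 given business state i and input i."""
    return BusinessState(
        balance=state.balance + event["amount"],
        history=state.history + [event["source"]],
    )


# The same transition function serves both kinds of program;
# only the origin of the input differs.
ui_events = [{"source": "user interaction event", "amount": 5}]
api_requests = [{"source": "network API request", "amount": 7}]

state = BusinessState()
for event in ui_events + api_requests:
    state = transition(state, event)

print(state.balance)  # 12
```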

So, what is the difference between desktop programs and server programs?

The biggest difference between them is how the business state is represented.

How is the business state of a desktop program represented? As in-memory data structures. As we mentioned in the previous chapter, the Model layer of a desktop program is a DOM tree whose root node is usually called Document. This DOM tree is in fact the business state of the desktop program.

How should the business state of a server program be represented? Can it be an in-memory data structure?

The answer is of course no. If the business state lived only in memory, the data would be lost as soon as the server program crashed.

We mentioned before in "34 | Macroscopic Perspective of Server-side Development":

The domain characteristics of the server are large-scale user requests and 24-hour uninterrupted service.

This sentence is the core of understanding the server-side architecture and is crucial. But in a sense the more important principle is:

The user's data, that is, the business state the user believes has been committed, must not be lost.

The server is a black box to the user. Once the user receives feedback that a "network API request" succeeded, the user will treat that success as final.

Therefore, the server must guarantee the reliability of its business state. This is different from desktop programs, which usually wait for an explicit user interaction event, such as the Ctrl+S command, before saving data. Only then is the business state written persistently to external storage. And most desktop programs do not need to support persistence at all.

Storage middleware and disaster recovery level

Without storage middleware, the server has to persist the business state itself as part of handling each network API request.

That doesn't sound complicated, does it?

In fact it is. Persisting the business state of a server-side program is much harder than persisting that of a desktop program. A desktop program serves a single user, so it can simply do nothing else while it persists, and the user experience still seems acceptable.

A server-side program cannot do that. If every API request had to wait in line while the state was being persisted, then at best the throughput of the service would be very poor; at worst, the service would appear to the user to have stopped for as long as persistence was running. The persistence step must therefore be short, so short that nobody notices a pause in service.

Nor is the business state of a server program simple. It is multi-tenant persistent state. Even if each user's business state is only 100 KB, 1 million users add up to 100 GB of data to persist. The approach of a typical desktop program, which regenerates a complete new file on every save, clearly cannot work here. The state has to be stored in an incremental storage system.
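To make "incremental" concrete, here is a minimal sketch of one possible approach: an append-only log in which each change is written as one small record, so the cost of a write depends on the size of the change and not on the total amount of data. The class name `AppendOnlyLog`, the JSON-lines record format, and the file name are all assumptions made for this illustration; real storage systems are far more involved.

```python
import json
import os
import tempfile


class AppendOnlyLog:
    """Hypothetical incremental store: each change is appended, never rewritten."""

    def __init__(self, path: str):
        self.path = path
        self.state = {}
        # Rebuild the in-memory state by replaying earlier records, if any.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self.state[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        record = json.dumps({"key": key, "value": value})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to flush the record to disk
        # Update memory only after the record has been written.
        self.state[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "state.log")
store = AppendOnlyLog(log_path)
store.put("user:1", "hello")
store.put("user:2", "world")
store.put("user:1", "hello again")

# Simulate a crash and restart: a fresh object recovers the state from disk.
recovered = AppendOnlyLog(log_path)
```

After the simulated restart, `recovered.state` holds the latest value for each key, even though no write ever rewrote the whole file.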

If every developer working on server-side programs needs to consider how to persist the business state, the cost is obviously too high.

As a result, storage middleware came into being.

Historically, the first storage middleware was the database, which goes back to IBM System R in 1974.

That was around the time the Internet was being invented, so the database was probably born to serve workstations, which can also be regarded as a kind of network service.

Desktop programs rarely use databases. They do so only in scenarios that require incremental persistence of business state; a typical example is WeChat. WeChat's local chat history is presumably stored in a database, but an embedded one such as SQLite.

In the early days, people did not demand a high disaster recovery level from storage middleware. Databases were all single-machine, with no master-slave setup. What people wanted from storage middleware was high performance, stability, and a proven track record. How was data reliability ensured? By picking an off-peak period at night and making an offline backup of the database, and that was it.

For server-side development, the arrival of the database was revolutionary: it greatly improved development efficiency.

But as the Internet became widespread, our requirements for the disaster recovery level kept rising.

First, a single-machine database is not reliable enough. Multiple machines need to act as hot standbys for one another; this is the origin of the master-slave structure of databases. With it, we no longer need to worry that the failure of a single database machine will make the service temporarily unavailable or, worse, lose data.

Second, a single-machine database does not have enough capacity. The storage capacity of one machine has an upper limit, which in turn caps the number of users we can serve. Before distributed databases appeared, the solution was to split databases and tables by hand (manual sharding). In short, the business needs to scale without being constrained by the physical storage capacity of a single machine.

Finally, a single data center is not reliable enough. A data center can suffer network outages, and in extreme cases a natural disaster such as an earthquake can destroy all the data in it. This led to cross-data-center disaster recovery schemes such as "two sites, three centers".
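As a rough illustration of what splitting databases and tables by hand can look like, the sketch below routes each user to a fixed database and table by hashing the user ID. The counts `NUM_DATABASES` and `TABLES_PER_DB`, the naming scheme `db_N.user_M`, and the choice of SHA-256 are assumptions for the example, not a description of any particular system.

```python
import hashlib

NUM_DATABASES = 4   # assumed number of database instances
TABLES_PER_DB = 8   # assumed number of tables per database


def route(user_id: str) -> tuple[int, int]:
    """Map a user ID to a (database index, table index) pair."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % (NUM_DATABASES * TABLES_PER_DB)
    return slot // TABLES_PER_DB, slot % TABLES_PER_DB


def table_name(user_id: str) -> str:
    """Build the hypothetical fully qualified table name for a user."""
    db, table = route(user_id)
    return f"db_{db}.user_{table}"


print(table_name("alice"))
```

Every piece of application code that touches user data has to call something like `route` first, which is exactly the burden that distributed databases later took over.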

Storage as data structure

So the question is, can the database solve the business state persistence needs of all server-side programs?

The answer is of course no.

By analogy with desktop programs, we can see that business state is really a data structure. The data structure offered by a database is indeed very general, but it is not a silver bullet, and there are many situations it does not fit. Storage is a data structure.

What is storage middleware? Storage middleware is the "metadata structure".

This conclusion rests on the following reasoning.

First, data structures in desktop development are basically memory-based and relatively easy to implement. The server side is different. Every change to the business state has to take persistence into account, so the core data structures of the server are based on external storage.

Secondly, the data structure of the server has extremely high requirements on stability and concurrency performance (IOPS). A simple analysis shows that the scalability of the server program depends entirely on the scalability of the storage.

Business servers are usually stateless, so when load is high it is easy to add another one. Storage is different: when storage is under pressure you cannot simply add a machine, because doing so may require repartitioning and relocating data.
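A small experiment shows why adding a storage machine is not free. Assume, purely for illustration, that keys are assigned to machines by hash modulo the number of machines (the helper `shard_of` and the key names are made up). Going from 4 to 5 machines then changes the assignment of most keys, about 80% of them in expectation, and every one of those keys has to be relocated.

```python
import hashlib


def shard_of(key: str, num_shards: int) -> int:
    """Assign a key to a shard by hash modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user:{i}" for i in range(10_000)]

# Count the keys whose shard changes when a fifth machine is added.
moved = sum(1 for k in keys if shard_of(k, 4) != shard_of(k, 5))
moved_fraction = moved / len(keys)

print(f"{moved_fraction:.0%} of keys change shard when going from 4 to 5 machines")
```

A stateless business server has no such problem, because it holds no data that would need to move.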

This means that implementing a data structure on the server side is very difficult. Take a very simple example. Implementing a KV store in memory is easy: many languages provide a data structure such as Dictionary or Map for exactly this purpose. Even without a library, we could write one ourselves in tens of minutes to an hour.
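For comparison, an in-memory KV store really does fit in a dozen lines. The class below is a toy written for this illustration (the name `InMemoryKV` and its methods are made up); it simply wraps Python's built-in `dict`, and everything it holds disappears when the process exits.

```python
class InMemoryKV:
    """A toy key-value store backed by a dict; nothing survives a restart."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is treated as a no-op.
        self._data.pop(key, None)


kv = InMemoryKV()
kv.put("a", 1)
kv.put("b", 2)
kv.delete("a")
```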

A server-side KV store, however, is very complex, and one person cannot finish it in a day or two. Even once it is written, nobody would dare put it into use right away. It has to be verified from every angle with a very large number of test cases before it can go into production. And even when we do dare to put it into production, to be safe we usually start with "double writes": writing to a mature storage system and to the newly launched one at the same time.
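One possible shape for such a double-write setup is sketched below. It is an assumption about how the idea could be coded, not a description of any specific system: every write goes to both stores, reads are answered only from the mature store, and any disagreement with the new store is recorded for later investigation. `DictStore`, `DoubleWriteStore`, and `FlakyStore` are hypothetical names, and `FlakyStore` is a deliberately buggy stand-in for the new system.

```python
class DictStore:
    """Stand-in for a storage backend; a real one would be a network service."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class DoubleWriteStore:
    """Write to both stores; trust only the mature one for reads."""

    def __init__(self, mature, candidate):
        self.mature = mature
        self.candidate = candidate
        self.mismatches = []

    def put(self, key, value):
        self.mature.put(key, value)          # source of truth; errors propagate
        try:
            self.candidate.put(key, value)   # best effort on the new system
        except Exception as exc:
            self.mismatches.append((key, f"write failed: {exc}"))

    def get(self, key):
        expected = self.mature.get(key)
        try:
            if self.candidate.get(key) != expected:
                self.mismatches.append((key, "read mismatch"))
        except Exception as exc:
            self.mismatches.append((key, f"read failed: {exc}"))
        return expected


class FlakyStore(DictStore):
    """Hypothetical buggy candidate that silently drops one key."""

    def put(self, key, value):
        if key != "k2":
            super().put(key, value)


store = DoubleWriteStore(DictStore(), FlakyStore())
store.put("k1", "v1")
store.put("k2", "v2")
results = [store.get("k1"), store.get("k2")]
```

Callers still see correct data because reads come from the mature store, while `store.mismatches` exposes the bug in the new one before anyone depends on it.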

Quality control of storage systems is crucial.

Precisely because data structures are so hard to implement on the server side, every data structure the business needs should be abstracted into storage middleware.

How many kinds of storage middleware will there be?

That depends on the model abstractions used in server-side development. Today there is no reasonably systematic theory that tells us a given set of data structures is complete. In the longer term, though, we may well need to answer this question.

Therefore, storage middleware is a "metadata structure".

The "metadata structure" is a term I coined myself. The idea is that the number of kinds of data structure is very limited, and that ideally a theory could prove that a given set of basic data structures is enough to implement all business requirements efficiently. These basic data structures are what I call "metadata structures".

What storage middleware are we exposed to today? An incomplete list is as follows:

Key-value storage (KV-Storage);

Object Storage;

Database;

Message Queuing (MQ);

Inverted index (SearchEngine);

etc.
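To see why each entry in this list can be regarded as a data structure, it may help to look at the in-memory counterpart of one of them. The sketch below is a toy inverted index written for this illustration (the class `InvertedIndex` and its methods are made up, and splitting text on whitespace is a simplifying assumption). A search engine offers essentially this structure, plus the persistence, scale, and reliability that make the server-side version so much harder.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each word to the set of document IDs containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.lower().split():
            self._postings[word].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every word in the query."""
        words = query.lower().split()
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = InvertedIndex()
index.add(1, "storage is a data structure")
index.add(2, "a message queue is storage middleware")
index.add(3, "desktop programs keep state in memory")
hits = index.search("storage is")
```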

At present, the kinds of storage middleware cannot be exhaustively enumerated. But that may just reflect the limits of my own knowledge, and maybe one day we will find a better answer to this question.

Conclusion

Today we started from the business state of desktop programs and server programs and discussed the origin of storage middleware.

We mentioned before in "34 | Macroscopic Perspective of Server-side Development":

The domain characteristics of the server are large-scale user requests and 24-hour uninterrupted service.

This sentence is the core of understanding the server-side architecture and is crucial. But in a sense the more important principle is:

The user's data, that is, the business state the user believes has been committed, must not be lost.

Storage is a data structure. Storage middleware is the "metadata structure".

For the server side, storage middleware is crucial. It greatly boosts productivity, but it is also the performance bottleneck of the server side. When a server-side program cannot withstand the load, it is almost always because its storage cannot.


Origin blog.csdn.net/qq_37756660/article/details/134974562