Theoretical Basis of Microservice Architecture - Conway's Law

Abstract: It may surprise many people that many of the core concepts of microservices were laid out in a paper published half a century ago, and that the paper's arguments have been verified again and again over the following half century of rapid change in software development. That paper gave us Conway's Law.

Haste makes waste: the more you rush, the less you achieve!

I. Overview

Microservices are a very hot new concept at the moment. Everyone is chasing them and everyone thinks they are right, yet there seems to be no solid theoretical basis explaining why, which leaves people impressed without really understanding. Some time ago, on InfoQ, I read the talk "Conway's Law under Long-Distance Conditions - Realizing Team Building in a Distributed World" by Mike Amundsen (the author of Design RESTful API). I found it very helpful, so I have written up the content of the talk here, combined with some of my own thinking.

It may surprise many people that many of the core concepts of microservices were laid out in a paper half a century ago, and that the paper's arguments have been proven again and again during the following half century of rapid change in software development. This is Conway's Law.

In Conway's article, the most famous line is:

Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations. - Melvin Conway (1967)

Put literally: an organization that designs a system will produce a design that mirrors the communication structure within the organization and between organizations.

Take a look at the picture below, then think about how Apple's and Microsoft's products are designed, and the sentence becomes vivid.

In layman's terms: organizational form is equivalent to system design.

In the original author's sense, the "system" here is not limited to software systems. The paper is said to have been submitted first to the Harvard Business Review, where an article by a lowly programmer failed to impress the business crowd and was rejected outright. Conway then submitted it to a programming magazine, which is why it came to be misread as being only about software development. At first the paper did not presume to call itself a law; it simply described the author's own findings and conclusions. Later, Fred Brooks cited the argument in his famous book The Mythical Man-Month and promoted it as what we now know as "Conway's Law."

II. Conway's Law in detail

Mike summarizes some of the other core points in this paper from his perspective as follows:

First Law

Communication dictates design
The way the organization communicates will be expressed through the system design

Second Law

There is never enough time to do something right, but there is always enough time to do it over
No matter how much time you have, you can never do something perfectly, but there is always time to do it again

Third Law

There is a homomorphism from the linear graph of a system to the linear graph of its design organization
The structure of a system maps onto the structure of the organization that designs it

Fourth Law

The structures of large systems tend to disintegrate during development, qualitatively more so than with small systems
The structure of a large system is always more prone to breaking apart than that of a small system

III. Explanation of the laws

First Law

Humans are complex social animals

The close link between organizational communication and system design has been described in similar terms in many other fields. For complex systems, talking about design is inseparable from talking about communication between people. Only by solving the problem of communication between people can we arrive at a good system design. I believe almost every programmer has read The Mythical Man-Month (published in 1975, it feels like an antique, but a classic is something that stands the test of time), and many of its arguments echo this law.

For example, the most famous sentence in The Mythical Man-Month is

Adding manpower to a late software project makes it later - Fred Brooks (1975)

Are the bosses listening? Adding programmers to catch up on a schedule is like pouring water on a grease fire (yet everyone keeps doing it).

Why? The Mythical Man-Month also gives a very concise answer: communication cost = n(n-1)/2, so the cost of communication grows quadratically with the number of people in the project or organization. Yes, the complexity of the project management algorithm is O(n^2). For example:

For a project team of 5 people, there are 5*(5-1)/2 = 10 communication channels
For a project team of 15 people, there are 15*(15-1)/2 = 105 communication channels
For a project team of 50 people, there are 50*(50-1)/2 = 1,225 communication channels
For a project team of 150 people, there are 150*(150-1)/2 = 11,175 communication channels
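
The figures above can be reproduced with a few lines of Python. This is only a minimal sketch; the function name communication_channels is chosen here for illustration and does not come from the book or the talk.

```python
def communication_channels(n: int) -> int:
    """Number of distinct pairwise channels among n people: n(n-1)/2."""
    if n < 0:
        raise ValueError("team size must be non-negative")
    return n * (n - 1) // 2


# Team sizes taken from the list above.
for size in (5, 15, 50, 150):
    print(f"{size} people -> {communication_channels(size)} channels")
```

Growing the team tenfold, from 15 to 150 people, multiplies the number of channels by more than a hundred, which is the quadratic growth described above.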

So now you know why Internet startups are so small. They have to be small: otherwise, by the time the CEO has talked the startup idea through with everyone, the venture capital will have been burned through.

Mike also cited a very interesting theory called "Dunbar's number", first proposed in 1992 by the anthropologist Robin Dunbar (obviously). He initially found a correlation between the brain size of primates and the size of their social groups, which led to some interesting estimates of how many relationships the human brain can maintain. For example:

Intimate friends: 5
Trusted friends: 15
Close friends: 35
Casual friends: 150

Do these look related to the team sizes in the communication cost figures above? Yes: our brains can only sustain so many relationships. (Everyone knows that relationships are not what programmers are best at, so for a development team the numbers are probably smaller, perhaps closer to those of apes.)

Communication problems will bring about system design problems, which in turn will affect the development efficiency of the entire system and the final product results.

Second Law

You can't get fat from a single bite, so first get done what can be done

Eric Hollnagel, a safety researcher much cited in the agile development community, makes a similar argument in his book The ETTO Principle: Efficiency-Thoroughness Trade-Off.

Problem too complicated? Ignore details.

Not enough resources? Give up features.

- Eric Hollnagel (2009)

Systems keep getting more complex and gaining more features, competition in the market keeps intensifying, and investors' expectations keep rising. But human intelligence has an upper limit, and however capable you are and however much money you raise, you may still not be able to recruit enough suitable people. A hugely complex system can never be thought through completely. Eric believes that the best response at this point is to accept the imperfection and stop chasing a flawless result.

In fact, we run into this all the time in daily development. Are the product manager's requirements too complex? Ignore some of the details and focus on the main line first. Does the product manager have too many requirements? Give up some features.

It is said that Eric was hired by an airline as a safety consultant to ensure the stability and safety of the aircraft flight system. Eric believes that there are two ways to be safe:

The first is conventional safety: find and eliminate as many faulty parts as possible in pursuit of absolute safety. This is the ideal.
The other is resilient (elastic) safety: even if an error occurs, the system keeps working normally as long as it recovers in time. This is the reality.

For a system as complex as an aircraft, no matter how capable people are, they cannot anticipate every flaw. So Eric suggested giving up the idea of building a perfect system. Instead, find problems through continuous test flights and make sure that when problems do occur the system can recover automatically, rather than pursuing absolute correctness and safety of the flight system.

The following diagram nicely explains the process:

Sounds familiar, doesn't it? Isn't that what continuous integration and agile development are all about? Indeed it is.

The same is true of the elastic design of the distributed systems that Internet companies maintain. In a distributed system it is almost never possible to find and fix all the bugs, and even 100% unit test coverage does not help: bugs are in the blood of a distributed system. The solution is not to eliminate these problems but to tolerate them, so that when they occur the system can repair itself automatically. In a system composed of microservices, any individual microservice may fail. This is the norm, and it is acceptable as long as we have enough redundancy and backup. This is what is meant by elastic design, or high-availability design.
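
The idea of tolerating failures through redundancy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: replicas are modeled as plain callables, and the names call_with_failover, AllReplicasFailed, flaky_replica and healthy_replica are hypothetical, not taken from the talk or from any particular framework.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every redundant replica has failed (hypothetical name)."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each redundant replica in order and return the first success."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a failing replica is expected, not fatal
            errors.append(exc)
    # Only when there is no redundancy left does the caller see a failure.
    raise AllReplicasFailed(errors)


def flaky_replica() -> str:
    """Stand-in for a service instance that is currently down."""
    raise ConnectionError("replica 1 is down")


def healthy_replica() -> str:
    """Stand-in for a backup instance that is still working."""
    return "response from replica 2"


result = call_with_failover([flaky_replica, healthy_replica])
print(result)
```

The caller never tries to make an individual replica perfect; it simply assumes any one of them may fail and relies on the backup.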

Third Law

You reap what you sow: build independent, autonomous subsystems to reduce communication costs

This is a concrete application of the first law, the intrinsic relationship between organization and design. To put it bluntly, build the kind of team that matches the kind of system you want. If your team is divided into a front-end team, a Java back-end development team, a DBA team, and an operations team, your system will look like the following:

Conversely, if your organization is divided along business boundaries, and each team turns its own module into a small system and a small product serving one business goal, your large system will grow into the following, which is the microservice architecture:

Microservice teams should inter-operate, not integrate. To inter-operate means to define the boundaries and interfaces of the system and keep the full stack inside one team, so that the team can be autonomous. When teams are formed this way, the cost of communication stays inside each subsystem, each subsystem becomes more cohesive, the dependencies and coupling between subsystems weaken, and the cost of communication across systems falls as well.
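
One way to picture "inter-operate, not integrate" is a service that depends only on another team's published interface, never on its internals. The sketch below is an assumption-laden illustration: InventoryApi, InMemoryInventory and OrderService are hypothetical names, and in a real system the boundary would usually be a network API rather than an in-process Python protocol.

```python
from typing import Dict, Protocol


class InventoryApi(Protocol):
    """Published boundary of the (hypothetical) inventory team's service."""

    def reserve(self, sku: str, quantity: int) -> bool: ...


class InMemoryInventory:
    """One possible implementation, owned entirely by the inventory team."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)

    def reserve(self, sku: str, quantity: int) -> bool:
        available = self._stock.get(sku, 0)
        if quantity <= 0 or quantity > available:
            return False
        self._stock[sku] = available - quantity
        return True


class OrderService:
    """Owned by the order team; it knows only the InventoryApi boundary."""

    def __init__(self, inventory: InventoryApi) -> None:
        self._inventory = inventory

    def place_order(self, sku: str, quantity: int) -> str:
        reserved = self._inventory.reserve(sku, quantity)
        return "confirmed" if reserved else "rejected"


orders = OrderService(InMemoryInventory({"widget": 3}))
first = orders.place_order("widget", 2)   # stock is sufficient
second = orders.place_order("widget", 2)  # only 1 widget left
print(first, second)
```

The inventory team can change how it stores stock without talking to the order team, as long as the interface stays the same, which is where the saving in communication cost comes from.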

Fourth Law

What has long been united must divide: divide and conquer

As mentioned earlier, people are complex social animals, and communication between people is very complicated. Yet when we face complex systems, often the only way we can cope is by adding people. How do our organizations usually solve the resulting communication problem? Divide and conquer. Look at how your own company is organized. Doesn't a first-line manager usually manage fewer than 15 people? Doesn't a second-line manager manage even fewer first-line managers? The third line manages fewer still, and so on. (There is absolutely no suggestion here that development managers are harder to manage than programmers.)

Therefore, a large organization is always split into small teams due to communication costs/management issues.

The startup idea is great and there is plenty of venture capital anyway, so recruit more programmers.
There are too many people for me to manage, so I'll find a few managers to help and manage the managers myself.
Finally, Conway's Law tells us that the way an organization communicates will be expressed in its system design. Each manager is given responsibility for a certain part of the larger system, and each has a communication boundary with the rest of it. Therefore, large systems also get split into small systems owned by small teams (microservices are a good model for this).
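
The effect of splitting can be estimated with the same n(n-1)/2 formula. The sketch below makes a strong simplifying assumption that is not in the original talk: people talk freely inside their own team, while across teams only one contact person per team communicates. The function names are illustrative.

```python
from typing import Sequence


def channels(n: int) -> int:
    """Pairwise communication channels among n people: n(n-1)/2."""
    return n * (n - 1) // 2


def split_channels(team_sizes: Sequence[int]) -> int:
    """Channels after splitting, assuming one contact person per team."""
    within_teams = sum(channels(size) for size in team_sizes)
    between_teams = channels(len(team_sizes))
    return within_teams + between_teams


one_big_team = channels(50)
seven_small_teams = split_channels([7, 7, 7, 7, 7, 7, 8])  # 50 people in total
print(one_big_team, seven_small_teams)
```

Under this assumption, 50 people in one group have 1,225 channels, while the same 50 people in seven teams have 175, which is why the organization, and with it the system, ends up being split.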

IV. How Conway’s Law Explains the Rationale for Microservices

1. Now that we understand what Conway's Law is, let's see how it laid the theoretical foundation of the microservice architecture half a century ago.

(1) Communication between people is very complicated, and each person's capacity for communication is limited, so when a problem is too complex and needs many people to solve it, we have to split the organization to keep communication efficient.

(2) The way people in an organization communicate determines the design of the systems they work on. By splitting teams in different ways, managers create different communication patterns between teams and thereby influence the system design.

(3) If a subsystem is cohesive and its communication boundary with the outside is clear, communication costs go down and the corresponding design becomes more efficient.

(4) Complex systems need to be optimized continuously in a fault-tolerant, elastic manner. Don't expect to get a big, all-encompassing design or architecture up front; good architectures and designs emerge through gradual iteration.

2. Practical suggestions that follow

(1) Use every means available to improve communication efficiency, such as Slack, GitHub, and wikis. If two people can settle a matter between themselves, don't pull in more. Give everyone a clear division of responsibility in each system, so that when a problem comes up you know immediately whom to turn to, and nobody passes the buck.

(2) Design the system using the MVP (minimum viable product) approach, and validate and optimize it through continuous iteration. Design the system to be elastic.

(3) Build the kind of team that matches the system design you want, and flatten the organization wherever you can. It is best to divide teams by business, so that each team is naturally autonomous and cohesive and the clear business boundary reduces the cost of communicating with the outside. Each small team is responsible for the entire life cycle of its own module, with no unclear boundaries and no pointless wrangling. Inter-operate, not integrate.

(4) Keep teams small and beautiful; more people bring communication costs and reduce efficiency. Amazon's Bezos has an amusing rule of thumb: if two pizzas aren't enough to feed a team, the team is too big. In practice, a small product team at an Internet company is about 7 or 8 people (covering front end, back end, testing, interaction design, user research, and so on, with some people holding several roles).

3. Comparing these points with the criteria that characterize microservices, the close relationship between them is easy to see

A system of distributed services
Divide your organization by business, not technology
Build living products, not projects
Smart endpoints and dumb pipes (my understanding is strong individual services and weak communication)
Automated operation and maintenance
Fault tolerance
Rapid evolution
