Go's task scheduling unit and concurrent programming

Abstract: This article was originally published on CSDN by the Grape City technical team. Please credit the source when reprinting: the Grape City official website. Grape City provides developers with professional development tools, solutions, and services.

Foreword

This article introduces the background of the Go language; processes, threads, and coroutines; how Go addresses the shortcomings of traditional coroutines; and how concurrent programming works in Go. Reading time is about 15-20 minutes, so please plan your reading accordingly.

1. The past and present of Go

1.1. The birth of the Go language

As the story goes, one day in September 2007, Google engineer Rob Pike kicked off a build of a C++ project as usual. From past experience, the build would take about an hour. While waiting, he and two Google colleagues, Ken Thompson and Robert Griesemer, began to complain and floated the idea of a new language. At the time, Google built most of its internal systems in C++, and that language's complexity and lack of native concurrency support had long frustrated the three of them.

The first day's chat was fruitful: they quickly sketched out a new language that could bring joy back to programmers, match future hardware trends, and serve Google's large-scale internal network services. They met again the next day to design the language in earnest, after which Robert Griesemer sent the following email:

As the email shows, their expectation for the new language was: **start from C, fix some mistakes, remove the most criticized features, and add what is missing.** For example: repairing the switch statement, adding an import statement, adding garbage collection, and supporting interfaces. This email became the first draft of Go's design.

A few days later, on a drive home, Rob Pike came up with the name Go for the new language. To him the word is short, easy to type, and combines easily with other letters, as in Go's early toolchain: the goc compiler, the goa assembler, the gol linker, and so on. The word also fits their original design intention: simplicity.

1.2. Taking shape step by step

After unifying the design ideas, the team formally began iterating on the design and implementation of the language. In 2008, Ken Thompson (co-creator of Unix and the B language) implemented the first version of the Go compiler. This compiler was still written in C; it worked by compiling Go to C and then compiling the C to a binary. By mid-2008, the first version of Go's design was largely complete. Ian Lance Taylor, also at Google, then implemented a gcc frontend for Go, the language's second compiler. Ian Taylor's achievement was not only an encouragement but also proof of the new language's feasibility; with a second implementation in existence, establishing a language specification and standard library also became important. Ian Taylor subsequently joined the Go team as its fourth member and later became one of the core figures in the design and implementation of the language.

Russ Cox joined in 2008 as the fifth member of the core team. Exploiting the fact that function types are "first-class citizens" in Go, and that a named function type can have its own methods, Russ Cox designed the http package's HandlerFunc type: through an explicit conversion, an ordinary function becomes a type that satisfies the http.Handler interface. He also proposed more general ideas on top of the design of the time, such as the io.Reader and io.Writer interfaces, which established Go's I/O model. Russ Cox later became the head of the Go core technical team, driving the continued evolution of the language.

At this point the initial core team of the Go language was formed, and Go set out on a path of steady evolution.

1.3. Official release

On October 30, 2009, Rob Pike gave a talk on Go at a Google TechTalk, the first time the language was shown publicly. Ten days later, on November 10, 2009, Google officially announced that the Go project was open source, and Go has since designated that day as the language's birthday.

(Go language mascot Gopher)

1.4. Go installation guide

1. Download the Go installation package

Go official website: https://golang.google.cn/

Select the installation package for your platform (on Windows, the .msi file is recommended).

2. Check whether the installation and environment configuration succeeded

Open a command line: press Win + R to open the Run box, type cmd, and press Enter to open a command-line window.

Enter go version at the command line to check the installed version. If a version string is printed, the installation succeeded.

2. Process, thread and coroutine

In the Internet and cloud era, almost every well-known infrastructure component is written in Go: Docker, Kubernetes, Dapr, and many more, too numerous to list. But simple syntax alone could not have won the hearts of the big players. For Go to be this popular in the Internet age, it needed a real foundation, and that foundation is Go's concurrency support: the language's biggest selling point, the Goroutine.

Goroutine is a compound word derived from coroutine.

Goroutine is Go's implementation of coroutines. So what exactly is a coroutine? Why has it become popular in recent years? Web development in C# gets by without the concept, so why are big companies moving from popular languages such as Java to Go and its coroutines? With these questions in mind, let's work out what a coroutine is, starting from the beginning.

2.1. Emergence of processes

I had heard of coroutines long ago and tried several times to understand them, but without the prerequisite knowledge the pieces never formed a useful chain: I would understand and then forget. I think the right approach is to understand why the coroutine concept was born and why it is so hot now; only then does the knowledge become our own.

Since we are starting from the beginning, let's start with why processes exist at all.

Long ago, computers were single-job batch machines. A programmer (or, you might say, a card puncher) fed a program into the computer on punched cards or paper tape, and after the computation finished the machine returned the result to the user. In this period there was no concept of a process.

As technology developed, computers evolved from that most primitive form into multiprogrammed batch systems that could run several programs concurrently, and the concept of an operating system appeared. People then found that the notion of a "program" alone could no longer describe a program being executed: the same program can be run concurrently several times, and to the computer those concurrent executions of identical code are distinct. So some clever mind invented the concept of the process.

At that time, the process was the sole identity of a running program: it was both the scheduling unit and the container for the program's state. Each process had a process control block (PCB) storing information about the process, for example: page tables, registers, the program counter, the heap, and the stack.

Processes are also isolated from one another: each has its own PCB, memory, and so on, and processes run completely independently within the system without interfering with each other.

During this period, everything was running perfectly without incident.

2.2. The emergence of threads

However, as technology developed further, people found problems that processes alone could not solve, such as playing an MP3 audio file.

The pseudocode for playing an MP3 audio file is roughly as follows:

```c
main() {
    while (true) {
        /* Read the audio stream via IO */
        Read();
        /* Decompress the audio stream */
        Decompress();
        /* Play it */
        Play();
    }
}

Read()       { ... }
Decompress() { ... }
Play()       { ... }
```

Executing this code in a single process has a serious problem: the three functions cannot run concurrently. When Read is called, the user-mode code issues a system call to perform the IO. For IO-bound operations the operating system usually blocks the process outright; only when the IO completes and raises an interrupt does the OS wake the blocked process to continue.

That creates a problem: until the data has been read and decoded, the program cannot play it. The direct consequence for the user is stuttering audio, played one segment at a time. Why segments? Because IO goes through a buffer, each read may fetch only one buffer's worth of data; per the program's logic, that buffer is decoded and played, then the next read starts, and so on. What the user hears is music in fits and starts.

Could multiple processes do it: one process for IO, one for decoding, one for playback? It looks like a solution, but inter-process communication gets in the way. As the previous section showed, process memory is isolated, and the three processes cannot directly read each other's data, so the IO process must find some way to hand its data to the decoding process. Quite apart from the heavy performance cost of inter-process communication, getting three processes to cooperate perfectly is genuinely hard.

So a single process cannot concurrently execute multiple different routines over shared memory, and thus the concept of the thread appeared.

To solve these problems, some clever mind split the process's resource-management and scheduling responsibilities apart and created the thread. The process remains the control center for all of a program's resources, but execution is no longer done by the process itself: it is done by multiple threads inside it. These threads share memory and can be scheduled by the CPU to run different routines concurrently, which neatly solves the problems above.

Although the threads within a process share memory, they execute independently, so each thread needs its own registers, program counter, stack, and so on. Just as a process has a PCB, each thread has a thread control block (TCB) recording these per-thread resources. The process/thread model is shown in the figure below:

2.3. Internet Era

The thread model ran smoothly until the Internet era, when new problems emerged:

With the rapid development of the Internet, high concurrency is a problem every Internet company must face: only high concurrency brings traffic, and only traffic establishes a business. In this era of high concurrency, threads struggle to meet the demand.

Suppose a server receives up to 10,000 concurrent requests per second; it would then need at least 10,000 threads to serve them. Creating threads costs resources too. On Linux, for example, a POSIX thread's stack costs roughly 1-8 MB. Those 10,000 requests would therefore consume on the order of 10-80 GB of memory per second, which is a frightening amount.

Moreover, moving the CPU from one thread to another requires a thread context switch, another source of overhead. What is a context switch? As noted above, each thread has its own TCB recording its stack, program counter, and registers. To schedule a thread, the OS loads these resources from its TCB into the CPU's registers and memory before execution; when the thread's time slice ends and another thread is switched in, the OS must save the outgoing thread's final state back into its TCB and then load the incoming thread's TCB. This process is called a thread context switch, and it typically costs around 3-5 µs.

In addition, most Internet requests read some data and return it to the user, and reading data requires IO, during which the OS blocks the corresponding thread. Tests have shown that under high concurrency roughly 80% of threads are actually blocked, occupying system resources while doing no work. And if memory runs short, the OS may swap processes out, triggering frequent page faults that put even more pressure on already-saturated IO bandwidth: a vicious circle.

The early Apache web server handled web requests with exactly this multi-threaded response model. That model has largely fallen out of favor because it cannot solve the problems above.

At this point, some clever mind thought of a new approach: IO multiplexing. Nginx uses this technique to handle highly concurrent requests. So what is IO multiplexing?

Unlike the one-thread-per-request model, servers such as Nginx listen for all web requests in an event loop. Linux IO-multiplexing facilities such as select, poll, and epoll (and kqueue on BSD) let a single blocking system call monitor many file descriptors at once; as soon as any descriptor becomes ready for reading or writing, the program is notified and handles it, achieving efficient event-driven programming. The request-handling code also uses non-blocking IO: instead of waiting on an IO operation, the thread moves on to other work, which greatly reduces pressure on the server.

But this creates a new problem: under IO multiplexing, responses are driven by event callbacks, and it is hard for programmers to reason about when a callback will fire. This places enormous mental load on developers, and the code becomes very difficult to write.

2.4. Emergence of coroutines

To solve these problems, our protagonist, the coroutine, appeared.

The essence of a coroutine is a user-mode thread. That introduces a new concept: what is user mode? Mainstream CPUs divide their instructions into privilege levels, typically ring 0 through ring 3. The closer to ring 0, the more privileged the instructions the code may execute and the more it can do, but also the higher the risk if it misbehaves. From this we can see that user mode and kernel mode are strictly separated.

Accordingly, modern operating systems are split into user mode and kernel mode; on Linux, ring 0 is kernel mode and ring 3 is user mode. The programs we write day to day are all user-mode programs, which can directly use only a small part of the machine's capabilities. Most functions, such as IO reads and writes, memory allocation, and interaction with hardware, are carried out by kernel code.

At this point someone will object: wait, my code can read files and print all sorts of things to the screen. How can that be?

Those capabilities are in fact implemented by the user-mode program issuing system calls to the kernel. Issuing a system call triggers a switch from user mode to kernel mode, which itself costs performance.

Last year a second-round Alibaba interview question circulated online: why are RocketMQ and Kafka so fast? Largely because both use mmap, a Linux zero-copy technique, for their IO: instead of copying memory across a user/kernel switch on every read and write, user space and the kernel map the same memory region, which provides the speedup. If you're interested, look up how mmap works.

In fact, user-mode threads appeared before OS-level threads did: a user-mode program internally simulates multi-threaded scheduling, while the operating system schedules only the enclosing process and knows nothing about the threads inside it. The direct benefit of such user-mode threads, i.e. coroutines, is that no kernel TCB needs to be created, saving the memory overhead of creating real threads.

Seen this way, is the coroutine, like Docker, just new wine in an old bottle? Not really.

Although early user-mode threads had real performance advantages, they still could not solve several problems:

  1. They cannot perceive system interrupts. Modern operating systems are preemptive: the OS prioritizes high-priority programs, and if a low-priority program is running, the OS raises an interrupt so a high-priority one can preempt it. Preemption is implemented with system interrupts, but the OS cannot see coroutines, so a coroutine cannot itself handle being preempted.
  2. If one user-mode thread performs IO, the OS blocks the entire underlying thread, and every coroutine on it, including those not doing IO, is blocked with it.
  3. Because the OS's scheduling unit is the process, the time slices handed to any one coroutine are small, so raw CPU throughput is also a problem.

It is precisely because of these problems that operating systems went on to introduce their own system-level threads.

3. Goroutine

The Goroutine is, in effect, a higher-level encapsulation Go built to solve these classic coroutine problems.

Rob Pike, one of Go's authors, describes goroutines like this: a goroutine is a Go function or method running concurrently with other goroutines in the same address space, and a running program consists of one or more goroutines. It is not quite a thread, a coroutine, or a process: it is a goroutine.

The early coroutines of the previous section map onto threads in an N:1 model: one thread multiplexes many coroutines, and that structure cannot solve the problems above. Go instead provides a GM model at the language level (which later evolved into the GMP model). Roughly, the user-mode threads, Goroutines (G), and the real threads, Machines (M), form an N:M model, as shown in the figure below:

As the figure shows, goroutines run on top of system threads and are managed uniformly by the Go runtime. The core of Go is really its runtime: it creates and destroys goroutines, allocates memory for them, mediates their system calls, and so on, covering memory management, process management, and device management, which are essentially the core duties of a real operating system. You could say the Go runtime is already a small operating system.

There is a developer on GitHub, a Go engineer at Baidu, who wrote a small operating system called eggos in Go, installed it on bare metal with no host OS, and can even play Super Mario on it. Interested readers can look it up; this sort of thing is nearly impossible on the .NET platform.

The figure above shows Go's early scheduling model for goroutines. A newly created goroutine is first placed on a global queue, and the program's actual executors, the Ms, pull runnable goroutines from this queue. When a running goroutine triggers a system call such as IO, the runtime moves it back to the global queue; likewise, a goroutine created inside a running goroutine is placed on the global queue to await scheduling. The runtime also starts a monitoring thread that watches running goroutines and moves any that exceed their time slice back to the global queue.

The GM model clearly has many problems: a single global lock, scheduling that depends entirely on the global queue, no strong affinity between executors and goroutines (which often defeats locality), M's memory allocation and scaling, and so on. The Go team therefore later evolved it into the GMP model by adding a Processor (P); interested readers can study the GMP model further.

In summary, compared with threads, Goroutine has the following advantages:

  1. Tiny initial footprint. A goroutine's creation cost drops from a thread's 1-8 MB to a default of 2 KB. With no TCB to create, a goroutine needs little more than a program counter and a record of the function stack it is currently running.
  2. Almost no context-switch overhead. Goroutines are scheduled by the Go runtime; a switch is essentially an enqueue-and-dequeue operation with no kernel context switch. Against a thread's 3-5 µs, a goroutine switch costs around 100 ns.
  3. Running threads are not blocked. When a goroutine issues a blocking system call, the runtime detaches it from its M, and that M carries on executing other goroutines, which greatly improves thread utilization.

Combined with IO multiplexing and runtime scheduling, this solves the serious problems of early coroutines. That is how Go broke out in the Internet era and became a primary language for major companies and core infrastructure components.

4. Talk is cheap

Now that we know the goroutine's advantages, let's finally look at how Go's concurrent programming model is used.

```go
package main

import (
	"fmt"
	"time"
)

func say(s string) {
	for i := 0; i < 5; i++ {
		time.Sleep(100 * time.Millisecond)
		fmt.Println(s)
	}
}

func main() {
	go say("hello") // start a new goroutine
	say("world")    // runs in the main goroutine
}
```

This example is adapted from the goroutines section of the official Go documentation. As you can see, the syntax for starting a goroutine is trivially simple: a single go keyword. The example prints interleaved "hello" and "world" lines. Unlike C#'s contagious async/await, you cannot tell from the outside whether Go code is implemented asynchronously, which removes some mental burden from developers.

We know that C#'s elaborate async/await model exists to solve the problem of retrieving results from asynchronous callbacks: the await keyword observes the async state machine and eventually returns the result in the caller's context. Go has no await, so how does it synchronize results between goroutines?

```go
package main

import (
	"fmt"
	"time"
)

func calN(n int, ch chan<- int) {
	// Simulate an expensive computation of N
	time.Sleep(time.Second)
	// Send the result back over the channel
	ch <- n * n
}

func main() {
	ch := make(chan int, 1)
	go calN(12345, ch)

	for {
		select {
		case ans := <-ch:
			fmt.Printf("answer is: %d\n", ans)
			return
		default:
			// No result yet: do other work, then poll again
			time.Sleep(time.Millisecond * 100)
		}
	}
}
```

Here we finally meet the last keyword, select, and the last reference type, chan.

chan

Let's talk about chan first. chan is short for channel, a conduit through which multiple goroutines pass data. Its role is similar to a Pipe in C#: a tunnel dug between concurrently executing goroutines for moving data.

Like a pointer, chan is a reference type, initialized with the make() function; the second argument is the channel's buffer size. Under the hood a channel is a FIFO queue: multiple goroutines may read from and write to it, and when the channel's buffer is full, a goroutine writing to it blocks.

The syntax for using a channel is a single arrow, <- (note there is only a left arrow; no right arrow exists). ch <- v writes a value into the channel, and <-ch reads a value out of it.

select

Now the select keyword. The name comes from select, one of the Linux IO-multiplexing system calls, whose purpose is to wait on multiple event sources and react to whichever becomes ready.

Go implements select at the language level with a similar job: block the current goroutine until one of its channel operations can proceed.

select is usually used with case and default, much like a switch statement: whichever case becomes ready fires. In the code above, when select tries to read the channel and calN has not yet returned, main's goroutine takes the default branch and sleeps 100 milliseconds, then loops and tries again, until some case executes a statement such as return.

If multiple cases are ready at the same time, the Go runtime picks one at random. A goroutine timeout is often implemented by writing a separate timeout function, as in the following code:

```go
package main

import (
	"fmt"
	"time"
)

func makeNum(ch chan<- int) {
	time.Sleep(5 * time.Second)
	ch <- 10
}

func timeout(ch chan<- int) {
	time.Sleep(3 * time.Second)
	ch <- 0
}

func chanBlock() {
	ch := make(chan int, 1)
	timeoutCh := make(chan int, 1)

	go makeNum(ch)
	go timeout(timeoutCh)

	select {
	case v := <-ch:
		fmt.Println(v)
	case <-timeoutCh:
		// the timeout channel fired first
		fmt.Println("timeout")
	}
}

func main() {
	chanBlock()
}
```

producer-consumer model

Finally, to get more fluent with goroutine-style programming, let's implement a classic from operating-systems synchronization: the producer-consumer model.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// Requirements: build a producer-consumer model with N producers and M consumers.
// Each producer produces a product every so often; each consumer likewise consumes one every so often.
// A producer blocks when the product container is full, and quits after being blocked several times.
// A consumer quits only after consuming Z products; until then it keeps consuming.

const (
	ProducerCount = 3  // number of producers
	ConsumerCount = 5  // number of consumers
	FullCount     = 15 // products a consumer needs; once full it goes home
	TimeFactor    = 5  // interval factor: rest a while after each produce/consume
	QuitTimes     = 3  // a producer clocks out after being blocked this many times
	SleepFactor   = 3  // how long a blocked producer/consumer sleeps
)

var waitGroup = sync.WaitGroup{}

func producer(n int, ch chan<- int) {
	defer waitGroup.Done()
	times := createFactor()
	asleepTimes := 0
	for {
		p := createFactor()
		select {
		case ch <- p:
			t := time.Duration(times) * time.Second
			fmt.Printf("Producer: %d produced a %d, then will sleep %d s\n", n, p, times)
			time.Sleep(t)
		default:
			// container is full: sleep, and give up after QuitTimes attempts
			time.Sleep(time.Second * SleepFactor)
			asleepTimes++
			fmt.Println("I need consumers!")
			if asleepTimes == QuitTimes {
				fmt.Printf("Producer %d will go home\n", n)
				return
			}
		}
	}
}

func consumer(n int, ch chan int) {
	defer waitGroup.Done()
	s := make([]int, 0, FullCount)
	times := createFactor()
	for len(s) < FullCount {
		select {
		case c := <-ch:
			s = append(s, c)
			fmt.Printf("Consumer: %d consumed a %d, remains %d, then will sleep %d s\n", n, c, FullCount-len(s), times)
			time.Sleep(time.Duration(times) * time.Second)
		default:
			// nothing to consume yet
			fmt.Println("Producers need to hurry up, I'm hungry!")
			time.Sleep(time.Second * SleepFactor)
		}
	}
	fmt.Printf("Consumer: %d already full\n", n)
}

// createFactor returns a random non-zero interval in [1, TimeFactor).
func createFactor() int {
	times := 0
	for times == 0 {
		times = rand.Intn(TimeFactor)
	}
	return times
}

func main() {
	rand.Seed(time.Now().UnixNano())
	ch := make(chan int, FullCount)
	waitGroup.Add(ProducerCount)
	for i := 0; i < ProducerCount; i++ {
		go producer(i, ch)
	}
	waitGroup.Add(ConsumerCount)
	for i := 0; i < ConsumerCount; i++ {
		go consumer(i, ch)
	}
	waitGroup.Wait()
}
```

This example uses another common goroutine companion, the sync package; specifically, a WaitGroup. Its purpose resembles Task.WaitAll in C#: waiting for a set of goroutines to finish. It is essentially built on a counter semaphore, which is why Add must be called for every goroutine that is started, and each goroutine must call Done when it finishes.

Finally, I offer a Toy-Web that I implemented myself, line by line. This practice project took real effort: implementing it genuinely requires some algorithmic skill. For example, routing should use a trie; matching wildcards needs BFS and DFS; and expanding route nodes needs backtracking. In short, skyscrapers rise from the ground. Good luck to all new Gophers; I wish us all a bright future.


Origin blog.csdn.net/powertoolsteam/article/details/132104023