Operating system processes and threads

In a traditional operating system, each process has an address space and a single thread of control; this is the classical definition of a process. In practice, however, it is often desirable to have multiple threads of control running in the same address space, executing quasi-independently, almost as if they were separate processes.

Why use multithreading

1. In many applications, multiple activities go on at once. These activities need to share a large amount of data, yet remain relatively independent. For example, an editor must wait for keyboard input while also being able to auto-save the text. Decomposing such an application into several cooperating processes is difficult or impossible.
2. Some of these activities block from time to time. By decomposing the application into several threads that run quasi-parallel, one blocked thread does not stall the others, and the programming model becomes simpler.
3. With the spread of multi-core architectures, a computer often has two or more CPUs. Multithreading makes it possible for a single program to use several CPUs at the same time.

Classic thread model

The process model is based on two independent concepts: resource grouping and execution. Sometimes it is useful to separate the two, and this is where threads come in. A process groups related resources together: the program text and data, plus other resources such as open files, child processes, pending timers, signal handlers, and accounting information. The other concept is the thread of execution, usually just called a thread. A thread has a program counter that keeps track of which instruction will be executed next, registers that hold its current working variables, and a stack that records its execution history. In short, a thread holds the information tied to its execution sequence, which is private to it rather than shared. Threads and processes are therefore different concepts: in modern systems the process plays the role of resource container, gathering resources together, while a thread is the sequence of execution, the entity that actually runs on the CPU and is scheduled onto it.

What threads add to the process model is the ability to have multiple executions proceed largely independently of one another inside the same process environment, sharing one address space and the other resources. The relationship is analogous to processes sharing the machine: multiple processes share the CPU, physical memory, disks, and other resources. When multiple threads run on a single CPU, the threads take turns. Just as in multiprogramming the CPU switches back and forth between processes and creates the illusion that they run in parallel, with multiple threads the CPU switches back and forth between threads and creates the illusion that the threads run in parallel.

Threads within a process are not as independent as separate processes. All threads share exactly the same address space, which means they also share the same global variables. Since every thread can access every memory address in the address space, one thread can read and even overwrite another thread's stack. There is no protection between threads, for two reasons: it is impossible, and it should be unnecessary. Unlike processes, which may belong to different users and may be hostile or competitive toward one another, the threads of a process are created by a single user precisely so that they can cooperate closely rather than compete.

 

Per-process items              Per-thread items
Address space                  Program counter
Global variables               Registers
Child processes                Stack
Pending timers                 State
Signals and signal handlers
Accounting information

 

The first column lists items shared by all the threads of a process; the second lists items private to each thread. The difference from the process table in the previous chapter is that the information related to program execution is now kept per thread, in each thread's own entry.
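To make this concrete, here is a minimal sketch (not from the original post; the names worker, shared_counter, and local are illustrative) in which two POSIX threads update the same global variable while each keeps its own local variable on its private stack:

#include <pthread.h>
#include <stdio.h>

int shared_counter = 0;                       /* global: visible to every thread   */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    int local = 0;                            /* lives on this thread's own stack  */
    for (int i = 0; i < 1000; i++) {
        local++;
        pthread_mutex_lock(&lock);            /* cooperate: protect the shared data */
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    printf("local=%d\n", local);              /* always 1000: stacks are per-thread */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_counter=%d\n", shared_counter);  /* 2000: one shared address space */
    return 0;
}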

When multithreading is used, a process usually starts out with a single thread. This thread can create new threads by calling a library function. There is no need to specify an address space for the new thread, because it automatically runs in the address space of the creating thread. Sometimes threads are hierarchical, with a parent-child relationship, but usually no such relationship exists and all threads are equal.

When a thread has finished its work, it can exit by calling a library procedure such as thread_exit. It then vanishes and is no longer schedulable. On some systems a thread can wait for a specific other thread to exit by calling a procedure such as thread_join; this call blocks the caller until that thread has exited. Another common call is thread_yield, which lets a thread voluntarily give up the CPU so that another thread can run. This call matters because, unlike processes, threads cooperate, and the thread library has no clock interrupt with which to force a thread off the CPU. The threads therefore need some way to voluntarily surrender the CPU from time to time so that other threads get a chance to run; a small POSIX sketch follows below.
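The generic calls above correspond closely to the POSIX calls covered later in this post. As a small illustrative sketch (not the author's code; pthread_yield is not universally available, so the standard sched_yield from <sched.h> is used in its place), two threads repeatedly do a little work and then voluntarily give up the CPU:

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Each thread prints a few lines, calling sched_yield() after every line to
 * voluntarily hand the CPU to another runnable thread. */
static void *chatty(void *name) {
    for (int i = 0; i < 3; i++) {
        printf("%s: iteration %d\n", (const char *)name, i);
        sched_yield();               /* give other threads a chance to run        */
    }
    pthread_exit(NULL);              /* same effect as returning from the function */
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, chatty, "thread A");
    pthread_create(&b, NULL, chatty, "thread B");
    pthread_join(a, NULL);           /* block until thread A exits */
    pthread_join(b, NULL);
    return 0;
}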

Threads implemented in user space

There are two main ways to implement a thread package: in user space and in kernel space. Each approach has its own advantages and disadvantages, and a hybrid approach is of course also possible.

The first way is to put the entire thread package in user space; the operating system knows nothing about threads and, from the kernel's point of view, it is scheduling ordinary single-threaded processes. The advantage of this method is that it can be implemented on operating systems that do not support threads.

Threads are managed entirely in user space. Each process has its own thread table to keep track of the threads in that process. The table is analogous to the kernel's process table, but it records only per-thread properties: each thread's program counter, stack pointer, registers, and state. The thread table is managed by the runtime system. When a thread moves to the ready or blocked state, the information needed to restart it is saved in this table.

When a thread does something that may cause it to block locally, for example waiting for another thread in the same process to finish some work, it calls a runtime-system procedure. This procedure checks whether the thread must be put into the blocked state. If so, it saves the thread's registers in the thread table, looks in the table for a ready thread to run, and reloads that thread's saved values into the machine registers. As soon as the stack pointer, program counter, and registers have been switched, the new thread is running again. Switching threads this way is at least an order of magnitude faster than trapping into the kernel, which is a major advantage of user-level thread packages. User-level threads have another advantage: each process can have its own customized scheduling algorithm.
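As an illustration of what such a thread table might hold, here is a rough sketch in C (purely hypothetical names and layout, not taken from any real thread package):

#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch of a user-level thread table entry. */
enum thread_state { READY, RUNNING, BLOCKED };

struct thread_entry {
    uint64_t          program_counter;  /* where to resume this thread           */
    uint64_t          stack_pointer;    /* top of this thread's private stack    */
    uint64_t          registers[16];    /* saved general-purpose registers       */
    enum thread_state state;            /* READY, RUNNING, or BLOCKED            */
};

/* One table per process, kept entirely in user space: the kernel never sees it. */
#define MAX_THREADS 64
static struct thread_entry thread_table[MAX_THREADS];

int main(void) {
    /* The runtime system would fill in an entry when a thread is created. */
    thread_table[0].state = READY;
    printf("thread 0 state: %d\n", thread_table[0].state);
    return 0;
}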

Although user-level thread packages have better performance, they also have some serious problems.

The first problem is how to implement blocking system calls. Suppose a thread issues a read on the keyboard before any key has been pressed. Letting the thread actually make the system call is unacceptable, because that would stop all the threads in the process. One of the main goals of using threads in the first place was to allow each thread to use blocking calls while preventing one blocked thread from affecting the others. One approach is to change all system calls into non-blocking calls, but that requires modifying the operating system, and a key advantage of user-level threads is precisely that they run on existing operating systems. In addition, changing the semantics of system calls would require modifying many user programs. An alternative is to find out in advance whether a call will block. On Unix-like systems, the select system call lets the caller discover whether an intended read would block. With this call available, the library procedure read can be replaced by a new version: select is called first, and read is issued only when it is safe, that is, when it will not block. If the read would block, it is not made; instead, another thread is run. The next time the runtime system gets control, it checks again whether the read is now safe. This approach requires rewriting parts of the system call library, and it is neither efficient nor elegant, but there is little alternative. The code placed around a system call to do this checking is called a wrapper; a sketch of one follows below.
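Here is a rough sketch of such a wrapper (the name wrapped_read and the error-handling convention are assumptions for illustration; a real user-level thread package would switch to another thread instead of returning an error):

#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Poll the descriptor with select() using a zero timeout and only call read()
 * when it will not block. */
ssize_t wrapped_read(int fd, void *buf, size_t count) {
    fd_set readfds;
    struct timeval timeout = {0, 0};     /* do not wait: just check readiness  */

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    if (select(fd + 1, &readfds, NULL, NULL, &timeout) > 0 &&
        FD_ISSET(fd, &readfds)) {
        return read(fd, buf, count);     /* safe: data is already available    */
    }

    errno = EAGAIN;                      /* caller (the runtime) should switch */
    return -1;                           /* to another thread and retry later  */
}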

Another problem is page faults. If a page fault occurs while a thread is running, the operating system reads the missing page from disk into main memory and blocks the faulting thread, but since the kernel does not know the threads exist, it usually blocks the entire process until the disk I/O completes, even though other threads in the process may be runnable.

Another problem with user-level threads is that once a thread starts running, no other thread in that process can run unless the first thread voluntarily gives up the CPU. Within a single process there are no clock interrupts, so threads cannot be scheduled round robin. Unless a thread enters the runtime system of its own accord, the scheduler never gets a chance to run. One conceivable fix is to have the runtime system request a clock signal once a second (a sketch of this idea follows below), but this is crude and messy for programs, and the overhead is considerable. A thread might also need a clock of its own, which would interfere with the timer used by the runtime system.
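As a sketch of the once-a-second clock idea (illustrative only; a real runtime would perform a thread switch inside the handler rather than just counting ticks), the POSIX setitimer call can be used to request a SIGALRM every second:

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks = 0;

/* The handler stands in for the runtime's scheduler entry point: on every
 * tick a user-level thread package could force a thread switch here. */
static void on_tick(int sig) {
    (void)sig;
    ticks++;
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_tick;
    sigaction(SIGALRM, &sa, NULL);

    /* Ask the kernel for a SIGALRM once per second. */
    struct itimerval every_second = { {1, 0}, {1, 0} };
    setitimer(ITIMER_REAL, &every_second, NULL);

    while (ticks < 3)
        pause();                         /* the "thread" waits; ticks interrupt it */

    printf("received %d clock ticks\n", (int)ticks);
    return 0;
}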

The strongest argument against user-level threads is that programmers generally want threads precisely in applications where threads block frequently, for example a multi-threaded web server whose threads are constantly making system calls. Once a thread has trapped into the kernel to make a system call, it is hard for the kernel to switch to another thread if the first one has blocked, since the kernel does not know the other threads exist. Having the kernel work around this by continuously issuing select calls defeats the purpose. Conversely, for applications that are essentially CPU-bound and rarely block, there is little point in using threads at all.

Threads implemented in kernel space

Now consider the case of supporting threads in the kernel. No runtime system for threads is needed in each process, and there is no thread table in each process either. Instead, the kernel has a thread table that keeps track of all the threads in the system. When a thread wants to create a new thread or destroy an existing one, it makes a system call, and the kernel performs the creation or destruction by updating its thread table.

The kernel's thread table holds each thread's registers, state, program counter, stack pointer, and other information. The information is the same as with user-level threads, but it is now kept in the kernel.

All calls that might block a thread are implemented as system calls, at considerably greater cost than a call to a runtime-system procedure. When a thread blocks, the kernel can schedule a ready thread from its thread table to run; that thread may or may not belong to the same process as the blocked one. With user-level threads, by contrast, the runtime system keeps running threads from its own process until the kernel takes the CPU away from it.

Kernel threads do not require any new, non-blocking system calls. In addition, if one thread in a process causes a page fault, the kernel can easily check whether the process has other runnable threads, and if so, run one of them while waiting for the missing page to be read in from disk. The main disadvantage is that the cost of a system call is substantial, so if thread operations are frequent, they incur a great deal of overhead.

Hybrid implementation

Various approaches have been investigated that attempt to combine the advantages of user-level threads with those of kernel-level threads. One way is to use kernel-level threads and multiplex user-level threads onto some or all of them. With this approach, the kernel is aware of and schedules only the kernel-level threads; some of those kernel threads have multiple user-level threads multiplexed on top of them. These user-level threads are created, destroyed, and scheduled just like user-level threads on an operating system without multithreading support. Go's goroutines are a typical example. In the code below, 10,000 goroutines are created.

package main

import (
	"fmt"
	"time"
)

func main() {
	// Start 10,000 goroutines; each sleeps for a minute so they all stay alive.
	for i := 0; i < 10000; i++ {
		go func() {
			time.Sleep(time.Minute)
		}()
	}
	fmt.Println("gogogo")
	// Keep main alive; when main returns, the whole program (and every goroutine) exits.
	time.Sleep(time.Minute)
}

But on macOS, the kernel actually creates only 10 threads for this program; the 10,000 goroutines are multiplexed onto them. We will cover Go's thread implementation in detail in a future post.

POSIX thread

To make it possible to write portable threaded programs, IEEE defined a standard for threads. The thread package it defines is called pthreads. Most UNIX and UNIX-like systems support this standard.

Thread call            Description
pthread_create         Create a new thread
pthread_exit           Terminate the calling thread
pthread_join           Wait for a specific thread to exit
pthread_yield          Release the CPU to let another thread run
pthread_attr_init      Create and initialize a thread's attribute structure
pthread_attr_destroy   Remove a thread's attribute structure

 

The following is a C example written on macOS. The code also compiles and runs on Linux without modification (on Linux, link with -lpthread).

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Thread body: increment the int that the argument points to. */
void *thread_test(void *i) {
    (*((int *)i))++;
    return NULL;
}

int main(int argc, char **argv) {
    pthread_t pthread;
    int a = 0;
    int status;

    /* Create a thread running thread_test, passing a pointer to a as its argument. */
    status = pthread_create(&pthread, NULL, thread_test, (void *)&a);
    if (status != 0) {
        printf("error\n");
        exit(-1);
    }
    /* Wait for the thread to finish before reading a. */
    pthread_join(pthread, NULL);
    printf("a=%d\n", a);
    return 0;
}

The main function first declares a pthread_t handle, then pthread_create starts the thread_test thread, which increments a variable on main's stack; pthread_join waits for thread_test to finish, and the variable a is printed. The output is:

a=1


Origin blog.csdn.net/u013259665/article/details/86001352