[Linux] Thread control

In the previous post on the concept of threads, I covered many of their advantages but said little about how to manage them in a controlled way. So let's take a look at how to control threads.

POSIX thread library

We know that Linux has no dedicated thread abstraction in the kernel. What the scheduler sees is only process-like entities (lightweight processes, built on the clone system call), so Linux does not provide a separate family of system calls just for managing threads. Since the kernel does not hand us a ready-made thread interface, thread management is implemented as a set of interfaces in a user-level library.

POSIX stands for Portable Operating System Interface. The functions introduced here follow the POSIX standard. They are user-level library calls that together form a complete family, and most of their names start with "pthread_". To use the library, include the header <pthread.h>. The most important point is that the thread library has to be linked when building: either pass the -pthread option to the compiler, or add -lpthread at link time, where -l tells the linker to link against the named library.

Thread creation

Threads are created with pthread_create:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);
Parameter 1: Receives the ID of the new thread; this is an output parameter.
Parameter 2: Sets the thread's attributes; pass nullptr to use the default attributes.
Parameter 3: The address of the function the new thread will execute, that is, the function name.
Parameter 4: The argument passed to that function.
Return value: 0 on success, an error code on failure.
Now let's create a thread and let it run alongside the main thread:

#include <iostream>
#include <pthread.h>
#include <unistd.h>

using namespace std;

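// Entry function of the new thread: prints a message once per second
void *handle(void *arg)
{
	while(1)
	{
		cout<<"i am new thread"<<endl;
		sleep(1);
	}
}
int main()
{
	pthread_t tid; // receives the ID of the new thread
	// default attributes (nullptr), entry function handle, argument "thread 1"
	pthread_create(&tid,nullptr,handle,(void*)"thread 1");
	while(1)
	{
		cout<<"i am main thread"<<endl;
		sleep(2);
	}
	return 0;
}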

While the program is running, check the threads we created:

ps -ajL
The -L option makes ps display thread-related information, such as the LWP column.


In the ps output, two entries share the same PID (8974 in this run), which means they are not two independent processes. Their LWP (lightweight process) IDs differ, which shows that they are two different lightweight processes, that is, threads. In fact, the unit the CPU schedules is the LWP. The PID we treated as the scheduling unit earlier only plays that role for single-threaded processes, where the PID and the LWP are the same.
Each user-mode thread corresponds to one scheduling entity in the kernel. To manage threads, the OS has to describe them and then organize them, so each thread has its own process descriptor (a task_struct) in the kernel. In user space, the thread library also keeps a description of each thread, together with its thread-local storage and its stack. The pthread_t value tid that uniquely identifies a thread within the process is, in the Linux implementation, the starting address of that per-thread description, while the LWP is the ID the kernel uses for scheduling.
Note: threads differ from processes here. Processes have the concept of a parent process, but all threads within a thread group are peers. The threads of one process form a thread group with no hierarchy among them; we only distinguish the main thread from the newly created threads.

Thread termination

To terminate the thread without terminating the process, there are three methods:

1. Return from the thread function. This does not apply to the main thread: returning from main is equivalent to calling exit, which ends the whole process.
2. The thread can call pthread_exit to terminate itself.
3. A thread can call pthread_cancel to terminate another thread in the same thread group. This is generally used when the main thread cancels the threads it created.

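One detail: when the main thread cancels a child thread, the cancel only has an effect if the child thread is still running at that point. If the child has already finished, join simply reports its normal exit value and you will not see the cancelled result. If the child was cancelled successfully, the pointer filled in by the join function holds PTHREAD_CANCELED, which on Linux is (void*)-1. The program below shows the pthread_exit case, where join receives the value passed to pthread_exit: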

#include <iostream>
#include <pthread.h>
#include <unistd.h>
using namespace std;
void* handle(void *arg)
{
	int i=3;
	while(i--)
	{
		cout<<"i am new"<<endl;
	}
	pthread_exit((void*)2);
}
int main()
{
	pthread_t tid;
	pthread_create(&tid,NULL,handle,(void*)"thread 1");
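	void* status;
	pthread_join(tid,&status); // wait for the thread and collect its exit value
	cout<<(long)status<<endl; // print the exit value as an integer rather than as a pointer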
	return 0;
}


Thread waiting

Reasons for the main thread to wait:

1. Prevent memory leaks.
2. Let the main thread obtain the exit status of the new thread.
3. Keep the order in which the threads exit synchronized.

Wait function

int pthread_join(pthread_t thread, void **retval);
The first parameter is the ID of the thread to wait for, and the second parameter points to a pointer that receives the thread's return value.
The call returns 0 on success and an error code on failure.

The thread that calls this function is suspended until the thread identified by thread terminates. Depending on how that thread terminated, join obtains a different termination status.

Here is a small detail about thread exit. The second parameter of the join function only yields the thread's exit code. This is unlike waiting for a process, where we can also tell whether it was terminated by a signal. A thread is just one branch of its process, so if any thread terminates abnormally the whole process exits. There is therefore no abnormal-termination code to collect, only an exit code:
1. If the thread terminates by returning, the location pointed to by retval holds the return value of the thread function.
2. If the thread terminates because another thread called pthread_cancel, the location pointed to by retval holds the constant PTHREAD_CANCELED.
3. If the thread terminates by calling pthread_exit, the location pointed to by retval holds the argument passed to pthread_exit.
4. If you are not interested in the thread's termination status, just pass NULL as the second parameter.

Test code

#include <iostream>
#include <pthread.h>
#include <unistd.h>

using namespace std;
void* handle(void* arg)
{
	int i=3;
	while(i--)
		cout<<"i am new"<<endl;
	return (void*)1;
}

int main()
{
	pthread_t tid;
	pthread_create(&tid,NULL,handle,(void*)"thread 1");
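	void* status;
	pthread_join(tid,&status); // wait for the thread and collect its return value
	cout<<(long)status<<endl; // print the return value as an integer rather than as a pointer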
	return 0;
}


Thread ID

How to get the thread ID

Before threads were introduced, one process corresponded to one process descriptor in the kernel and to one process ID. With threads, the situation changes. A user process governs N user-mode threads, and since each thread is an independent scheduling entity with its own process descriptor in the kernel, the relationship between a process and its kernel descriptors suddenly becomes 1:N. Yet the POSIX standard requires every thread in a process to get the same process ID back from getpid. How is this solved?
In the Linux kernel, a multi-threaded process is called a thread group, and each thread in the group has its own process descriptor in the kernel. On the surface, the pid field in the process descriptor looks like the process ID, but it is actually the ID of that individual thread. The tgid field stands for Thread Group ID; its value equals the pid of the first thread in the group (the main thread), and it is what user space sees as the process ID.
So the process PID you see corresponds to tgid in task_struct, and the LWP we saw corresponds to pid in task_struct. Let's look at the relevant part of task_struct:

struct task_struct {
...
pid_t pid;
pid_t tgid;
...
struct task_struct *group_leader; // main thread of the thread group
...
struct list_head thread_group; // linked list describing a thread group
...
};

So the next time you call getpid in user mode, remember that the kernel is actually returning the tgid from task_struct. Linux also provides gettid to return the thread ID, but older versions of the C library do not wrap this system call, so it is less convenient to use and typically has to be invoked through syscall(SYS_gettid).

Thread separation

By default, a newly created thread is joinable: after it exits, pthread_join has to be called on it, otherwise its resources are not released, which causes a leak. If you don't care about the thread's return value, join is a burden. In that case we can detach the child thread with pthread_detach so that its resources are released automatically when it exits. A thread can detach itself, or another thread can detach it.
Note: once a thread has been detached, it can no longer be waited for with pthread_join.

#include <iostream>
#include <pthread.h>
#include <unistd.h>

using namespace std;
void* handle(void* arg)
{
	pthread_detach(pthread_self());
	int i=3;
	while(i--)
		cout<<"i am new"<<endl;
	pthread_cancel(pthread_self());
	return (void*)1;
}

int main()
{
	pthread_t tid;
	pthread_create(&tid,NULL,handle,(void*)"thread 1");
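	void* status=NULL; // initialized, because a failed join does not set it
	sleep(1); // give the new thread time to detach itself first
	int ret=pthread_join(tid,&status); // the thread is detached, so this is expected to fail (EINVAL with glibc)
	cout<<(long)status<<":"<<ret<<endl;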
	return 0;
}


Summary

The thread control functions used on Linux are a group of user-level library functions. Once you are comfortable with pthread_create, pthread_exit, pthread_cancel, pthread_join and pthread_detach, controlling threads becomes straightforward.