A deadlock introduced by using pthread_cancel

First, the basic concepts behind pthread_cancel.

Calling pthread_cancel does not force a thread to terminate; it only sends a request.
How the request is handled is up to the target thread: it can ignore the request, terminate immediately, or keep running until it reaches a cancellation point, depending on its cancelability state and type.

Several functions related to pthread_cancel are also worth mentioning:

int pthread_setcancelstate(int state, int *oldstate)

Sets how the calling thread responds to cancellation requests. state takes one of two values: PTHREAD_CANCEL_ENABLE (the default), under which the thread acts on a request and ends up canceled, and PTHREAD_CANCEL_DISABLE, under which the thread ignores the request and keeps running (the request stays pending until cancellation is enabled again).
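As a minimal sketch of the two states, the worker below disables cancellation, is cancelled while disabled, and only terminates once it re-enables cancellation and reaches a cancellation point. The names worker, cancel_state_demo, and the two semaphores are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;        /* worker -> caller: cancellation is now disabled */
static sem_t cancel_sent;  /* caller -> worker: pthread_cancel has been called */
static int finished_protected_work;  /* set while cancellation is disabled */
static int ran_after_reenable;       /* stays 0 if the pending request is honored */

static void *worker(void *arg)
{
    int old_state, ignored;
    (void)arg;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    sem_post(&ready);
    sem_wait(&cancel_sent);       /* a cancellation point, but the request stays pending */
    finished_protected_work = 1;

    pthread_setcancelstate(old_state, &ignored);  /* back to ENABLE */
    pthread_testcancel();         /* the pending request is acted on here */
    ran_after_reenable = 1;       /* not reached once the request is honored */
    return NULL;
}

/* Returns 1 if the worker finished its protected region and was then cancelled. */
int cancel_state_demo(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc;

    sem_init(&ready, 0, 0);
    sem_init(&cancel_sent, 0, 0);
    rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;

    sem_wait(&ready);             /* the worker has disabled cancellation */
    pthread_cancel(tid);          /* request is queued, not acted on yet */
    sem_post(&cancel_sent);
    pthread_join(tid, &result);

    sem_destroy(&ready);
    sem_destroy(&cancel_sent);
    return result == PTHREAD_CANCELED && finished_protected_work && !ran_after_reenable;
}
```

Calling cancel_state_demo() from main (built with -pthread) should return 1: the request sent during the disabled region is not lost, it is simply held until the state is ENABLE again.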

int pthread_setcanceltype(int type, int *oldtype)

Sets when the calling thread acts on a cancellation request. type takes one of two values, which matter only while the cancel state is ENABLE: PTHREAD_CANCEL_DEFERRED (the default), under which the thread keeps running until it reaches a cancellation point and then exits, and PTHREAD_CANCEL_ASYNCHRONOUS, under which the thread may be canceled immediately.
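To illustrate the asynchronous type, here is a small sketch in which a thread spins in a loop that contains no cancellation point at all. With the default deferred type the join would never return; with PTHREAD_CANCEL_ASYNCHRONOUS the request can take effect in the middle of the loop. The names spinner, spinning, and async_cancel_demo are made up for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int spinning;   /* set once the thread has switched to async cancellation */

static void *spinner(void *arg)
{
    int old_type;
    volatile unsigned long counter = 0;
    (void)arg;

    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);
    atomic_store(&spinning, 1);
    for (;;)
        counter++;            /* no cancellation point anywhere in this loop */
}

/* Returns 1 if the spinning thread was cancelled. */
int async_cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc = pthread_create(&tid, NULL, spinner, NULL);
    assert(rc == 0);
    (void)rc;

    while (!atomic_load(&spinning))
        sched_yield();        /* wait until the thread is in its loop */

    pthread_cancel(tid);
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED;
}
```

Asynchronous cancellation is only safe around code that holds no locks or other resources, which is why the loop here touches nothing but a local counter.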

void pthread_testcancel(void)

When a stretch of code contains no cancellation point but needs one, call this function to create one, so that a thread executing such code can still respond to a cancellation request.
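A short sketch of that use, keeping the default deferred type: the loop below does pure computation, so the only place a cancellation request can be acted on is the explicit pthread_testcancel call. The names cruncher, started, and testcancel_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int started;    /* set once the thread has entered its loop */

static void *cruncher(void *arg)
{
    volatile unsigned long sum = 0;
    unsigned long i;
    (void)arg;

    atomic_store(&started, 1);
    for (i = 0; ; i++) {
        sum += i;               /* pure computation, no cancellation point */
        pthread_testcancel();   /* explicit cancellation point */
    }
}

/* Returns 1 if the thread was cancelled at its pthread_testcancel call. */
int testcancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc = pthread_create(&tid, NULL, cruncher, NULL);
    assert(rc == 0);
    (void)rc;

    while (!atomic_load(&started))
        sched_yield();

    pthread_cancel(tid);
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED;
}
```

Without the pthread_testcancel call, the join in testcancel_demo would block forever, because the deferred request would never find a point at which to take effect.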

pthread_testcancel is not the only cancellation point, of course: a number of system functions also act as cancellation points, such as pthread_cond_wait and sigwait. The full list is easy to look up, for example in the pthreads(7) man page.

 

With the cancellation mechanism of pthread_cancel understood, we can move on to analyzing the bug.

The source is as follows:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void* testThreadOne(void* arg)
{
  pthread_mutex_lock(&mutex);
  puts("ThreadOne label 1.");
  pthread_cond_wait(&cond, &mutex);  /* cancellation point */
  puts("ThreadOne label 2.");
  pthread_mutex_unlock(&mutex);
  puts("ThreadOne label 3.");
  pthread_exit(NULL);
}

void* testThreadTwo(void* arg)
{
  sleep(2);
  puts("ThreadTwo label 1.");
  pthread_mutex_lock(&mutex);
  puts("ThreadTwo label 2.");
  pthread_cond_broadcast(&cond);
  pthread_mutex_unlock(&mutex);
  puts("ThreadTwo label 3.");
  pthread_exit(NULL);
}

int main(void)
{
  pthread_t tid[2] = {0};

  pthread_create(&tid[0], NULL, testThreadOne, NULL);
  pthread_create(&tid[1], NULL, testThreadTwo, NULL);

  sleep(1);
  puts("Main thread label 1.");
  pthread_cancel(tid[0]);  /* thread one is blocked in pthread_cond_wait by now */

  pthread_join(tid[0], NULL);
  pthread_join(tid[1], NULL);
  pthread_mutex_destroy(&mutex);
  pthread_cond_destroy(&cond);

  return 0;
}

First, compile and run the program:

ThreadOne label 1.
Main thread label 1.
ThreadTwo label 1.

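The result is disappointing: the program never exits, because it has deadlocked.

From the output we can tell that the program is stuck at pthread_mutex_lock(&mutex) in thread two.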

 

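Let us roughly trace how the whole program runs:

After the two threads are created, the main thread sleeps for 1 second. Thread two also begins by sleeping, so thread one gets to run first.
It locks mutex and prints its label 1 message. Inside the wait call, mutex is unlocked and the thread then waits on cond; no other thread wakes it for now, so thread one blocks here.

The main thread sleeps for the shorter time, so it wakes first and carries on: it prints Main thread label 1, calls pthread_cancel to send a cancellation request to thread one, and then blocks in join.
At that moment thread one has cancellation enabled and happens to be at a cancellation point (the wait call), so thread one exits as requested.

Execution continues. Thread two's sleep ends and it gets to run: it prints its label 1 message and then tries to lock mutex. This is where the problem appears: thread two blocks there forever, and the main thread stays blocked in join and cannot exit either. Why?

A little thought gives the answer. From what we already know, when wait is called it first unlocks mutex internally, and when it is woken by the condition it locks mutex again internally to take ownership back.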
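The effect can be observed directly. The sketch below cancels a thread that is blocked in pthread_cond_wait, joins it, and then probes the mutex with pthread_mutex_trylock. The names waiter, waiting, and lock_left_held_demo are hypothetical, and the demo deliberately leaves the mutex locked, since that is exactly the state being shown.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static sem_t waiting;   /* posted by the waiter while it holds mutex */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex);
    sem_post(&waiting);
    for (;;)
        pthread_cond_wait(&cond, &mutex);   /* cancellation point; nobody signals cond */
}

/* Returns the result of pthread_mutex_trylock after the waiter was cancelled. */
int lock_left_held_demo(void)
{
    pthread_t tid;
    int rc;

    sem_init(&waiting, 0, 0);
    rc = pthread_create(&tid, NULL, waiter, NULL);
    assert(rc == 0);

    sem_wait(&waiting);
    /* The waiter held mutex when it posted, so this lock succeeds only
       after pthread_cond_wait has released it, i.e. the waiter is waiting. */
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);

    pthread_cancel(tid);
    pthread_join(tid, NULL);
    sem_destroy(&waiting);

    /* The cancelled thread took mutex back on its way out and never released it. */
    return pthread_mutex_trylock(&mutex);
}
```

With a default mutex, lock_left_held_demo() should return EBUSY: the cancelled thread is gone, yet the mutex is still locked, which is the state thread two runs into in the program above.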

Reading the glibc source confirms all of this. Here is an excerpt from version 2.30:

static __always_inline int
__pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
  clockid_t clockid,
  const struct timespec *abstime)
{
    ...
  err = __pthread_mutex_unlock_usercnt (mutex, 0);
    ...
  futex_wait_cancelable(cond->__data.__g_signals + g, 0, private)
    --> oldtype = __pthread_enable_asynccancel ();
    --> int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
    --> __pthread_disable_asynccancel (oldtype);
    ...
  err = __pthread_mutex_cond_lock (mutex);
    ...
}

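The source shows that wait unlocks mutex on entry and locks it again on exit, and that the window between __pthread_enable_asynccancel () and __pthread_disable_asynccancel (oldtype) is the cancellation point mentioned earlier: in the default state, a cancel request can take effect only while execution is between those two calls. When we cancel a thread that is sitting at the cancellation point inside wait, however, the thread does not exit on the spot. The wait first runs to completion, which includes locking mutex again, and only then does the thread terminate. That is how the bug creeps in: mutex is never released. But if we really must use cancel this way, is there no way to get the lock released?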

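There is. The designers anticipated this long ago and provided pthread_cleanup_push, whose job is to run some cleanup actions when a thread exits in certain ways, for example when it is terminated through pthread_exit or pthread_cancel. Some online sources say that the cleanup function is also called when a thread exits abnormally, but when I tried an abnormal exit caused by an out-of-bounds memory access, the cleanup function was not called; perhaps that is not the kind of case they had in mind.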
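As a small sketch of the pthread_exit case, the thread below registers two handlers and then calls pthread_exit; both handlers run, most recently pushed first. The names record, order, calls, and cleanup_on_exit_demo are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static int order[2];   /* handler arguments, in the order the handlers ran */
static int calls;      /* how many times a handler ran */

static void record(void *arg)
{
    if (calls < 2)
        order[calls] = (int)(intptr_t)arg;
    calls++;
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(record, (void *)(intptr_t)1);
    pthread_cleanup_push(record, (void *)(intptr_t)2);
    pthread_exit(NULL);        /* runs the handlers: 2 first, then 1 */
    pthread_cleanup_pop(0);    /* unreachable, but push and pop must pair up lexically */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Returns 1 if both handlers ran on pthread_exit, in reverse order of registration. */
int cleanup_on_exit_demo(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(tid, NULL);
    return calls == 2 && order[0] == 2 && order[1] == 1;
}
```

Note that handlers are tied to pthread_exit and cancellation; a thread that leaves its start function with a plain return does not run them unless they are popped with a non-zero argument first.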

Using a cleanup function, we can modify thread one from the program above as follows:

void cleanup(void *arg)
{
    pthread_mutex_unlock(&mutex);
}

void *testThreadOne(void *arg)
{
    pthread_cleanup_push(cleanup, NULL);
    pthread_mutex_lock(&mutex);
    puts("ThreadOne label 1.");
    pthread_cond_wait(&cond, &mutex);
    puts("ThreadOne label 2.");
    pthread_mutex_unlock(&mutex);
    puts("ThreadOne label 3.");
    pthread_cleanup_pop(0);
    pthread_exit(NULL);
}

Compile and run the program again:

ThreadOne label 1.
Main thread label 1.
ThreadTwo label 1.
ThreadTwo label 2.
ThreadTwo label 3.

Perfect! mutex is released correctly and the program runs to completion.
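One possible tightening, offered as a sketch rather than as the original author's code: register the handler only after the mutex has been locked and let pthread_cleanup_pop(1) do the unlock on the normal path, so the handler can never run while the mutex is not held. The names unlock_mutex, waiter, ready, waiting, and cleanup_unlock_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready;       /* predicate guarded by mutex; never set in this sketch */
static sem_t waiting;   /* posted by the waiter while it holds mutex */

static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex);
    pthread_cleanup_push(unlock_mutex, &mutex);  /* registered only once the lock is held */
    sem_post(&waiting);
    while (!ready)
        pthread_cond_wait(&cond, &mutex);        /* cancellation point */
    pthread_cleanup_pop(1);                      /* normal path: pop and run the handler */
    return NULL;
}

/* Returns the result of pthread_mutex_trylock after the waiter was cancelled. */
int cleanup_unlock_demo(void)
{
    pthread_t tid;
    int rc;

    sem_init(&waiting, 0, 0);
    rc = pthread_create(&tid, NULL, waiter, NULL);
    assert(rc == 0);

    sem_wait(&waiting);
    pthread_mutex_lock(&mutex);    /* succeeds once the waiter is inside the wait */
    pthread_mutex_unlock(&mutex);

    pthread_cancel(tid);
    pthread_join(tid, NULL);
    sem_destroy(&waiting);

    rc = pthread_mutex_trylock(&mutex);  /* 0 means the handler released the mutex */
    if (rc == 0)
        pthread_mutex_unlock(&mutex);
    return rc;
}
```

Here cleanup_unlock_demo() should return 0, showing that the mutex is free again after the cancelled thread has run its handler.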

 

This article draws on: https://www.cnblogs.com/mydomain/archive/2011/08/15/2139830.html

Origin www.cnblogs.com/GyForever1004/p/11455479.html