What is the problem with PTHREAD_CANCELED and a thread's start function return value?

I am reading Kerrisk's book and came across the following note:

Caution is required when using a cast integer as the return value of a thread’s start function. The reason for this is that PTHREAD_CANCELED, the value returned when a thread is canceled (see Chapter 32), is usually some implementation-defined integer value cast to void *. If a thread’s start function returns the same integer value, then, to another thread that is doing a pthread_join(), it will wrongly appear that the thread was canceled. In an application that employs thread cancellation and chooses to return cast integer values from a thread’s start functions, we must ensure that a normally terminating thread does not return an integer whose value matches PTHREAD_CANCELED on that Pthreads implementation. A portable application would need to ensure that normally terminating threads don’t return integer values that match PTHREAD_CANCELED on any of the implementations on which the application is to run.

I don't understand the significance of this note. Could you illustrate it with a simple code snippet? What goes wrong in these cases?

Here is a typical definition of PTHREAD_CANCELED (quoted verbatim from /usr/include/pthread.h on the machine I am typing this on, which runs Linux with GNU libc):

#define PTHREAD_CANCELED ((void *) -1)

So, if you have code like this to check for cancellation:

void *thread_result;
int rv = pthread_join(child, &thread_result);
if (rv)
    error_exit("pthread_join failed", rv);
if (thread_result == PTHREAD_CANCELED)
    error_exit("thread canceled", 0);

you cannot also have a thread procedure like this:

static void *appears_to_be_canceled(void *unused)
{
    return ((void *) -1);
}

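because PTHREAD_CANCELED and ((void *) -1) compare equal, so the joining thread would wrongly conclude that the thread was canceled. Note that the number is not guaranteed to be −1. It can vary from system to system, and there is no good way to find out what it is at compile time, because ((void *)...) is not usable in #if expressions.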
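The collision can be reproduced at run time. The following is only a minimal sketch: `return_cast_integer` and `looks_canceled` are hypothetical names introduced for illustration, and the colliding value is obtained by casting PTHREAD_CANCELED itself to `intptr_t` rather than hard-coding −1, since the actual number depends on the implementation.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread procedure that returns whatever cast integer it was handed. */
static void *return_cast_integer(void *arg)
{
    return arg;
}

/* Hypothetical helper: start a thread that terminates normally with
   `value` cast to void *, join it, and report how the result looks to
   the joining thread.
   Returns 1 if the result compares equal to PTHREAD_CANCELED,
   0 if it does not, and -1 if pthread_create or pthread_join failed. */
static int looks_canceled(intptr_t value)
{
    pthread_t child;
    void *thread_result;

    if (pthread_create(&child, NULL, return_cast_integer, (void *) value) != 0)
        return -1;
    if (pthread_join(child, &thread_result) != 0)
        return -1;

    /* The thread was never canceled; this is purely a value comparison. */
    return thread_result == PTHREAD_CANCELED;
}
```

Calling `looks_canceled((intptr_t) PTHREAD_CANCELED)` from `main` should report 1 even though nothing ever called pthread_cancel, while an unrelated value such as 0 should report 0. On older systems you may need to compile with `-pthread`.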

There are two good ways to avoid this problem:

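  • Don't use thread cancellation at all, so you never have to check for PTHREAD_CANCELED or care what its numeric value is. This is a good idea for several other reasons, most importantly that cancellation makes writing robust multithreaded code even harder than it already is.
  • Return only valid pointers from thread procedures, never numbers. A good idiom looks like this: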

    struct worker_data
    {
       // put _everything_ your thread needs to access in here
    };
    static void *worker_proc (void *data_)
    {
       struct worker_data *data = data_;
       // do stuff with `data` here 
       return data_;
    }
    

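    Returning the worker_data object means that the code calling pthread_join does not have to keep track of which worker_data object corresponds to which pthread_t. It also means that the return value of a thread that completed successfully is guaranteed not to equal PTHREAD_CANCELED, because PTHREAD_CANCELED is guaranteed not to compare equal to any valid pointer.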
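To show how the joining side might use this idiom, here is a hedged sketch that fills in the skeleton above. The `input` and `output` fields, the doubling "work", and the `run_worker` helper are assumptions made up for illustration. Only the shape (pass a pointer in, return the same pointer) comes from the idiom itself.

```c
#include <pthread.h>
#include <stddef.h>

struct worker_data
{
    int input;   /* hypothetical: value the worker reads */
    int output;  /* hypothetical: value the worker writes */
};

static void *worker_proc(void *data_)
{
    struct worker_data *data = data_;
    data->output = data->input * 2;  /* stand-in for real work */
    return data_;                    /* a valid pointer, never a cast integer */
}

/* Hypothetical helper: run one worker to completion.
   Returns 0 if the thread finished normally, 1 if it was canceled,
   and -1 on a pthread error or an unexpected result pointer. */
static int run_worker(struct worker_data *data)
{
    pthread_t child;
    void *thread_result;

    if (pthread_create(&child, NULL, worker_proc, data) != 0)
        return -1;
    if (pthread_join(child, &thread_result) != 0)
        return -1;

    /* Unambiguous: a normally terminating worker returns `data`,
       which cannot compare equal to PTHREAD_CANCELED. */
    if (thread_result == PTHREAD_CANCELED)
        return 1;

    return thread_result == (void *) data ? 0 : -1;
}
```

With a `struct worker_data d = { 21, 0 };`, calling `run_worker(&d)` should return 0 and leave `d.output` set to 42. Because the joined result is the worker_data pointer itself, a caller joining many threads can recover each worker's data directly from pthread_join.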