Why does Helgrind show a "lock order violated" error message?

Please see the code below:

    #include <stdio.h>
    #include <pthread.h>
    #include <assert.h>
    #include <stdlib.h>

    pthread_mutex_t g = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;

    void* worker(void* arg) 
    {
        pthread_mutex_lock(&g);

        if ((long long) arg == 0) {
            pthread_mutex_lock(&m1);
            pthread_mutex_lock(&m2);
        } else {
            pthread_mutex_lock(&m2);
            pthread_mutex_lock(&m1);
        }
        pthread_mutex_unlock(&m1);
        pthread_mutex_unlock(&m2);

        pthread_mutex_unlock(&g);
        return NULL;
    }

    int main(int argc, char *argv[]) {
        pthread_t p1, p2;
        pthread_create(&p1, NULL, worker, (void *) (long long) 0);
        pthread_create(&p2, NULL, worker, (void *) (long long) 1);
        pthread_join(p1, NULL);
        pthread_join(p2, NULL);
        return 0;
    }

Helgrind reports the following error:

==10035== Helgrind, a thread error detector
==10035== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==10035== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==10035== Command: ./Hw5
==10035== 
==10035== ---Thread-Announcement------------------------------------------
==10035== 
==10035== Thread #3 was created
==10035==    at 0x538987E: clone (clone.S:71)
==10035==    by 0x5050EC4: create_thread (createthread.c:100)
==10035==    by 0x5050EC4: pthread_create@@GLIBC_2.2.5 (pthread_create.c:797)
==10035==    by 0x4C36A27: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x1088BD: main (Hw5.c:28)
==10035== 
==10035== ----------------------------------------------------------------
==10035== 
==10035== Thread #3: lock order "0x309080 before 0x3090C0" violated
==10035== 
==10035== Observed (incorrect) order is: acquisition of lock at 0x3090C0
==10035==    at 0x4C3403C: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x10882E: worker (Hw5.c:16)
==10035==    by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x50506DA: start_thread (pthread_create.c:463)
==10035==    by 0x538988E: clone (clone.S:95)
==10035== 
==10035==  followed by a later acquisition of lock at 0x309080
==10035==    at 0x4C3403C: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x10883A: worker (Hw5.c:17)
==10035==    by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x50506DA: start_thread (pthread_create.c:463)
==10035==    by 0x538988E: clone (clone.S:95)
==10035== 
==10035== Required order was established by acquisition of lock at 0x309080
==10035==    at 0x4C3403C: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x108814: worker (Hw5.c:13)
==10035==    by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x50506DA: start_thread (pthread_create.c:463)
==10035==    by 0x538988E: clone (clone.S:95)
==10035== 
==10035==  followed by a later acquisition of lock at 0x3090C0
==10035==    at 0x4C3403C: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x108820: worker (Hw5.c:14)
==10035==    by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x50506DA: start_thread (pthread_create.c:463)
==10035==    by 0x538988E: clone (clone.S:95)
==10035== 
==10035==  Lock at 0x309080 was first observed
==10035==    at 0x4C3403C: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x108814: worker (Hw5.c:13)
==10035==    by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x50506DA: start_thread (pthread_create.c:463)
==10035==    by 0x538988E: clone (clone.S:95)
==10035==  Address 0x309080 is 0 bytes inside data symbol "m1"
==10035== 
==10035==  Lock at 0x3090C0 was first observed
==10035==    at 0x4C3403C: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x108820: worker (Hw5.c:14)
==10035==    by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10035==    by 0x50506DA: start_thread (pthread_create.c:463)
==10035==    by 0x538988E: clone (clone.S:95)
==10035==  Address 0x3090c0 is 0 bytes inside data symbol "m2"
==10035== 
==10035== 
==10035== 
==10035== For counts of detected and suppressed errors, rerun with: -v
==10035== Use --history-level=approx or =none to gain increased speed, at
==10035== the cost of reduced accuracy of conflicting-access information
==10035== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 7 from 7)

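I think the outer lock g will not allow two threads to enter the critical section at the same time. Only one thread can hold g at any given time, so I think there is no possibility of a deadlock. Am I wrong? Why does Helgrind report this error? Please explain.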

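Helgrind is complaining that your threads were observed locking mutexes m1 and m2 in different relative orders, which is also clear from inspecting the code. Helgrind looks for and flags such discrepancies in acquisition order because, in general, they create a risk of deadlock.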

I think the outer lock g will not allow two threads to enter the critical section at the same time. Only one thread can acquire the lock g at a given time. So I think there is no possibility for a deadlock. Am I wrong?

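You are not wrong. The particular program presented cannot deadlock, because each thread must acquire g before it can acquire either of the other mutexes.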

Why is Helgrind throwing this error?

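Because Helgrind performs a heuristic analysis of the run-time behavior of your program during a run. It does not assume that a single run of the program demonstrates all possible behaviors. Your evaluation, on the other hand, is based on analysis of the source code.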

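The heuristic rule you are seeing here is that no threads should acquire pairs of mutexes in different relative orders. For your particular program this yields a false positive, but your program appears to be specifically designed to produce that result. If mutex g is always held whenever either of the others is acquired, then mutexes m1 and m2 are not needed in the first place. If, however, there is any possibility that some other thread acquires m1 and m2 without holding g, then the deadlock risk would be real, regardless of the acquisition order in that other thread.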

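Either way, the warning points to a genuine problem with your code: either you are performing mutex operations that are not needed, or you really do have a deadlock risk, now or in the future.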
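If m1 and m2 really are needed independently of g, one way to satisfy the rule Helgrind checks is to have every thread acquire them in the same relative order. The following is a minimal sketch of that idea, not a rewrite of the program in the question: `lock_checked`, `run_two_workers`, and `shared_counter` are hypothetical names introduced only for illustration, and the outer mutex g is dropped on the assumption that m1 and m2 alone guard the shared state.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical shared state; assumed to be guarded by m1 and m2 together. */
static int shared_counter = 0;

/* Hypothetical helper: lock a mutex and check that the call succeeded. */
static void lock_checked(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    assert(rc == 0);
    (void) rc; /* avoids an unused-variable warning when NDEBUG is defined */
}

/* Every thread takes m1 before m2, whatever argument it receives. */
static void *worker(void *arg)
{
    (void) arg;
    lock_checked(&m1);
    lock_checked(&m2);
    shared_counter++;
    /* Release in reverse order of acquisition. */
    pthread_mutex_unlock(&m2);
    pthread_mutex_unlock(&m1);
    return NULL;
}

/* Starts two workers, waits for both, and returns the counter value,
   or -1 if a thread could not be created. */
int run_two_workers(void)
{
    pthread_t p1, p2;

    if (pthread_create(&p1, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&p2, NULL, worker, NULL) != 0) {
        pthread_join(p1, NULL);
        return -1;
    }
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    return shared_counter;
}
```

Calling `run_two_workers()` from `main` and running the result under Helgrind should not trigger the lock order report, since only the order "m1 before m2" is ever observed. The other option named above, keeping g and removing m1 and m2 entirely, avoids the report for the same reason: there is no longer a pair of locks whose order can differ.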