Are valgrind "uninitialized value" warnings false positives in ATLAS multithreaded BLAS routines?

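I am using ATLAS for its LAPACK and multithreaded BLAS routines, and I have noticed that once my matrices become large enough for ATLAS to switch to the multithreaded version of BLAS, I get initialization errors from Valgrind. Here is a minimal example taken from my code: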

#include <stdio.h>
#include <stdlib.h>

extern void dgetrf_(int *, int *, double *, int *, int *, int *);
extern void dgetri_(int *, double *, int *, int *, double *, int *, int *);
extern void dgemm_(char *, char *, int *, int *, int *, double *, double *, int *, double *, int *, double *, double *, int *);

int main(void)
{
    double *m1,*m2,*work,*temp;
    int dim = 576;
    int i,j,info;
    int lwork = dim * dim;
    int *ipiv;
    char transA = 'N';
    char transB = 'N';
    double alpha = 1.0;
    double beta = 0.0;

    m1 = malloc(dim*dim*sizeof(double));
    m2 = malloc(dim*dim*sizeof(double));
    temp = malloc(dim*dim*sizeof(double));
    ipiv = malloc(dim*sizeof(int));
    work = malloc(lwork*sizeof(double));

    for(i=0; i<dim; i++)
     {
       for(j=0; j<dim; j++)
        {
          if(i==j)
           {
             m1[i+dim*j] = .25;
             m2[i+dim*j] = .5;
           }
          else
           {
             m1[i+dim*j] = 0.0;
             m2[i+dim*j] = 0.0;
           }
        }
    }

    dgetrf_(&dim, &dim, m1, &dim, ipiv, &info);
    dgetri_(&dim, m1, &dim, ipiv, work, &lwork, &info);

    dgemm_(&transA, &transB, &dim, &dim, &dim, &alpha, m1, &dim, m2, &dim, &beta, temp, &dim);
    for(i=0; i<dim*dim; i++)
        m1[i] = temp[i];

    dgetrf_(&dim, &dim, m1, &dim, ipiv, &info);
    dgetri_(&dim, m1, &dim, ipiv, work, &lwork, &info);

    free(m1);
    free(m2);
    free(ipiv);
    free(work);
    free(temp);

    return 0;
}

(Note: I have checked that the matrices are not singular, and they are not.)

I compile the program with:

gcc -Wall -DATLAS -m64 -g -c fermi.c
gcc -o fermi fermi.o -L/usr/lib64/atlas/ -lm -ltatlas

and run valgrind with:

valgrind --leak-check=yes ./fermi

When I do this, I get 193 errors from 11 contexts, all of them "Conditional jump or move depends on uninitialised value(s)", raised when the second calls to dgetrf_ and dgetri_ are reached.

==24999== Memcheck, a memory error detector
==24999== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==24999== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==24999== Command: ./fermi
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C62B: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C29E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C66A: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C29E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C6BE: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C29E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51C2A0B: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51C2A0D: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51C2A4E: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51C2A61: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C2D7: ATL_daxpy (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x53426BB: ATL_dgerk_axpy (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C2AC7: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x524C751: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C29E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51CD2BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x5210416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400A97: main (fermi.c:52)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51CD8E5: ATL_dtrtri (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C2EC3: ATL_dgetriC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520EFA5: atl_f77wrap_dgetri_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F684: dgetri_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400AC0: main (fermi.c:53)
==24999== 
==24999== Conditional jump or move depends on uninitialised value(s)
==24999==    at 0x51CD8E7: ATL_dtrtri (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x51C2EC3: ATL_dgetriC (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520EFA5: atl_f77wrap_dgetri_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x520F684: dgetri_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==24999==    by 0x400AC0: main (fermi.c:53)
==24999== 
==24999== 
==24999== HEAP SUMMARY:
==24999==     in use at exit: 0 bytes in 0 blocks
==24999==   total heap usage: 2,024 allocs, 2,024 frees, 54,831,424 bytes allocated
==24999== 
==24999== All heap blocks were freed -- no leaks are possible
==24999== 
==24999== For counts of detected and suppressed errors, rerun with: -v
==24999== Use --track-origins=yes to see where uninitialised values come from
==24999== ERROR SUMMARY: 193 errors from 11 contexts (suppressed: 0 from 0)

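I found some links suggesting that this may be a false positive caused by the way the library does things, although they are not closely related to my situation.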

memory leak in dgemm_

https://www.open-mpi.org/community/lists/users/2007/05/3192.php

So my question is: is valgrind giving me false positive errors?

is valgrind giving me false positive errors?

It doesn't seem so.

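Instead of running valgrind with --leak-check=yes, you should run it with --track-origins=yes to see where the uninitialized values come from, as valgrind itself suggests at the end of its output. Here is what I get with --track-origins=yes: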

[ ~]$ valgrind --track-origins=yes ./a.out 
==17533== Memcheck, a memory error detector
==17533== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17533== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==17533== Command: ./a.out
==17533== 
==17533== Conditional jump or move depends on uninitialised value(s)
==17533==    at 0x4F4362B: ??? (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EB99E3: ATL_dgetf2 (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EC42BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EC42BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EC42BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4EC42BF: ATL_dtgetrfC (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4F06538: atl_f77wrap_dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x4F07416: dgetrf_ (in /usr/lib64/atlas/libtatlas.so.3.10)
==17533==    by 0x400A29: main (fermi.c:50)
==17533==  Uninitialised value was created by a heap allocation
==17533==    at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
==17533==    by 0x40080B: main (fermi.c:22)

So the origin of the uninitialized value is this line of code:

temp = malloc(dim*dim*sizeof(double));

That buffer is then used to initialize m1, which is passed to dgetrf_() on line 50.

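I'm not familiar with the ATLAS library, but I suppose you should initialize the temp variable somehow. For example, zero-initializing temp with calloc resolves all of these valgrind errors: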

temp = calloc(dim*dim,sizeof(double));
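
As a rough illustration of why the initial contents of temp could matter even though beta is 0.0, here is a minimal sketch. It assumes a GEMM kernel that always computes alpha*A*B + beta*C and therefore reads C unconditionally. I have not verified that ATLAS does this; naive_gemm_scaled, count_nans and demo_nans_in_result are hypothetical names made up for this example, not ATLAS functions. The point is only that 0.0 multiplied by garbage is not guaranteed to be 0.0: if the garbage happens to be NaN, the result is NaN, and Memcheck still treats the product as uninitialized.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GEMM kernel that scales C by beta before
 * accumulating: C = alpha*A*B + beta*C on n x n column-major matrices.
 * This is NOT ATLAS code; it only illustrates one way C can be read
 * even when beta == 0. */
static void naive_gemm_scaled(int n, double alpha, const double *a,
                              const double *b, double beta, double *c)
{
    assert(n > 0);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            double acc = 0.0;
            for (int k = 0; k < n; k++)
                acc += a[i + n * k] * b[k + n * j];
            /* Reads c[] unconditionally: 0.0 * NaN is NaN, not 0.0. */
            c[i + n * j] = alpha * acc + beta * c[i + n * j];
        }
    }
}

/* Count the NaN entries in an array of length len. */
static int count_nans(const double *x, int len)
{
    int count = 0;
    for (int i = 0; i < len; i++)
        if (isnan(x[i]))
            count++;
    return count;
}

/* Compute C = I*I with beta = 0 on a 2x2 problem and return the number of
 * NaNs left in C. If poison is nonzero, C is pre-filled with NaN to simulate
 * garbage that malloc'd memory could contain; otherwise C stays zeroed
 * by calloc. */
static int demo_nans_in_result(int poison)
{
    const int n = 2;
    const double ident[4] = {1.0, 0.0, 0.0, 1.0};
    double *c = calloc((size_t)n * n, sizeof *c);
    assert(c != NULL);
    if (poison)
        for (int i = 0; i < n * n; i++)
            c[i] = NAN;
    naive_gemm_scaled(n, 1.0, ident, ident, 0.0, c);
    int nans = count_nans(c, n * n);
    free(c);
    return nans;
}
```

With this sketch, demo_nans_in_result(1) reports that every entry of the result is NaN, while demo_nans_in_result(0) reports none. Under that assumption, zeroing temp with calloc (or memset) before calling dgemm_ is the safe choice.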