Conditional jump or move depends on uninitialized value(s) with strcat in for loop

Question

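I have a file containing 3 chromosome strings, which I want to concatenate into a single genome. I then need to access this concatenated string from multiple threads (I use pthread_t). To do this I have to hold pthread_mutex_lock while extracting the data. The data is extracted as a const char* by the function fai_fetch, I concatenate it with strcat, and I store the result in a char* (see below).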

// genome_size the size of all the chromosomes together
// chr_total the number of chromosomes I wish to concatenate
char* genome = (char*) malloc(sizeof(char) * (genome_size+chr_total));

for (int i = 0; i < chr_total; i++){
    pthread_mutex_lock(&data_mutex);
    const char *data = fai_fetch(seq_ref,chr_names[i],&chr_sizes[i]);
    pthread_mutex_unlock(&data_mutex);
    //sprintf(&genome[strlen(genome)],data);
    strcat(genome,data);  
    //sprintf(genome+strlen(genome),data); //All three gives conditional jump or move error

    //sprintf(genome,data); // THIS SOLVES VALGRIND ISSUE ONE BUT DOES NOT GIVE A CONCATENATED CHAR*
}

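All of this works, but when I run it under valgrind I get

Conditional jump or move depends on uninitialised value(s), referring to "strcat(genome,data);"

and Uninitialised value was created by a heap allocation, referring to "char* genome = (char*) malloc(sizeof(char) * (genome_size+chr_total));"

Based on other Stack Overflow answers, I tried sprintf(&genome[strlen(genome)],data); and sprintf(genome+strlen(genome),data); instead of strcat. However, they give the same valgrind message.

The only thing that seems to alleviate this error is sprintf(genome,data); but then I do not get the full genome, only a single chromosome.

Trying genome += sprintf(genome,data); gives me ./a.out': munmap_chunk(): invalid pointer: and ./a.out': free()

Regarding the "Uninitialised value was created by a heap allocation" error: my problem is that I can only free that memory after all the threads have finished running, so I am not sure how to initialise the values in the heap allocation when using malloc.

Is it possible to resolve some of these specific valgrind errors?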

Answer 1

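Using Valgrind to locate the problematic code

The "Conditional jump or move depends on uninitialised value(s)" message means that Valgrind has determined that some result of your program depends on uninitialized memory. Use the --track-origins=yes flag to track where the uninitialized value came from. It may help you find that value. From man 1 valgrind: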

When set to yes, Memcheck keeps track of the origins of all uninitialised values. Then, when an uninitialised value error is reported, Memcheck will try to show the origin of the value. An origin can be one of the following four places: a heap block, a stack allocation, a client request, or miscellaneous other sources (eg, a call to brk).

More specifically, in your program:

Problem 1: Using an uninitialized `genome`

The line

char* genome = (char*) malloc(sizeof(char) * (genome_size+chr_total));

allocates the genome buffer with malloc(3), and the buffer is then used in:

strcat(genome,data);

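Note that functions such as strlen(3) and strcat(3) operate on C strings, which are buffers terminated by a null character ('\0').

malloc(3) only allocates memory; it does not initialize it, so the buffer you allocated may contain any value (and is considered uninitialized). To find where to append, strcat first has to scan genome for its terminating null character, so it reads those uninitialized bytes, which is what Valgrind reports. You should avoid using string functions on an uninitialized buffer, because doing so results in undefined behavior.

Fortunately, calloc(3) solves the problem: it allocates the buffer and initializes all of its bits to zero, which yields a valid zero-length C string that you can operate on. I suggest the following fix to make sure genome is initialized: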

char* genome = calloc(genome_size+chr_total+1, sizeof(char));

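Also note that I added +1 to the length of the allocated buffer. This guarantees that the resulting genome will end with a null terminator (assuming genome_size+chr_total is the total size of all the buffers returned from fai_fetch).

Also note that, performance-wise, calloc is a bit slower than malloc (because it initializes the data), but I consider it safer because it initializes the whole buffer. For your specific program, you can avoid that cost by using malloc and initializing only the first byte: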
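As a self-contained illustration of the calloc approach, here is a minimal sketch. The helper name join_sequences and its parameters are hypothetical and not part of the original program; it assumes every input is a valid null-terminated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins `count` null-terminated strings into one
 * freshly allocated buffer. Returns NULL if the allocation fails.
 * The caller owns the result and must free() it. */
char *join_sequences(const char *const *parts, size_t count)
{
    assert(parts != NULL || count == 0);

    /* Total number of characters, not counting any terminator. */
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(parts[i]);

    /* calloc zero-fills the block, so buffer[0] == '\0' and the buffer
     * already holds a valid empty string. The +1 leaves room for the
     * final terminator. */
    char *buffer = calloc(total + 1, sizeof(char));
    if (buffer == NULL)
        return NULL;

    /* Every strcat call now starts from a properly terminated string. */
    for (size_t i = 0; i < count; i++)
        strcat(buffer, parts[i]);

    return buffer;
}
```

Calling join_sequences on an array holding "ACGT", "TTG" and "GA" should yield a single 9-character string that Valgrind has no reason to complain about.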

char* genome = malloc(sizeof(char) * (genome_size + chr_total + 1));
if (NULL == genome) {
    perror("malloc of genome failed");
    exit(1);
}
// So it will be a valid 0 length c-string
genome[0] = 0;

We do not have to initialize the last byte to 0, because strcat writes the terminating null character for us.
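To see that zeroing only the first byte is enough, consider this small sketch. The helper new_empty_string is hypothetical; it fills the block with a non-zero pattern purely to imitate leftover heap contents (the contents of a real malloc block are indeterminate rather than any particular value).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates `capacity` bytes with malloc and turns
 * the block into a valid empty C string by zeroing only its first byte.
 * Returns NULL if the allocation fails. */
char *new_empty_string(size_t capacity)
{
    assert(capacity > 0);

    char *buffer = malloc(capacity);
    if (buffer == NULL)
        return NULL;

    /* Stand-in for garbage, for demonstration only. */
    memset(buffer, 'X', capacity);

    /* The single write that makes the buffer safe for strlen/strcat. */
    buffer[0] = '\0';
    return buffer;
}
```

After this, each strcat call on the buffer overwrites the old terminator and writes a new one right after the appended characters, so the rest of the buffer never needs to be initialized by hand.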

(Potential) Problem 2: Using a potentially non-null-terminated `data` with `strcat`

As you described in your question, fai_fetch extracts some data:

const char *data = fai_fetch(seq_ref,chr_names[i],&chr_sizes[i]);

which is then used in the strcat line:

strcat(genome,data);

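As I wrote above, since you use strcat, data must be null-terminated as well.

I am not sure how fai_fetch is implemented, but if it returns a valid C string, you are fine.

If it does not, you should use strncat, which works with buffers that are not null-terminated.

From man 3 strcat: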

The strncat() function is similar, except that

it will use at most n bytes from src; and

src does not need to be null-terminated if it contains n or more bytes.

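I suggest the following fix:

// Assumption: chr_sizes[i] is an integer that fai_fetch fills in with the length of the fetched data
const char *data = fai_fetch(seq_ref, chr_names[i], &chr_sizes[i]);
if (NULL == data) {
    fprintf(stderr, "fai_fetch failed\n");
    exit(1);
}
// Use strncat instead of strcat, copying at most chr_sizes[i] bytes
strncat(genome, data, (size_t) chr_sizes[i]);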
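The strncat behaviour quoted from the man page can be checked in isolation with a short sketch. The wrapper append_bytes is a hypothetical name used only for illustration; it assumes the destination already holds a null-terminated string and has room for the appended bytes plus the terminator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: appends at most `length` bytes of `data` to the
 * C string stored in `dest`.
 *  - `dest` must be null-terminated and have room for
 *    strlen(dest) + length + 1 bytes.
 *  - `data` must hold at least `length` readable bytes, or be
 *    null-terminated earlier; it does not need a terminator otherwise. */
void append_bytes(char *dest, const char *data, size_t length)
{
    assert(dest != NULL && data != NULL);

    /* strncat stops after `length` bytes or at the first '\0' in `data`,
     * whichever comes first, and always terminates `dest`. */
    strncat(dest, data, length);
}
```

For example, appending a 4-byte array holding 'A', 'C', 'G', 'T' with no terminator, followed by the first 3 bytes of "TTGxx", leaves "ACGTTTG" in the destination.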
