Getting a heap buffer overflow; not sure if I'm allocating memory for a 2D array properly in C

Question

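I'm having trouble copying strings from a command-line input string into the strings of a 2D array. My program has to separate runs of letters from any non-letter characters. For example, hello23ght.!good needs to be put into a 2D array as: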

hello
ght
good

I have already found the length of the longest string and the number of strings, so I can allocate memory for my 2D array as follows.

char **stringArr; //array to hold separated strings

stringArr = (char **)malloc(numOfStrings * sizeof(char*)); //malloc rows of 2d array

if(stringArr == NULL) { //checks to see if memory was allocated correctly
    return 1;
}


int y; 
for (y = 0; y < numOfStrings; y++) { //malloc columns of array
    stringArr[y] = (char*) malloc((longestString + 1) * sizeof(char));

    if(stringArr[y] == NULL) { //checks to see if memory was allocated correctly
        return 1;
    }
}

Afterwards I wrote this code to find the individual letter strings in the input string and put each one into one "slot" of the 2D array:

while (argv[1][a] != '\0') { // Keep traversing the argument until the null char is reached
    if (isAlpha(argv[1][a]) == 1) { // if the first char in argv[1] is a letter, copy it into the first row and first column of stringArr
        stringArr[b][c] = argv[1][a];
        printf("%c" , stringArr[b][c]); //test
        a++; 
        c++;
        //printf("%d %d \n", a, c);
    } else if (a > 0 && isAlpha(argv[1][a]) != 1 && isAlpha(argv[1][a-1]) == 0) { //If the previous character is a letter and the current character isn't a letter increment a and b. (We have hit the end of the first unique string)
        a++;
        stringArr[b][c+1] = '\0'; //Setting the null byte for the unique string
        b++; //incrementing b to the next unique string
        printf("%d %d %d \n", a, b, c);
        c = 0; // resetting c for the next unique string 
    } else { // if neither of the first two statements applies, only increment a, since we have hit a repeated separator character.
        a++;
    }
}

However, when I run the code, I get the following error:

==46957==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000ef37 at pc 0x000103a76b8c bp 0x7fff5c18a910 sp 0x7fff5c18a908
WRITE of size 1 at 0x60200000ef37 thread T0

    SUMMARY: AddressSanitizer: heap-buffer-overflow ??:0 main
    Shadow bytes around the buggy address:
      0x1c0400001d90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c0400001da0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c0400001db0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c0400001dc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c0400001dd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa 07 fa
    =>0x1c0400001de0: fa fa 07 fa fa fa[07]fa fa fa 00 06 fa fa 00 00
      0x1c0400001df0: fa fa 00 04 fa fa 00 06 fa fa fd fd fa fa fd fd
      0x1c0400001e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c0400001e10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c0400001e20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c0400001e30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07 
      Heap left redzone:       fa
      Heap right redzone:      fb
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack partial redzone:   f4
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==46957==ABORTING

hlowrdjAbort trap: 6

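I'm not sure exactly what is going wrong, but I assume that either I'm not allocating enough memory for the 2D array, or my while loop is not correctly copying the unique letter strings from the input string.

Edit: I forgot to add this, but a, b and c are all initialized to 0.

Edit 2: Here is how numOfStrings and longestString are computed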

int j;
int numOfStrings=0, longestString=0, x=0;
//numOfStrings indicates total separated strings, longestString is the longest separated string, x is the current length of the string

for (j = 0; argv[1][j] != '\0'; j++) { // Traversing through the input string
    if (isAlpha(argv[1][j])) { //If the current char is a letter x is incremented by 1
        x++;
    } else if (!isAlpha(argv[1][j]) && isAlpha(argv[1][j-1])) { //If the current char is not a letter and the previous char is a letter then increment numberOfStrings by 1
        numOfStrings++;
        if (x > longestString) { //Since we hit a non letter char, if the x val is greater than the current longest string, replace longestString with x. 
            longestString = x;
        }
        x = 0;
    }

}
if(isAlpha(argv[1][j - 1])) { //Checks the last character if it is a letter and then accounts for the string associated with that letter.
    numOfStrings++;
    if(x > longestString) { // If the last string is the longest string, store its length in longestString
        longestString = x;
    }
}

Edit 3: My isAlpha function

int isAlpha (char a){ 

    if ((65 <= a && a <= 90) || (97 <= a && a <= 122) ) {
        return 1;
    }

    return 0;
} //Determines if a char is a letter or not using ASCII values. Returns 1 if true otherwise returns 0.

Answer 1

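Without knowing how you calculate numOfStrings and longestString, and how you initialize the running indices a, b and c, it is hard to tell where the error comes from. Here is how I would do it: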

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

int main(int argc, char **argv)
{

    char **stringArr;

    // emulating your allocation
    stringArr = calloc(5, sizeof(char*));
    stringArr[0] = calloc(1, 100);
    stringArr[1] = calloc(1, 100);
    stringArr[2] = calloc(1, 100);
    stringArr[3] = calloc(1, 100);
    stringArr[4] = calloc(1, 100);

    if(argc != 2)
    {
        fprintf(stderr, "usage: %s arg\n", argv[0]);
        return 1;
    }

    int i = 0; // index for scanning argv
    int j = 0; // index of current stringArr buffer
    int k = 0; // index of (end of) string in stringArr[j]

    // state 0: alpha mode
    // state 1: non-alpha mode
    int state = 0;

    char c;

    while((c = argv[1][i++]))
    {
        if(isalpha((unsigned char)c)) // cast avoids undefined behavior for negative char values
        {
            if(state)
            {
                // previous character was a non-alpha
                // change state and reset indices
                state = 0;
                k = 0;
                j++;
            }

            stringArr[j][k] = c;
            stringArr[j][++k] = 0;
            continue;
        }

        // not alpha, ignoring
        state = 1;

        // if line starts with non-alpha
        if(j == 0 && i == 1)
            j--;
    }


    for(i = 0; i < 5 && stringArr[i][0]; ++i) // stop at the 5 emulated rows
        puts(stringArr[i]);

    free(stringArr[0]);
    free(stringArr[1]);
    free(stringArr[2]);
    free(stringArr[3]);
    free(stringArr[4]);
    free(stringArr);

    return 0;
}

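I decided to store the scanning state in a variable. This makes the if conditions smaller and easier to read. My version also handles the case where the line starts with a non-alpha character.

The output is: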

$ valgrind ./a 'hello23ght.!good'
==20478== Memcheck, a memory error detector
==20478== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==20478== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==20478== Command: ./a hello23ght.!good
==20478== 
hello
ght
good
==20478== 
==20478== HEAP SUMMARY:
==20478==     in use at exit: 0 bytes in 0 blocks
==20478==   total heap usage: 7 allocs, 7 frees, 1,564 bytes allocated
==20478== 
==20478== All heap blocks were freed -- no leaks are possible
==20478== 
==20478== For counts of detected and suppressed errors, rerun with: -v
==20478== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

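Edit

I think my guess was correct: either the initialization of your running indices a, b and c, or the way you calculate numOfStrings and longestString, might be the problem. I suspect the calculation of numOfStrings and longestString is wrong, but without the code it is hard to say.

I replaced my while loop with yours in my program and removed the printfs. Before the loop, I initialized the running indices a, b and c to 0. I did not change the emulated memory allocation, so I know there is enough space for the example.

This is the result with your code: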

$ valgrind ./a-ops-version 'hello23ght.!good'
==20877== Memcheck, a memory error detector
==20877== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==20877== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==20877== Command: ./a hello23ght.!good
==20877== 
hello
ght
good
==20877== 
==20877== HEAP SUMMARY:
==20877==     in use at exit: 0 bytes in 0 blocks
==20877==   total heap usage: 7 allocs, 7 frees, 1,564 bytes allocated
==20877== 
==20877== All heap blocks were freed -- no leaks are possible
==20877== 
==20877== For counts of detected and suppressed errors, rerun with: -v
==20877== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

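Edit 2

I found the error in your code:

stringArr[b][c+1] = '\0';

This causes an error when b indexes the longest string. When this line is executed, the current character is not an alpha character, so you already incremented c by one in the previous iteration. That is why, by the time you read the non-alpha character, c is already the index of the '\0' terminating byte, and for the longest string c+1 is out of bounds.

To illustrate this:

'?' is an uninitialized character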

input: hello23ght.!good

Up until b == 0, a == 4, c == 4

hello23ght.!good
    ^
    |
    a

                                 c
                                 |
                                 v
                 0   1   2   3   4   5
               +---+---+---+---+---+---+
stringArr[b]:  | h | e | l | l | o | ? |
               +---+---+---+---+---+---+

This is the result of

if (isAlpha(argv[1][a]) == 1) {
    stringArr[b][c] = argv[1][a];

Then you do

a++; 
c++;

So a is updated to 5 and the next character is read:

input: hello23ght.!good

Up until b == 0, a == 5, c == 5

hello23ght.!good
     ^
     |
     a

                                     c
                                     |
                                     v
                 0   1   2   3   4   5
               +---+---+---+---+---+---+
stringArr[b]:  | h | e | l | l | o | ? |
               +---+---+---+---+---+---+

Because '2' is non-alpha and the previous character 'o' is alpha, the else block is executed

} else {
    a++;
}

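a is incremented again and is now 6.

The loop continues and the else if evaluates to true, because the previous character is also non-alpha:

stringArr[b][c+1] = '\0';

is executed, but you write out of bounds: longestString is 5, so each row holds only 6 bytes (indices 0 to 5), and c+1 is 6: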

input: hello23ght.!good

Up until b == 0, a == 6, c == 5

hello23ght.!good
      ^
      |
      a

                                     c    c+1
                                     |     |
                                     v     v
                 0   1   2   3   4   5     6   
               +---+---+---+---+---+---+ 
stringArr[b]:  | h | e | l | l | o | ? |   beyond the bounds
               +---+---+---+---+---+---+ 

yields:
==22229== Invalid write of size 1
==22229==    at 0x108C0E: main (a.c:93)
==22229==  Address 0x51e64e6 is 0 bytes after a block of size 6 alloc'd
==22229==    at 0x4C2CF05: calloc (vg_replace_malloc.c:711)
==22229==    by 0x108A4B: main (a.c:62)
==22229==

To fix it, you have to remove the +1:

stringArr[b][c] = '\0';
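As a minimal sketch of why index c is the right slot, here is the copy step isolated into a small function. The name copy_letters and its parameters are hypothetical and not part of your program; the point is only that after the last c++ the index already refers to the first unused byte.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies the leading letters of src into row, which has
   room for row_size bytes, and returns how many letters were copied. */
size_t copy_letters(char *row, size_t row_size, const char *src)
{
    size_t c = 0;

    if (row_size == 0)
        return 0;

    /* Stop one short of row_size so that index c always stays inside row. */
    while (c + 1 < row_size && isalpha((unsigned char)src[c])) {
        row[c] = src[c];
        c++;            /* c now refers to the first unused slot */
    }

    row[c] = '\0';      /* first unused slot, so c and not c + 1 */
    return c;
}
```

With a 6-byte row and the input "hello23ght", the letters land in indices 0 to 4 and the terminator in index 5, which is the last valid byte of the row.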

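Note that your algorithm works if and only if the non-alpha characters come in pairs. With an input like hello234ght.!good your program will crash because of the first else if: b is incremented once for every additional non-alpha character in a run, so eventually you go beyond the bounds of the double pointer.

Take a look at mine: with my version you can have as many non-alpha characters as you like.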
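If you want something that does not depend on a precomputed longestString at all, one possible approach is to size each string exactly. This is only a sketch under my own assumptions: split_alpha and free_strings are hypothetical names, and allocating each row to its own length is a different technique from the fixed-width rows in your question.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees `count` strings and the array holding them. */
void free_strings(char **strings, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(strings[i]);
    free(strings);
}

/* Hypothetical helper: splits `input` into runs of letters. Returns a
   malloc'd array of `*count` malloc'd strings, or NULL if there are no
   runs or an allocation fails (in both cases *count is 0). */
char **split_alpha(const char *input, size_t *count)
{
    size_t n = 0;

    /* First pass: count runs by detecting each letter that starts one. */
    for (size_t i = 0; input[i] != '\0'; i++) {
        int cur = isalpha((unsigned char)input[i]);
        int prev = i > 0 && isalpha((unsigned char)input[i - 1]);
        if (cur && !prev)
            n++;
    }

    *count = 0;
    if (n == 0)
        return NULL;

    char **out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;

    /* Second pass: copy each run into its own exactly sized buffer. */
    size_t i = 0;
    while (input[i] != '\0') {
        if (!isalpha((unsigned char)input[i])) {
            i++;                /* skip separators, however many there are */
            continue;
        }

        size_t start = i;
        while (isalpha((unsigned char)input[i]))
            i++;                /* stops at a separator or at '\0' */

        size_t len = i - start;
        char *s = malloc(len + 1);
        if (s == NULL) {
            free_strings(out, *count);
            *count = 0;
            return NULL;
        }

        memcpy(s, input + start, len);
        s[len] = '\0';          /* terminator at index len, not len + 1 */
        out[(*count)++] = s;
    }

    return out;
}
```

Called as split_alpha(argv[1], &n), this gives three strings for both hello23ght.!good and hello234ght.!good, and the caller releases them with free_strings.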

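I encourage you to learn and use a debugger. Errors like these are easy to find when you step through the loop, because the debugger shows you the values of all the indices at every step.

One last minor criticism:

In your isAlpha you have: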

if ((65 <= a && a <= 90) || (97 <= a && a <= 122) )

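This is not wrong, but I think it is bad practice. The numbers look like magic numbers, as in: where did you come up with them? I know these are the ASCII codes for A-Z and a-z.

This is better practice: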

if (('A' <= a && a <= 'Z') || ('a' <= a && a <= 'z') )

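It improves readability, because you don't have to look anything up in an ASCII table, and it expresses your intent clearly, especially to people who are reviewing your code.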
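Put together, the function could look like the sketch below. The first version keeps your name isAlpha and assumes an ASCII-compatible character set in which A-Z and a-z are contiguous. The second, isAlphaStd, is a hypothetical name of mine for a thin wrapper around the standard isalpha, whose result depends on the current locale.

```c
#include <ctype.h>

/* Character-literal version of isAlpha. Assumes an ASCII-compatible
   execution character set where A-Z and a-z are contiguous ranges.
   Returns 1 for a letter, 0 otherwise. */
int isAlpha(char a)
{
    return ('A' <= a && a <= 'Z') || ('a' <= a && a <= 'z');
}

/* Hypothetical alternative built on the standard library. The cast to
   unsigned char avoids undefined behavior when char is signed and the
   value is negative. Returns 1 for a letter, 0 otherwise. */
int isAlphaStd(char a)
{
    return isalpha((unsigned char)a) != 0;
}
```

Both return exactly 1 or 0, so comparisons such as isAlpha(x) == 1 in your loop keep working.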
