内联汇编后指针取消引用 (SIGSEGV) 失败
Failure of pointer dereferencing (SIGSEGV) after inline-assembly
在尝试用不同的方法对 Schoenhage 基数转换树的叶子进行编码时,我偶然发现了编译器(GCC、clang)优化除以一个小常数的除法和倒数乘法的问题。正如他们应该的那样,没有抱怨。所以我决定添加一些内联汇编以获得可比较的基准测试,但我得到的却是段错误。
代码(不是最简单的示例,但一些上下文可能会有所帮助)
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#define ARRAY_LENGTH 33
static const char digits[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8',
'9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '+',
'/' };
static void to_radix_recursive(unsigned int a, unsigned int b,
char *result, int *index) {
unsigned int r = 0u, q = 0u;
int i;
char c;
if (a == 0u) {
return;
}
#ifdef CZ_USE_ASM
//printf("BEFORE a = %u, b = %u, q = %u, r = %u\n", a, b, q, r);
__asm__("xorl %%edx, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"divl %%ebx;"
: "=a"(q), "=d"(r)
: "g"(a), "g"(b)
);
//printf("AFTER a = %u, b = %u, q = %u, r = %u\n\n", a, b, q, r);
#else
q = a / b;
r = a % b;
#endif
to_radix_recursive(q, b, result, index);
c = digits[r];
i = *index; /* Line 41 */
result[i] = c;
(*index)++;
}
int main(void) {
int idx;
unsigned int a, b;
char result[ARRAY_LENGTH] = {'[=10=]'};
/* All checks and balances ommitted! */
/* 0 < a <= UINT_MAX */
a = 1234567;
#ifdef CZ_USE_CONSTANT
/* Most compilers optimimize to multiplication with reciprocal here */
b = 10;
#else
/* Should press the optimizer to use "divl" */
for (b = 2u; b < 64u; b++) {
idx = 0;
to_radix_recursive(a, b, result, &idx);
printf("Result recursive = %s for radix %u\n", result,b);
for (int i = 0; i < ARRAY_LENGTH; i++) {
result[i] = '[=10=]';
}
}
#endif
exit(EXIT_SUCCESS);
}
-DCZ_USE_ASM
的预期输出
Result recursive = 100101101011010000111 for radix 2
Result recursive = 2022201111201 for radix 3
Result recursive = 10231122013 for radix 4
Result recursive = 304001232 for radix 5
Result recursive = 42243331 for radix 6
Result recursive = 13331215 for radix 7
Result recursive = 4553207 for radix 8
Result recursive = 2281451 for radix 9
Result recursive = 1234567 for radix 10
Result recursive = 773604 for radix 11
Result recursive = 4B6547 for radix 12
Result recursive = 342C19 for radix 13
Result recursive = 241CB5 for radix 14
Result recursive = 195BE7 for radix 15
Result recursive = 12D687 for radix 16
Result recursive = ED4EA for radix 17
Result recursive = BDC71 for radix 18
Result recursive = 98IG4 for radix 19
Result recursive = 7E687 for radix 20
Result recursive = 6769J for radix 21
Result recursive = 55KGF for radix 22
Result recursive = 49AHJ for radix 23
Result recursive = 3H787 for radix 24
Result recursive = 3407H for radix 25
Result recursive = 2I679 for radix 26
Result recursive = 28JDJ for radix 27
Result recursive = 206JJ for radix 28
Result recursive = 1LHS8 for radix 29
Result recursive = 1FLM7 for radix 30
Result recursive = 1ADKN for radix 31
Result recursive = 15LK7 for radix 32
Result recursive = 11BM4 for radix 33
Result recursive = VDWR for radix 34
Result recursive = SRSC for radix 35
Result recursive = QGLJ for radix 36
Result recursive = ODTP for radix 37
Result recursive = MIaN for radix 38
Result recursive = KVQM for radix 39
Result recursive = JBO7 for radix 40
Result recursive = HbHG for radix 41
Result recursive = GRaJ for radix 42
Result recursive = FMTb for radix 43
Result recursive = ELUF for radix 44
Result recursive = DOTb for radix 45
Result recursive = CVKJ for radix 46
Result recursive = BffI for radix 47
Result recursive = B7e7 for radix 48
Result recursive = AO9C for radix 49
Result recursive = 9hfH for radix 50
Result recursive = 9FXA for radix 51
Result recursive = 8eTZ for radix 52
Result recursive = 8FQc for radix 53
Result recursive = 7jKJ for radix 54
Result recursive = 7N6b for radix 55
Result recursive = 71bl for radix 56
Result recursive = 6bu4 for radix 57
Result recursive = 6Ivb for radix 58
Result recursive = 60cp for radix 59
Result recursive = 5gu7 for radix 60
Result recursive = 5Qln for radix 61
Result recursive = 5BAN for radix 62
Result recursive = 4x3J for radix 63
但是如上所述:它反而会出现段错误。我把代码分开一点,让每行有一个操作, 运行 valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./divmod
打印出
==9546== Memcheck, a memory error detector
==9546== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9546== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9546== Command: ./divmod
==9546==
==9546== Invalid read of size 4
==9546== at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== Address 0x2 is not stack'd, malloc'd or (recently) free'd
==9546==
==9546==
==9546== Process terminating with default action of signal 11 (SIGSEGV)
==9546== Access not within mapped region at address 0x2
==9546== at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== If you believe this happened as a result of a stack
==9546== overflow in your program's main thread (unlikely but
==9546== possible), you can try to increase the size of the
==9546== main thread stack using the --main-stacksize= flag.
==9546== The main thread stack size used in this run was 8388608.
==9546==
==9546== HEAP SUMMARY:
==9546== in use at exit: 0 bytes in 0 blocks
==9546== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==9546==
==9546== All heap blocks were freed -- no leaks are possible
地址 0x2
非常低,提示指针解引用失败,确实如此。
保留对 printf
的两次调用,原因您可能已经猜到:如果您使用一个(我昨天需要同时使用),它就可以工作。这几乎总是由某处的某些 UB(未定义行为)引起的。
隐隐怕自己犯了一个很傻的错误,会被大家嘲笑:到底是什么原因,如何修复?
问题是在内联汇编中你做了:
__asm__("xorl %%edx, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"divl %%ebx;"
: "=a"(q), "=d"(r)
: "g"(a), "g"(b)
);
GCC/CLANG 非常无情。如果你修改了一个寄存器,你需要告诉编译器它将被修改。在此内联汇编代码中,您已经说过 EAX 和 EDX 是仅输出寄存器(它们将被修改)但您没有告诉编译器你modified/clobberedEBX。一个简单的解决方法是将 EBX 添加到 clobber 列表中,如下所示:
__asm__("xorl %%edx, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"divl %%ebx;"
: "=a"(q), "=d"(r)
: "g"(a), "g"(b)
: "ebx"
);
现在编译器不会假设 EBX 仍然包含与内联汇编代码 运行.
之前相同的值
如果您的内联汇编以 MOV
指令开头,您可能采取了错误的方法,未使用内联汇编操作数(和约束)本身来允许编译器尝试生成最有效的版本代码。您的内联程序集可能看起来像这样:
__asm__("divl %4"
: "=a"(q), "=d"(r)
: "a"(a), "d"(0), "r"(b)
);
我们创建第 5 个操作数以传递编译器选择的寄存器中的除数。我们还在操作数中将 EDX 设置为零,而不是在内联汇编中这样做。此版本还为输入和输出操作数重用了 EAX 和 EDX 寄存器,需要可能使用的寄存器更少。
在尝试用不同的方法对 Schoenhage 基数转换树的叶子进行编码时,我偶然发现了编译器(GCC、clang)优化除以一个小常数的除法和倒数乘法的问题。正如他们应该的那样,没有抱怨。所以我决定添加一些内联汇编以获得可比较的基准测试,但我得到的却是段错误。
代码(不是最简单的示例,但一些上下文可能会有所帮助)
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#define ARRAY_LENGTH 33
static const char digits[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8',
'9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '+',
'/' };
static void to_radix_recursive(unsigned int a, unsigned int b,
char *result, int *index) {
unsigned int r = 0u, q = 0u;
int i;
char c;
if (a == 0u) {
return;
}
#ifdef CZ_USE_ASM
//printf("BEFORE a = %u, b = %u, q = %u, r = %u\n", a, b, q, r);
__asm__("xorl %%edx, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"divl %%ebx;"
: "=a"(q), "=d"(r)
: "g"(a), "g"(b)
);
//printf("AFTER a = %u, b = %u, q = %u, r = %u\n\n", a, b, q, r);
#else
q = a / b;
r = a % b;
#endif
to_radix_recursive(q, b, result, index);
c = digits[r];
i = *index; /* Line 41 */
result[i] = c;
(*index)++;
}
int main(void) {
int idx;
unsigned int a, b;
char result[ARRAY_LENGTH] = {'[=10=]'};
/* All checks and balances ommitted! */
/* 0 < a <= UINT_MAX */
a = 1234567;
#ifdef CZ_USE_CONSTANT
/* Most compilers optimimize to multiplication with reciprocal here */
b = 10;
#else
/* Should press the optimizer to use "divl" */
for (b = 2u; b < 64u; b++) {
idx = 0;
to_radix_recursive(a, b, result, &idx);
printf("Result recursive = %s for radix %u\n", result,b);
for (int i = 0; i < ARRAY_LENGTH; i++) {
result[i] = '[=10=]';
}
}
#endif
exit(EXIT_SUCCESS);
}
-DCZ_USE_ASM
Result recursive = 100101101011010000111 for radix 2
Result recursive = 2022201111201 for radix 3
Result recursive = 10231122013 for radix 4
Result recursive = 304001232 for radix 5
Result recursive = 42243331 for radix 6
Result recursive = 13331215 for radix 7
Result recursive = 4553207 for radix 8
Result recursive = 2281451 for radix 9
Result recursive = 1234567 for radix 10
Result recursive = 773604 for radix 11
Result recursive = 4B6547 for radix 12
Result recursive = 342C19 for radix 13
Result recursive = 241CB5 for radix 14
Result recursive = 195BE7 for radix 15
Result recursive = 12D687 for radix 16
Result recursive = ED4EA for radix 17
Result recursive = BDC71 for radix 18
Result recursive = 98IG4 for radix 19
Result recursive = 7E687 for radix 20
Result recursive = 6769J for radix 21
Result recursive = 55KGF for radix 22
Result recursive = 49AHJ for radix 23
Result recursive = 3H787 for radix 24
Result recursive = 3407H for radix 25
Result recursive = 2I679 for radix 26
Result recursive = 28JDJ for radix 27
Result recursive = 206JJ for radix 28
Result recursive = 1LHS8 for radix 29
Result recursive = 1FLM7 for radix 30
Result recursive = 1ADKN for radix 31
Result recursive = 15LK7 for radix 32
Result recursive = 11BM4 for radix 33
Result recursive = VDWR for radix 34
Result recursive = SRSC for radix 35
Result recursive = QGLJ for radix 36
Result recursive = ODTP for radix 37
Result recursive = MIaN for radix 38
Result recursive = KVQM for radix 39
Result recursive = JBO7 for radix 40
Result recursive = HbHG for radix 41
Result recursive = GRaJ for radix 42
Result recursive = FMTb for radix 43
Result recursive = ELUF for radix 44
Result recursive = DOTb for radix 45
Result recursive = CVKJ for radix 46
Result recursive = BffI for radix 47
Result recursive = B7e7 for radix 48
Result recursive = AO9C for radix 49
Result recursive = 9hfH for radix 50
Result recursive = 9FXA for radix 51
Result recursive = 8eTZ for radix 52
Result recursive = 8FQc for radix 53
Result recursive = 7jKJ for radix 54
Result recursive = 7N6b for radix 55
Result recursive = 71bl for radix 56
Result recursive = 6bu4 for radix 57
Result recursive = 6Ivb for radix 58
Result recursive = 60cp for radix 59
Result recursive = 5gu7 for radix 60
Result recursive = 5Qln for radix 61
Result recursive = 5BAN for radix 62
Result recursive = 4x3J for radix 63
但是如上所述:它反而会出现段错误。我把代码分开一点,让每行有一个操作, 运行 valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./divmod
打印出
==9546== Memcheck, a memory error detector
==9546== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9546== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9546== Command: ./divmod
==9546==
==9546== Invalid read of size 4
==9546== at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== Address 0x2 is not stack'd, malloc'd or (recently) free'd
==9546==
==9546==
==9546== Process terminating with default action of signal 11 (SIGSEGV)
==9546== Access not within mapped region at address 0x2
==9546== at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== If you believe this happened as a result of a stack
==9546== overflow in your program's main thread (unlikely but
==9546== possible), you can try to increase the size of the
==9546== main thread stack using the --main-stacksize= flag.
==9546== The main thread stack size used in this run was 8388608.
==9546==
==9546== HEAP SUMMARY:
==9546== in use at exit: 0 bytes in 0 blocks
==9546== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==9546==
==9546== All heap blocks were freed -- no leaks are possible
地址 0x2
非常低,提示指针解引用失败,确实如此。
保留对 printf
的两次调用,原因您可能已经猜到:如果您使用一个(我昨天需要同时使用),它就可以工作。这几乎总是由某处的某些 UB(未定义行为)引起的。
隐隐怕自己犯了一个很傻的错误,会被大家嘲笑:到底是什么原因,如何修复?
问题是在内联汇编中你做了:
__asm__("xorl %%edx, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"divl %%ebx;"
: "=a"(q), "=d"(r)
: "g"(a), "g"(b)
);
GCC/CLANG 非常无情。如果你修改了一个寄存器,你需要告诉编译器它将被修改。在此内联汇编代码中,您已经说过 EAX 和 EDX 是仅输出寄存器(它们将被修改)但您没有告诉编译器你modified/clobberedEBX。一个简单的解决方法是将 EBX 添加到 clobber 列表中,如下所示:
__asm__("xorl %%edx, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"divl %%ebx;"
: "=a"(q), "=d"(r)
: "g"(a), "g"(b)
: "ebx"
);
现在编译器不会假设 EBX 仍然包含与内联汇编代码 运行.
之前相同的值如果您的内联汇编以 MOV
指令开头,您可能采取了错误的方法,未使用内联汇编操作数(和约束)本身来允许编译器尝试生成最有效的版本代码。您的内联程序集可能看起来像这样:
__asm__("divl %4"
: "=a"(q), "=d"(r)
: "a"(a), "d"(0), "r"(b)
);
我们创建第 5 个操作数以传递编译器选择的寄存器中的除数。我们还在操作数中将 EDX 设置为零,而不是在内联汇编中这样做。此版本还为输入和输出操作数重用了 EAX 和 EDX 寄存器,需要可能使用的寄存器更少。