内联汇编后指针取消引用 (SIGSEGV) 失败

Failure of pointer dereferencing (SIGSEGV) after inline-assembly

在尝试用不同的方法对 Schoenhage 基数转换树的叶子进行编码时,我偶然发现了编译器(GCC、clang)优化除以一个小常数的除法和倒数乘法的问题。正如他们应该的那样,没有抱怨。所以我决定添加一些内联汇编以获得可比较的基准测试,但我得到的却是段错误。

代码(不是最简单的示例,但一些上下文可能会有所帮助)

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define ARRAY_LENGTH 33

static const char digits[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8',
                               '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
                               'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
                               'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
                               'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
                               'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
                               's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '+',
                               '/' };

static void to_radix_recursive(unsigned int a, unsigned int b,
                               char *result, int *index) {
   unsigned int r = 0u, q = 0u;
   int i;
   char c;

   if (a == 0u) {
      return;
   }
#ifdef CZ_USE_ASM
   //printf("BEFORE a = %u, b = %u, q = %u, r = %u\n", a, b, q, r);
   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=a"(q), "=d"(r)
           : "g"(a), "g"(b)
          );
   //printf("AFTER  a = %u, b = %u, q = %u, r = %u\n\n", a, b, q, r);
#else
   q = a / b;
   r = a % b;
#endif
   to_radix_recursive(q, b, result, index);
   c = digits[r];
   i = *index;        /* Line 41 */
   result[i] = c;
   (*index)++;
}

int main(void) {
   int idx;
   unsigned int a, b;
   char result[ARRAY_LENGTH] = {'[=10=]'};

   /* All checks and balances ommitted! */

   /* 0 < a <= UINT_MAX */
   a = 1234567;
#ifdef CZ_USE_CONSTANT
   /* Most compilers optimimize to multiplication with reciprocal here */
   b = 10;
#else
   /* Should press the optimizer to use "divl" */
   for (b = 2u; b < 64u; b++) {
      idx = 0;
      to_radix_recursive(a, b, result, &idx);
      printf("Result recursive = %s for radix %u\n", result,b);
      for (int i = 0; i < ARRAY_LENGTH; i++) {
         result[i] = '[=10=]';
      }
   }
#endif
   exit(EXIT_SUCCESS);
}

-DCZ_USE_ASM

的预期输出
Result recursive = 100101101011010000111 for radix 2
Result recursive = 2022201111201 for radix 3
Result recursive = 10231122013 for radix 4
Result recursive = 304001232 for radix 5
Result recursive = 42243331 for radix 6
Result recursive = 13331215 for radix 7
Result recursive = 4553207 for radix 8
Result recursive = 2281451 for radix 9
Result recursive = 1234567 for radix 10
Result recursive = 773604 for radix 11
Result recursive = 4B6547 for radix 12
Result recursive = 342C19 for radix 13
Result recursive = 241CB5 for radix 14
Result recursive = 195BE7 for radix 15
Result recursive = 12D687 for radix 16
Result recursive = ED4EA for radix 17
Result recursive = BDC71 for radix 18
Result recursive = 98IG4 for radix 19
Result recursive = 7E687 for radix 20
Result recursive = 6769J for radix 21
Result recursive = 55KGF for radix 22
Result recursive = 49AHJ for radix 23
Result recursive = 3H787 for radix 24
Result recursive = 3407H for radix 25
Result recursive = 2I679 for radix 26
Result recursive = 28JDJ for radix 27
Result recursive = 206JJ for radix 28
Result recursive = 1LHS8 for radix 29
Result recursive = 1FLM7 for radix 30
Result recursive = 1ADKN for radix 31
Result recursive = 15LK7 for radix 32
Result recursive = 11BM4 for radix 33
Result recursive = VDWR for radix 34
Result recursive = SRSC for radix 35
Result recursive = QGLJ for radix 36
Result recursive = ODTP for radix 37
Result recursive = MIaN for radix 38
Result recursive = KVQM for radix 39
Result recursive = JBO7 for radix 40
Result recursive = HbHG for radix 41
Result recursive = GRaJ for radix 42
Result recursive = FMTb for radix 43
Result recursive = ELUF for radix 44
Result recursive = DOTb for radix 45
Result recursive = CVKJ for radix 46
Result recursive = BffI for radix 47
Result recursive = B7e7 for radix 48
Result recursive = AO9C for radix 49
Result recursive = 9hfH for radix 50
Result recursive = 9FXA for radix 51
Result recursive = 8eTZ for radix 52
Result recursive = 8FQc for radix 53
Result recursive = 7jKJ for radix 54
Result recursive = 7N6b for radix 55
Result recursive = 71bl for radix 56
Result recursive = 6bu4 for radix 57
Result recursive = 6Ivb for radix 58
Result recursive = 60cp for radix 59
Result recursive = 5gu7 for radix 60
Result recursive = 5Qln for radix 61
Result recursive = 5BAN for radix 62
Result recursive = 4x3J for radix 63

但是如上所述:它反而会出现段错误。我把代码分开一点,让每行有一个操作, 运行 valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./divmod 打印出

==9546== Memcheck, a memory error detector
==9546== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9546== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9546== Command: ./divmod
==9546== 
==9546== Invalid read of size 4
==9546==    at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==  Address 0x2 is not stack'd, malloc'd or (recently) free'd
==9546== 
==9546== 
==9546== Process terminating with default action of signal 11 (SIGSEGV)
==9546==  Access not within mapped region at address 0x2
==9546==    at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==  If you believe this happened as a result of a stack
==9546==  overflow in your program's main thread (unlikely but
==9546==  possible), you can try to increase the size of the
==9546==  main thread stack using the --main-stacksize= flag.
==9546==  The main thread stack size used in this run was 8388608.
==9546== 
==9546== HEAP SUMMARY:
==9546==     in use at exit: 0 bytes in 0 blocks
==9546==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==9546== 
==9546== All heap blocks were freed -- no leaks are possible

地址 0x2 非常低,提示指针解引用失败,确实如此。

保留对 printf 的两次调用,原因您可能已经猜到:如果您使用一个(我昨天需要同时使用),它就可以工作。这几乎总是由某处的某些 UB(未定义行为)引起的。

隐隐怕自己犯了一个很傻的错误,会被大家嘲笑:到底是什么原因,如何修复?

问题是在内联汇编中你做了:

   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=a"(q), "=d"(r)
           : "g"(a), "g"(b)
          );

GCC/CLANG 非常无情。如果你修改了一个寄存器,你需要告诉编译器它将被修改。在此内联汇编代码中,您已经说过 EAXEDX 是仅输出寄存器(它们将被修改)但您没有告诉编译器你modified/clobberedEBX。一个简单的解决方法是将 EBX 添加到 clobber 列表中,如下所示:

   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=a"(q), "=d"(r)
           : "g"(a), "g"(b)
           : "ebx"
          );

现在编译器不会假设 EBX 仍然包含与内联汇编代码 运行.

之前相同的值

如果您的内联汇编以 MOV 指令开头,您可能采取了错误的方法,未使用内联汇编操作数(和约束)本身来允许编译器尝试生成最有效的版本代码。您的内联程序集可能看起来像这样:

   __asm__("divl %4"
           : "=a"(q), "=d"(r)
           : "a"(a), "d"(0), "r"(b)
          );

我们创建第 5 个操作数以传递编译器选择的寄存器中的除数。我们还在操作数中将 EDX 设置为零,而不是在内联汇编中这样做。此版本还为输入和输出操作数重用了 EAXEDX 寄存器,需要可能使用的寄存器更少。