Get precise line/column debug info from LLVM IR

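I am trying to locate instructions in an LLVM pass by line and column number (as reported by a third-party tool) so that I can instrument them. To do this, I compile my source files with clang -g -O0 -emit-llvm and look up the information in the metadata using this code: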

const DebugLoc &location = instruction->getDebugLoc();
// location.getLine()
// location.getCol()

Unfortunately, this information is far from precise. Consider the following implementation of the Fibonacci function:

unsigned fib(unsigned n) {
    if (n < 2)
        return n;

    unsigned f = fib(n - 1) + fib(n - 2);
    return f;
}

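I would like to find the single LLVM instruction in the generated LLVM IR that corresponds to the assignment unsigned f = .... I am not interested in the computations on the right-hand side. The generated LLVM block, including the relevant debug metadata, is: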

[...]

if.end:                                           ; preds = %entry
  call void @llvm.dbg.declare(metadata !{i32* %f}, metadata !17), !dbg !18
  %2 = load i32* %n.addr, align 4, !dbg !19
  %sub = sub i32 %2, 1, !dbg !19
  %call = call i32 @fib(i32 %sub), !dbg !19
  %3 = load i32* %n.addr, align 4, !dbg !20
  %sub1 = sub i32 %3, 2, !dbg !20
  %call2 = call i32 @fib(i32 %sub1), !dbg !20
  %add = add i32 %call, %call2, !dbg !20
  store i32 %add, i32* %f, align 4, !dbg !20
  %4 = load i32* %f, align 4, !dbg !21
  store i32 %4, i32* %retval, !dbg !21
  br label %return, !dbg !21

[...]

!17 = metadata !{i32 786688, metadata !4, metadata !"f", metadata !5, i32 5, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [f] [line 5]
!18 = metadata !{i32 5, i32 11, metadata !4, null}
!19 = metadata !{i32 5, i32 15, metadata !4, null}
!20 = metadata !{i32 5, i32 28, metadata !4, null}
!21 = metadata !{i32 6, i32 2, metadata !4, null}
!22 = metadata !{i32 7, i32 1, metadata !4, null}

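As you can see, the !dbg !20 metadata of the store instruction points to line 5, column 28, which is the call fib(n - 2). Even worse, the addition and the subtraction n - 2 also point to that function call, identified by !dbg !20.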

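Interestingly, the Clang AST emitted by clang -Xclang -ast-dump -fsyntax-only contains all of that information. I therefore suspect that it is lost somewhere during code generation. It seems that during code generation Clang reaches some internal sequence point and associates all following instructions with that position until the next sequence point (e.g. a function call) occurs. For completeness, here is the declaration statement from the AST: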

|-DeclStmt 0x7ffec3869f48 <line:5:2, col:38>
| `-VarDecl 0x7ffec382d680 <col:2, col:37> col:11 used f 'unsigned int' cinit
|   `-BinaryOperator 0x7ffec3869f20 <col:15, col:37> 'unsigned int' '+'
|     |-CallExpr 0x7ffec382d7e0 <col:15, col:24> 'unsigned int'
|     | |-ImplicitCastExpr 0x7ffec382d7c8 <col:15> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
|     | | `-DeclRefExpr 0x7ffec382d6d8 <col:15> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
|     | `-BinaryOperator 0x7ffec382d778 <col:19, col:23> 'unsigned int' '-'
|     |   |-ImplicitCastExpr 0x7ffec382d748 <col:19> 'unsigned int' <LValueToRValue>
|     |   | `-DeclRefExpr 0x7ffec382d700 <col:19> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
|     |   `-ImplicitCastExpr 0x7ffec382d760 <col:23> 'unsigned int' <IntegralCast>
|     |     `-IntegerLiteral 0x7ffec382d728 <col:23> 'int' 1
|     `-CallExpr 0x7ffec3869ef0 <col:28, col:37> 'unsigned int'
|       |-ImplicitCastExpr 0x7ffec3869ed8 <col:28> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
|       | `-DeclRefExpr 0x7ffec3869e10 <col:28> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
|       `-BinaryOperator 0x7ffec3869eb0 <col:32, col:36> 'unsigned int' '-'
|         |-ImplicitCastExpr 0x7ffec3869e80 <col:32> 'unsigned int' <LValueToRValue>
|         | `-DeclRefExpr 0x7ffec3869e38 <col:32> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
|         `-ImplicitCastExpr 0x7ffec3869e98 <col:36> 'unsigned int' <IntegralCast>
|           `-IntegerLiteral 0x7ffec3869e60 <col:36> 'int' 2

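Is it possible to improve the accuracy of the debug metadata, or to resolve the corresponding instruction in a different way? Ideally, I would like to leave Clang untouched, i.e. not modify and recompile it.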

It turns out that this has already been fixed with the introduction of MDLocation in LLVM release 3.6.0. At the time of writing, the clang compiler shipped with the Xcode Command Line Tools still generates the former "buggy" location information, even though its version string says Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn). After downloading the pre-built binary, the generated LLVM IR now looks as follows:

[...]

; <label>:7                                       ; preds = %0
  call void @llvm.dbg.declare(metadata i32* %f, metadata !21, metadata !14), !dbg !22
  %8 = load i32* %2, align 4, !dbg !23
  %9 = sub i32 %8, 1, !dbg !23
  %10 = call i32 @fib(i32 %9), !dbg !24
  %11 = load i32* %2, align 4, !dbg !25
  %12 = sub i32 %11, 2, !dbg !25
  %13 = call i32 @fib(i32 %12), !dbg !26
  %14 = add i32 %10, %13, !dbg !24
  store i32 %14, i32* %f, align 4, !dbg !22
  %15 = load i32* %f, align 4, !dbg !27
  store i32 %15, i32* %1, !dbg !28
  br label %16, !dbg !28


[...]

!22 = !MDLocation(line: 5, column: 14, scope: !4)
!23 = !MDLocation(line: 5, column: 22, scope: !4)
!24 = !MDLocation(line: 5, column: 18, scope: !4)
!25 = !MDLocation(line: 5, column: 35, scope: !4)
!26 = !MDLocation(line: 5, column: 31, scope: !4)
!27 = !MDLocation(line: 6, column: 12, scope: !4)
!28 = !MDLocation(line: 6, column: 5, scope: !4)

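The location metadata now consistently points to the beginning of an expression. For the assignment, for example, this is the left-hand side specifier f at line 5, column 14. Unfortunately, this can still be ambiguous, as !dbg !24 shows: both the call fib(n - 1) and the addition carry that same location.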

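There is one more change: accessing getLine() or getCol() will fail if no debug metadata is attached to the instruction. The DebugLoc class provides a convenient way to check for this: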

const DebugLoc &location = instruction->getDebugLoc();
if (location) {
    // location.getLine()
    // location.getCol()
} else {
    // No location metadata available
}
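
Putting the two observations together (several instructions may share one location, and some instructions may have none), the lookup could be structured roughly as follows. This is only a minimal, LLVM-free sketch: Loc, Inst, findAt and exampleBlock are hypothetical stand-ins invented for illustration, and in a real pass the entries would come from instruction->getDebugLoc() rather than from a hand-written table. It also assumes a C++17 compiler for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the (line, column) pair of a debug location.
struct Loc {
    unsigned line;
    unsigned col;
};

// Hypothetical stand-in for an instruction: an opcode name plus an optional
// location (empty when no !dbg metadata is attached).
struct Inst {
    std::string opcode;
    std::optional<Loc> loc;
};

// Return the indices of all instructions located at (line, col).
// If `opcode` is non-empty, additionally require a matching opcode, which
// helps when several instructions share one location.
std::vector<std::size_t> findAt(const std::vector<Inst> &block,
                                unsigned line, unsigned col,
                                const std::string &opcode = "") {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < block.size(); ++i) {
        const Inst &inst = block[i];
        if (!inst.loc)  // plays the role of the `if (location)` check
            continue;
        if (inst.loc->line != line || inst.loc->col != col)
            continue;
        if (!opcode.empty() && inst.opcode != opcode)
            continue;
        hits.push_back(i);
    }
    return hits;
}

// The block from the LLVM 3.6 output above, transcribed by hand, plus one
// made-up trailing entry without a location to exercise the skip path.
std::vector<Inst> exampleBlock() {
    return {
        {"call", Loc{5, 14}},   // llvm.dbg.declare, !22
        {"load", Loc{5, 22}},   // !23
        {"sub", Loc{5, 22}},    // !23
        {"call", Loc{5, 18}},   // !24
        {"load", Loc{5, 35}},   // !25
        {"sub", Loc{5, 35}},    // !25
        {"call", Loc{5, 31}},   // !26
        {"add", Loc{5, 18}},    // !24
        {"store", Loc{5, 14}},  // !22
        {"load", Loc{6, 12}},   // !27
        {"store", Loc{6, 5}},   // !28
        {"br", Loc{6, 5}},      // !28
        {"phi", std::nullopt},  // hypothetical, not in the original output
    };
}
```

With this table, findAt(exampleBlock(), 5, 14) yields two candidates (the llvm.dbg.declare call and the store), so filtering by opcode, as in findAt(exampleBlock(), 5, 14, "store"), is one way to single out the assignment. Whether an opcode filter is sufficient in general depends on the code being analysed.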