How to recover the virtual address of a string from a core dump?

Question

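I am trying to find a specific string in a process's memory. Specifically, I want to find the virtual address where it is stored. I wrote a python script that calls gcore on the process and scans the resulting file for all matches. Then I call pmap and iterate through the entries there. My idea is to find the memory section that each index falls into, subtract the sum of the sizes of the preceding sections to get the offset within that section, and add the offset to the section's base address to get the virtual address. However, when I use gdb to look for the string at the virtual addresses I calculated, I don't find the right string. Why doesn't my method work? Doesn't gcore dump the entire contents of virtual memory in order?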

#!/usr/bin/python3
import sys
import ctypes
import ctypes.util
import subprocess
import os
import ptrace
import re

if(len(sys.argv) != 2):
    print("Usage: search_and_replace.py target_pid")
    sys.exit(-1)

pid = sys.argv[1]
if not pid.isdigit():
    print("Invalid PID specified.  Make sure PID is an integer")
    sys.exit(-1)

bash_cmd = "sudo gcore -a {}".format(pid)
os.system(bash_cmd)

with open("core." + sys.argv[1], 'rb') as f:
    s = f.read()
# with open("all.dump", 'rb') as f:
#   s = f.read()

str_query = b'a random string in program\'s memory'
str_replc = b'This is an inserted string, replacing the original.'
indices = []
for match in re.finditer(str_query, s):
    indices.append(match.start())
print("number of indices is " + str(len(indices)))

#index = s.find(str_query)

# print("offset is " + str(index))
# if(index == 0):
#   print("error: String not found")
#   sys.exit(-1)

bash_cmd = "sudo pmap -x {} > maps".format(pid)
print(bash_cmd)
subprocess.call(bash_cmd, shell=True)

with open("maps") as m:
    lines = m.readlines()

# Calculate the virtual address of the targeted string in the running process by parsing the pmap output
pages = []
v_addrs = []

for index in indices:
    sum = 0
    offset = 0
    v_addr = 0  
    #print(index)
    for i in range(2, len(lines) - 2):
        line = lines[i]
        items = line.split()
        v_addr = int(items[0], 16)
        old_sum = sum
        sum += int(items[1]) * 1024
        if sum > index:
            offset = index - old_sum
            print("max is " + hex(v_addr + int(items[1]) * 1024))
            print("offset is " + str(offset) + " hex " + hex(offset))
            print("final va is " + hex(v_addr + offset))
            pages.append(hex(v_addr) + ", " + hex(v_addr + int(items[1]) * 1024))
            v_addrs.append(hex(v_addr + offset))
            break

print("base va is " + hex(v_addr))
v_addr += offset

for page in set(pages):
    print(page)

for va in v_addrs:
    print(va)

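On a related note, I also tried scanning the file manually with gdb. When I use its find command to scan for the string in the memory regions in question, it doesn't find nearly as many matches (the exact numbers differ widely). Why is that?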

Answer 1

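You can use python code to locate various things in a core file. The structer package contains an elf module whose Elf class provides the relevant methods. The following output from a gdb session contains examples of how to use that code.

The first excerpt from the session shows gdb opening a core file generated by gcore, and provides some data for the subsequent searches.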

18:33:00 $ gdb -q /home/efuller/gnu/bin/gdb core.17856 
Reading symbols from /home/efuller/gnu/bin/gdb...done.
[New LWP 17856]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/efuller/gnu/bin/gdb /home/efuller/gnu/bin/gdb'.
Program terminated with signal SIGINT, Interrupt.
#0  0x00007ffff62c5660 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
84  ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) backtrace
#0  0x00007ffff62c5660 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00005555557f7ea6 in gdb_wait_for_event (block=1) at event-loop.c:772
#2  0x00005555557f7185 in gdb_do_one_event () at event-loop.c:347
#3  0x00005555557f71bd in start_event_loop () at event-loop.c:371
#4  0x00005555557f003a in captured_command_loop (data=0x0) at main.c:324
#5  0x00005555557eb2e9 in catch_errors (func=0x5555557efff8 <captured_command_loop(void*)>, func_args=0x0, errstring=0x555555b4f733 "", mask=RETURN_MASK_ALL) at exceptions.c:236
#6  0x00005555557f16e2 in captured_main (data=0x7fffffffea10) at main.c:1149
#7  0x00005555557f170b in gdb_main (args=0x7fffffffea10) at main.c:1159
#8  0x00005555555f2daa in main (argc=2, argv=0x7fffffffeb18) at gdb.c:32
(gdb) frame 6
#6  0x00005555557f16e2 in captured_main (data=0x7fffffffea10) at main.c:1149
1149              catch_errors (captured_command_loop, 0, "", RETURN_MASK_ALL);
(gdb) info locals
context = 0x7fffffffea10
argc = 2
argv = 0x7fffffffeb18
quiet = 0
set_args = 0
inhibit_home_gdbinit = 0
symarg = 0x7fffffffed8e "/home/efuller/gnu/bin/gdb"
execarg = 0x7fffffffed8e "/home/efuller/gnu/bin/gdb"
pidarg = 0x0
corearg = 0x0
pid_or_core_arg = 0x0
cdarg = 0x0
ttyarg = 0x0
print_help = 0
print_version = 0
print_configuration = 0
cmdarg_vec = 0x0
cmdarg_p = 0x0
dirarg = 0x555555fdeb80
dirsize = 1
ndir = 0
system_gdbinit = 0x0
home_gdbinit = 0x555556174960 "/home/efuller/.gdbinit"
local_gdbinit = 0x0
i = 0
save_auto_load = 1
objfile = 0x0
pre_stat_chain = 0x555555b2c000 <sentinel_cleanup>
(gdb)

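The next excerpt shows gdb importing the python code and performing two searches based on the values of local variables. The first search shows multiple addresses where the value occurs (the values of symarg and execarg are among them). The findbytes method requires a bytes object, not a str object. The second search shows only one address that contains the address of the first match from the first search, and that address happens to have a name in the symbol table.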

(gdb) pi
>>> from structer import memmap, elf
>>> core = elf.Elf(memmap('core.17856'))
>>> from pprint import pprint
>>> 
(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"/home/efuller/gnu/bin/gdb")))
('0x555555fdef30',
 '0x55555606fce0',
 '0x55555614ff72',
 '0x5555562496a0',
 '0x55555624b915',
 '0x55555625f250',
 '0x5555562c6c4b',
 '0x55555689f2b5',
 '0x7ffff5f2d490',
 '0x7fffffffed74',
 '0x7fffffffed8e',
 '0x7fffffffedf0',
 '0x7fffffffefde')
(gdb) python pprint(tuple(hex(a) for a in core.findwords(0x555555fdef30)))
('0x555555faea38',)
(gdb) x/a 0x555555faea38
0x555555faea38 <_ZL16gdb_program_name>:     0x555555fdef30
(gdb)
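
For readers without structer at hand, the idea behind the second search can be approximated in plain Python: encode the address as a machine word and search for those bytes. This is only a sketch. It assumes a little-endian 64-bit target (as in the x86_64 session above), and find_word is a hypothetical helper, not part of structer. It returns offsets into the buffer it is given, not virtual addresses.

```python
import struct


def find_word(data, value, word_format="<Q"):
    """Return every offset in data where value is stored as a machine word.

    word_format is a struct format string; "<Q" means an unsigned
    little-endian 64-bit integer, which is an assumption about the target.
    """
    needle = struct.pack(word_format, value)
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits
```

Whether structer's findwords works this way internally is not stated in the session above; the sketch only shows what searching for a pointer value amounts to.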

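The next excerpt shows additional variations of searching. The search for the dirname of the first search pattern turns up more matches, which include all of the matches for the first search. The subsequent search filters out the common hits by requiring a null terminator, and the search after that filters out hits that don't begin with a leading null. The last two searches report the same results, albeit with addresses that differ by one, because the search that requires a leading null points at that leading null.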

(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"/home/efuller/gnu/bin")))
('0x555555b4f701',
 '0x555555bd33f0',
 '0x555555fdef30',
 '0x55555606fce0',
 '0x55555614ff72',
 '0x5555562496a0',
 '0x55555624b915',
 '0x55555625f250',
 '0x5555562c6c4b',
 '0x55555689f2b5',
 '0x7ffff5f2d490',
 '0x7fffffffed74',
 '0x7fffffffed8e',
 '0x7fffffffedf0',
 '0x7fffffffefde')
(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"/home/efuller/gnu/bin\x00")))
('0x555555b4f701', '0x555555bd33f0')
(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"\x00/home/efuller/gnu/bin\x00")))
('0x555555b4f700', '0x555555bd33ef')
(gdb)

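The final excerpt splits the hits of the first search into two cases, one with a leading null and the other without. The latter uses the most general type of search (the one that both findbytes and findwords rely on), so it can include non-null characters before the fixed part of the search pattern.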

(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"\x00/home/efuller/gnu/bin/gdb")))
('0x555555fdef2f',
 '0x55555606fcdf',
 '0x55555624969f',
 '0x55555625f24f',
 '0x7fffffffed73',
 '0x7fffffffed8d',
 '0x7fffffffefdd')
(gdb) python import re
(gdb) python pprint(tuple(hex(a) for a in core.find(re.compile(rb"\x00[^\x00]+/home/efuller/gnu/bin/gdb"))))
('0x55555614ff6f',
 '0x55555624b8ff',
 '0x5555562c6c37',
 '0x55555689f297',
 '0x7ffff5f2d487',
 '0x7fffffffeded')
(gdb) x/s 0x55555614ff6f + 1
0x55555614ff70:     "_=/home/efuller/gnu/bin/gdb"
(gdb)

The + 1 in the last command skips the leading null in that search hit, although it could also be incorporated into the search code, as follows.

(gdb) python pprint(tuple(hex(a+1) for a in core.find(re.compile(rb"\x00[^\x00]+/home/efuller/gnu/bin/gdb"))))
('0x55555614ff70',
 '0x55555624b900',
 '0x5555562c6c38',
 '0x55555689f298',
 '0x7ffff5f2d488',
 '0x7fffffffedee')
(gdb)
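
The same kind of filtering can be tried on raw bytes with Python's re module alone. The sketch below is illustrative: find_all and the sample buffer are made up for the example, and the numbers it returns are positions in the buffer, not virtual addresses.

```python
import re


def find_all(data, pattern):
    """Return the start offset of every match of a bytes regex in data."""
    return [m.start() for m in re.finditer(pattern, data)]


# Made-up buffer: one bare path plus two environment-style strings.
sample = b"\x00/opt/tool\x00PATH=/opt/tool\x00_=/opt/tool\x00"

# Literal search: escape the needle so regex metacharacters are not interpreted.
plain_hits = find_all(sample, re.escape(b"/opt/tool"))

# Only strings that begin exactly with the needle (leading null required).
bare_hits = find_all(sample, rb"\x00/opt/tool")

# Strings that have other non-null characters before the needle.
# Adding 1 skips the leading null so the offset points at the string itself.
prefixed_hits = [pos + 1 for pos in find_all(sample, rb"\x00[^\x00]+/opt/tool")]
```

Note the re.escape call for the literal case: passing an unescaped byte string straight to re.finditer treats characters such as . or ( as regex syntax.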

The structer code does not require gdb; it can run in a python interpreter outside of gdb. It is not compatible with python2, so using it within gdb requires a gdb binary linked against python3.5.

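Searching the raw core file for a pattern can report results that the search methods in the structer code do not report. There are two reasons. First, the structer code only looks in load segments, so it does not search the contents of note segments, which hold various things that do not correspond to virtual addresses in the dumped process. Second, the structer code will not find a result that spans multiple load segments if two adjacent segments have a gap between them (an unmapped region between the segments). The code does combine adjacent segments that are contiguous in the virtual address space, so search results need not be confined to a single segment.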
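
To make the load-segment point concrete: a core file written by gcore is an ELF file, and each PT_LOAD program header records where a region's bytes sit in the file (p_offset), the virtual address they were read from (p_vaddr), and how many bytes were actually written (p_filesz). The file starts with headers and note data, and a segment can occupy fewer bytes in the file than it did in memory, so summing pmap sizes generally does not reproduce file offsets. The following is a minimal sketch of the offset-to-address translation, assuming a little-endian ELF64 core; load_segments, offset_to_vaddr and find_vaddrs are hypothetical helper names, not structer's API.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header (64 bytes)
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header (56 bytes)


def load_segments(data):
    """Return (file_offset, vaddr, file_size) for each PT_LOAD segment."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    fields = EHDR.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    # Limitation: cores with 0xffff or more segments store the real count
    # elsewhere (PN_XNUM); this sketch does not handle that case.
    segments = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, p_filesz, _memsz, _align) = PHDR.unpack_from(
            data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_filesz > 0:
            segments.append((p_offset, p_vaddr, p_filesz))
    return segments


def offset_to_vaddr(segments, file_offset):
    """Translate a file offset to a virtual address, or None if unmapped."""
    for p_offset, p_vaddr, p_filesz in segments:
        if p_offset <= file_offset < p_offset + p_filesz:
            return p_vaddr + (file_offset - p_offset)
    return None  # the offset lies in the headers or in a note segment


def find_vaddrs(data, needle):
    """Return (file_offset, vaddr) for every occurrence of needle in data."""
    segments = load_segments(data)
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append((pos, offset_to_vaddr(segments, pos)))
        pos = data.find(needle, pos + 1)
    return hits
```

Reading the core with open("core.17856", "rb").read() and passing the bytes to find_vaddrs would list each hit with its address, with None marking hits outside any load segment. Unlike the structer behavior described above, this raw scan can also report a match that straddles two segments that are adjacent in the file but not in memory, so such hits deserve a second look.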

python

gdb

gcore