Disassemble strings properly in shellcode

Question

I'm learning about shellcode.

I found this shellcode in a tutorial:

python -c 'print "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80 "' > shellcode

What I'd like to do is disassemble this very basic shellcode to understand how it works.

Here is what I did:

$ objdump -D -b binary -m i8086 shellcode 

shellcode:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   90                      nop
   1:   90                      nop
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop
   5:   90                      nop
   6:   90                      nop
   7:   90                      nop
   8:   90                      nop
   9:   31 c0                   xor    %ax,%ax
   b:   50                      push   %ax
   c:   68 2f 2f                push   $0x2f2f
   f:   73 68                   jae    0x79
  11:   68 2f 62                push   $0x622f
  14:   69 6e 89 e3 50          imul   $0x50e3,-0x77(%bp),%bp
  19:   53                      push   %bx
  1a:   89 e1                   mov    %sp,%cx
  1c:   b0 0b                   mov    $0xb,%al
  1e:   cd 80                   int    $0x80

Or:

$ ndisasm shellcode 
00000000  90                nop
00000001  90                nop
00000002  90                nop
00000003  90                nop
00000004  90                nop
00000005  90                nop
00000006  90                nop
00000007  90                nop
00000008  90                nop
00000009  31C0              xor ax,ax
0000000B  50                push ax
0000000C  682F2F            push word 0x2f2f
0000000F  7368              jnc 0x79
00000011  682F62            push word 0x622f
00000014  696E89E350        imul bp,[bp-0x77],word 0x50e3
00000019  53                push bx
0000001A  89E1              mov cx,sp
0000001C  B00B              mov al,0xb
0000001E  CD80              int 0x80

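This shellcode contains strings that get decoded as x86 instructions. Is there a way to put proper labels on jumps?

Is there a way to display the strings as strings instead of decoding x86 instructions over them? I know it's not easy, since there is no ELF file and there are no headers...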

Answer 1

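This is a consequence of the von Neumann architecture. Code and data are both just numbers in the computer's memory. A disassembler therefore cannot know, without prior information about the byte sequence, what is code and what is data. In other words, you have to do it by hand.

Luckily, that's easy to do. Replace the string data with NOPs (\x90) and disassemble again. Then you can put the string data back into the source listing in place of the NOP regions.

Also make sure you disassemble for the right target CPU. I think it's unlikely that this shellcode was written for a 16-bit 8086 CPU.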
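The NOP-replacement trick can be sketched in a few lines of Python. This is only an illustration: the helper name nop_out and the sample buffer (a short jmp over six bytes of string data) are made up for the example and are not taken from the question's shellcode.

```python
NOP = 0x90  # single-byte x86 nop


def nop_out(code: bytes, start: int, length: int) -> bytes:
    """Return a copy of code with code[start:start+length] overwritten by NOPs."""
    if start < 0 or length < 0 or start + length > len(code):
        raise ValueError("range falls outside the buffer")
    return code[:start] + bytes([NOP]) * length + code[start + length:]


# Hypothetical buffer: jmp short +6, six bytes of string data, then xor eax,eax.
blob = b"\xeb\x06/bin/s\x31\xc0"
patched = nop_out(blob, 2, 6)
print(patched.hex())
```

Write patched to a file and disassemble that instead; the buffer length is unchanged, so every offset in the listing still matches the original.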

Answer 2

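If you have shellcode that uses call or jmp to skip over some data, and the disassembler gets out of sync while decoding that data, you would have to replace the string data with NOPs, as @DavidJ suggested.

In this case, though, you're just disassembling in the wrong mode. The jnc is obviously bogus (as I think you already realized).

The disassembler is treating the push opcode (the 0x68 byte) as the start of push imm16, because that's how 16-bit mode works. But in 32 and 64-bit modes, the same opcode is the start of a push imm32. So the push instruction is actually 5 bytes, not 3, and the next instruction is actually the next push.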
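To make the length difference concrete, here is a minimal sketch that decodes the 0x68 opcode by hand. The function name decode_push_imm and the IMM_WIDTH table are invented for illustration, and the sketch ignores operand-size prefixes.

```python
# Immediate width in bytes for opcode 0x68 (push imm) in each mode.
# Assumption: no operand-size prefix is present.
IMM_WIDTH = {16: 2, 32: 4, 64: 4}


def decode_push_imm(code: bytes, offset: int, mode_bits: int):
    """Decode the opcode-0x68 push at offset; return (length, immediate)."""
    if code[offset] != 0x68:
        raise ValueError("not a push imm opcode")
    width = IMM_WIDTH[mode_bits]
    imm = int.from_bytes(code[offset + 1:offset + 1 + width], "little")
    return 1 + width, imm


# The ten bytes starting at offset 0xc of the shellcode.
code = bytes.fromhex("682f2f7368682f62696e")
for mode in (16, 32):
    length, imm = decode_push_imm(code, 0, mode)
    print(f"{mode}-bit: length {length}, immediate {imm:#x}, "
          f"next opcode {code[length]:#04x}")
```

In 16-bit mode the next opcode byte is 0x73 (the short jae/jnc), while in 32-bit mode it is 0x68 again, i.e. the second push.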

The bogus short jnc is a big hint that this is not 16-bit code.

Use ndisasm -b32 or -b64. ndisasm can read input from stdin, so I used python2 -c 'print "... "' | ndisasm - -b32.

With objdump, use objdump -d -Mintel if you prefer Intel syntax. So you can use objdump -Mintel -bbinary -D -mi386 /tmp/shellcode for 32-bit (-mi386 selects x86 as the architecture, as opposed to ARM or MIPS or whatever, and also implies -Mi386, i.e. 32-bit mode).

Or for 64-bit, objdump -D -b binary -mi386 -Mx86-64 /tmp/shellcode works. (objdump doesn't read binary files from stdin :/) See the objdump man page for more about the -M options.
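On Python 3, print would encode "\x90" as UTF-8 instead of emitting the raw byte, so the python2 one-liner doesn't carry over directly. A rough sketch of the same workflow, assuming objdump is on your PATH: the names SHELLCODE, objdump_argv and disassemble are made up here, and the bytes omit the trailing space and newline that print adds in the question.

```python
import subprocess
import tempfile

# Nine NOPs followed by the 23 payload bytes from the question.
SHELLCODE = bytes.fromhex(
    "90" * 9 + "31c050682f2f7368682f62696e89e3505389e1b00bcd80"
)


def objdump_argv(path: str, bits: int = 32) -> list:
    """Build an objdump command line for a raw binary file."""
    m_opts = "intel,x86-64" if bits == 64 else "intel"
    return ["objdump", "-D", "-b", "binary", "-mi386", "-M" + m_opts, path]


def disassemble(code: bytes, bits: int = 32) -> str:
    """Write code to a temporary file and return objdump's listing."""
    with tempfile.NamedTemporaryFile(suffix=".bin") as f:
        f.write(code)   # raw bytes, no text encoding involved
        f.flush()
        result = subprocess.run(objdump_argv(f.name, bits),
                                capture_output=True, text=True, check=True)
    return result.stdout


print(" ".join(objdump_argv("shellcode.bin")))
```

Calling print(disassemble(SHELLCODE)) should then give the 32-bit listing. The temporary file is there because objdump won't read from stdin.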

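I use this alias in my ~/.bashrc: alias disas='objdump -drwC -Mintel', because I'm normally disassembling ELF executables/objects to see what a compiler did, not shellcode. You'd probably want -D in your alias.

I'm pretty sure this is 32-bit code, because in 64-bit mode the two pushes would leave a gap. There is no push imm64; in 64-bit mode, push imm32 is a 64-bit push with the immediate sign-extended to 64 bits. In 64-bit mode you could use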

push  'abcd'
mov   dword [rsp+4], 'efgh'

to end up with rsp pointing at "abcdefgh".
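The gap is easy to see in a toy model of the stack. This is just a sketch: push_imm32_64bit is a made-up helper, and index 0 of the bytearray stands in for the byte at rsp.

```python
import struct


def push_imm32_64bit(stack: bytearray, imm32: int) -> None:
    """Model `push imm32` in 64-bit mode: sign-extend, then push 8 bytes."""
    signed = struct.unpack("<i", struct.pack("<I", imm32))[0]
    stack[:0] = struct.pack("<q", signed)  # new bytes land at the lower address


stack = bytearray()                     # index 0 models the byte at rsp
push_imm32_64bit(stack, 0x68732f2f)     # bytes "//sh"
push_imm32_64bit(stack, 0x6e69622f)     # bytes "/bin"
print(bytes(stack))
# b'/bin\x00\x00\x00\x00//sh\x00\x00\x00\x00'
```

Each push occupies 8 bytes, so four zero bytes sit between "/bin" and "//sh", and a C string read from rsp ends after "/bin".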

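Also, using int 0x80 with a stack address is a big clue that this isn't 64-bit code. int 0x80 does work on Linux in 64-bit mode, but it truncates all inputs to 32 bits, so a 64-bit pointer to the stack would not survive.

The 32-bit disassembly from ndisasm is: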

00000000  90                nop
00000001  90                nop
00000002  90                nop
00000003  90                nop
00000004  90                nop
00000005  90                nop
00000006  90                nop
00000007  90                nop
00000008  90                nop
00000009  31C0              xor eax,eax
0000000B  50                push eax
0000000C  682F2F7368        push dword 0x68732f2f
00000011  682F62696E        push dword 0x6e69622f
00000016  89E3              mov ebx,esp
00000018  50                push eax
00000019  53                push ebx
0000001A  89E1              mov ecx,esp
0000001C  B00B              mov al,0xb
0000001E  CD80              int 0x80
00000020  200A              and [edx],cl
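To see the string the questioner was after, you can replay the three pushes from this listing in a small model. It's only a sketch: push32 is a made-up helper, and index 0 of the bytearray stands in for the byte at esp.

```python
def push32(stack: bytearray, value: int) -> None:
    """Model a 32-bit push: four little-endian bytes at the new top of stack."""
    stack[:0] = value.to_bytes(4, "little")


stack = bytearray()             # index 0 models the byte at esp
push32(stack, 0)                # push eax (zeroed by xor eax,eax): terminator
push32(stack, 0x68732f2f)       # push dword 0x68732f2f
push32(stack, 0x6e69622f)       # push dword 0x6e69622f

# After these pushes, mov ebx,esp makes ebx point at this C string.
path = bytes(stack).split(b"\x00", 1)[0]
print(path.decode("ascii"))     # /bin//sh
```

The immediates look reversed in the listing because x86 is little-endian: the low byte 0x2f ('/') is stored first.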

The 32-bit disassembly looks sane. It doesn't contain any branches, but to answer your other question:

Is there a way to put proper labels on jumps?

Yes, Agner Fog's objconv disassembler can put labels on branch targets to help you figure out which branch goes where. See How do I disassemble raw x86 code?
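For a rough idea of what a labelling disassembler has to compute for each short branch, here is a sketch of the rel8 target arithmetic. It is not how objconv is implemented; rel8_target and the label format are invented for the example.

```python
def rel8_target(offset: int, rel8: int, insn_length: int = 2) -> int:
    """Target of a short branch: end of the instruction plus signed rel8."""
    disp = rel8 - 0x100 if rel8 & 0x80 else rel8   # sign-extend the byte
    return offset + insn_length + disp


# The bogus `73 68` at offset 0xf from the 16-bit listing.
target = rel8_target(0x0F, 0x68)
labels = {target: f"L_{target:04x}"}
print(hex(target), labels[target])   # 0x79 L_0079
```

The shellcode file is only 0x22 bytes long, so a target of 0x79 lies outside it, which is another sign that the jnc isn't a real instruction.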

x86

disassembly

shellcode