Is this 8051 ASM function as unnecessarily convoluted as I think, or is there something I'm missing?
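I'm reverse engineering some firmware that I dumped from an embedded device that uses an 8051 microcontroller. I came across this function, which Ghidra disassembles as follows: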
undefined FUN_CODE_1cff()
undefined R7:1 <RETURN>
FUN_CODE_1cff
1cff XCH A,R5
1d00 MOV A,R7
1d01 XCH A,R5
1d02 MOV A,R5
1d03 MOV R2,A
1d04 MOV A,R6
1d05 MOV R7,A
1d06 MOV A,R2
1d07 MOV R6,A
1d08 RET
So what I think it's doing is:
+-------------+----------------+----+----+----+----+----+
| Instruction | Explanation | A | R2 | R5 | R6 | R7 |
+-------------+----------------+----+----+----+----+----+
| | | 10 | 2 | 5 | 6 | 7 |
| XCH A,R5 | Swap A with R5 | 5 | 2 | 10 | 6 | 7 |
| MOV A,R7 | Copy R7 into A | 7 | 2 | 10 | 6 | 7 |
| XCH A,R5 | Swap A with R5 | 10 | 2 | 7 | 6 | 7 |
| MOV A,R5 | Copy R5 into A | 7 | 2 | 7 | 6 | 7 |
| MOV R2,A | Copy A into R2 | 7 | 7 | 7 | 6 | 7 |
| MOV A,R6 | Copy R6 into A | 6 | 7 | 7 | 6 | 7 |
| MOV R7,A | Copy A into R7 | 6 | 7 | 7 | 6 | 6 |
| MOV A,R2 | Copy R2 into A | 7 | 7 | 7 | 6 | 6 |
| MOV R6,A | Copy A into R6 | 7 | 7 | 7 | 7 | 6 |
| RET | Return | 7 | 7 | 7 | 7 | 6 |
+-------------+----------------+----+----+----+----+----+
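The trace above can be reproduced mechanically. The following is a minimal sketch, not part of the firmware: the `Regs` struct and the `xch` and `run_fun_1cff` names are hypothetical, and only the five registers shown in the table are modelled (flags and register banks are ignored).

```c
#include <stdint.h>

/* Hypothetical model of the five registers that appear in the table. */
typedef struct {
    uint8_t a, r2, r5, r6, r7;
} Regs;

/* XCH A,Rn: exchange the accumulator with a register. */
void xch(uint8_t *a, uint8_t *rn) {
    uint8_t tmp = *a;
    *a = *rn;
    *rn = tmp;
}

/* Replays FUN_CODE_1cff one instruction at a time. */
void run_fun_1cff(Regs *r) {
    xch(&r->a, &r->r5);  /* 1cff  XCH A,R5 */
    r->a = r->r7;        /* 1d00  MOV A,R7 */
    xch(&r->a, &r->r5);  /* 1d01  XCH A,R5 */
    r->a = r->r5;        /* 1d02  MOV A,R5 */
    r->r2 = r->a;        /* 1d03  MOV R2,A */
    r->a = r->r6;        /* 1d04  MOV A,R6 */
    r->r7 = r->a;        /* 1d05  MOV R7,A */
    r->a = r->r2;        /* 1d06  MOV A,R2 */
    r->r6 = r->a;        /* 1d07  MOV R6,A */
}
```

Starting from the table's initial values (`A=10, R2=2, R5=5, R6=6, R7=7`), calling `run_fun_1cff` leaves `A`, `R2`, `R5` and `R6` at 7 and `R7` at 6, matching the last row.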
But this seems to have a whole bunch of unnecessary steps. Wouldn't this be more straightforward?
+-------------+----------------+----+----+----+----+----+
| Instruction | Explanation | A | R2 | R5 | R6 | R7 |
+-------------+----------------+----+----+----+----+----+
| | | 10 | 2 | 5 | 6 | 7 |
| XCH A,R6 | Swap A with R6 | 6 | 2 | 5 | 10 | 7 |
| XCH A,R7 | Swap A with R7 | 7 | 2 | 5 | 10 | 6 |
| MOV R2,A | Copy A into R2 | 7 | 7 | 5 | 10 | 6 |
| MOV R5,A | Copy A into R5 | 7 | 7 | 7 | 10 | 6 |
| MOV R6,A | Copy A into R6 | 7 | 7 | 7 | 7 | 6 |
| RET | Return | 7 | 7 | 7 | 7 | 6 |
+-------------+----------------+----+----+----+----+----+
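To check that the shorter sequence really leaves the same register state as the original for every input, not just for the sample values in the table, both can be run side by side. This is a sketch under the same simplified register model; `RegFile`, `original_seq`, `proposed_seq` and `count_mismatches` are made-up names for illustration.

```c
#include <stdint.h>

/* Hypothetical register model: only the registers used in the tables. */
typedef struct {
    uint8_t a, r2, r5, r6, r7;
} RegFile;

/* The nine-instruction body of FUN_CODE_1cff as disassembled. */
void original_seq(RegFile *r) {
    uint8_t t;
    t = r->a; r->a = r->r5; r->r5 = t;  /* XCH A,R5 */
    r->a = r->r7;                       /* MOV A,R7 */
    t = r->a; r->a = r->r5; r->r5 = t;  /* XCH A,R5 */
    r->a = r->r5;                       /* MOV A,R5 */
    r->r2 = r->a;                       /* MOV R2,A */
    r->a = r->r6;                       /* MOV A,R6 */
    r->r7 = r->a;                       /* MOV R7,A */
    r->a = r->r2;                       /* MOV A,R2 */
    r->r6 = r->a;                       /* MOV R6,A */
}

/* The five-instruction alternative from the second table. */
void proposed_seq(RegFile *r) {
    uint8_t t;
    t = r->a; r->a = r->r6; r->r6 = t;  /* XCH A,R6 */
    t = r->a; r->a = r->r7; r->r7 = t;  /* XCH A,R7 */
    r->r2 = r->a;                       /* MOV R2,A */
    r->r5 = r->a;                       /* MOV R5,A */
    r->r6 = r->a;                       /* MOV R6,A */
}

/* Counts the (R6, R7) inputs for which the two final states differ.
   A, R2 and R5 start at the table's sample values. */
unsigned count_mismatches(void) {
    unsigned mismatches = 0;
    for (unsigned r6 = 0; r6 < 256; r6++) {
        for (unsigned r7 = 0; r7 < 256; r7++) {
            RegFile x = {10, 2, 5, (uint8_t)r6, (uint8_t)r7};
            RegFile y = x;
            original_seq(&x);
            proposed_seq(&y);
            if (x.a != y.a || x.r2 != y.r2 || x.r5 != y.r5 ||
                x.r6 != y.r6 || x.r7 != y.r7) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

`count_mismatches()` returns 0, so under this model the two sequences are interchangeable as far as these five registers are concerned.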
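I'm assuming you can't MOV or XCH directly between two R# registers. Being able to MOV wouldn't make a difference here, I don't think, but if you could XCH, then you could cut out another line like so: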
+-------------+-----------------+----+----+----+----+----+
| Instruction | Explanation | A | R2 | R5 | R6 | R7 |
+-------------+-----------------+----+----+----+----+----+
| | | 10 | 2 | 5 | 6 | 7 |
| XCH R6,R7 | Swap R6 with R7 | 10 | 2 | 5 | 7 | 6 |
| MOV A,R6 | Copy R6 into A | 7 | 2 | 5 | 7 | 6 |
| MOV R2,A | Copy A into R2 | 7 | 7 | 5 | 7 | 6 |
| MOV R5,A | Copy A into R5 | 7 | 7 | 7 | 7 | 6 |
| RET | Return | 7 | 7 | 7 | 7 | 6 |
+-------------+-----------------+----+----+----+----+----+
That being said, does anyone know why it might have been implemented the way it is? I don't think it's meant to obfuscate the code: the chip I pulled this from (P87C51RB2BA) has a "read-protect" bit that can be set when it's programmed, as well as the option to have it read out encrypted code. I figure if they had any reason to want to obfuscate the code, they would have set one of these, but (thankfully) it looks like they didn't, as I was able to dump the chip in cleartext just fine. (Unless my chip is merely "functional by mistake".) It would be weird to obfuscate only this one part, anyway.
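EDIT: I forgot to mention that the function in question is called from a lot of places in the code; it's actually one of the most frequently called functions I've seen in this firmware.
Here's some of the surrounding code, starting with another function that looks similarly convoluted (although it isn't called as many times and I haven't analyzed it in detail):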
**************************************************************
* FUNCTION *
**************************************************************
undefined FUN_CODE_1cd3()
undefined R7:1
FUN_CODE_1cd3 XREF[9]: FUN_CODE_0311:0462(c),
FUN_CODE_0311:04d8(c),
FUN_CODE_0311:0604(c),
FUN_CODE_0dd4:0dd6(c),
FUN_CODE_0f9b:0fa4(c),
FUN_CODE_0f9b:0fbd(c),
ibus_2002_handler:1009(c),
FUN_CODE_1134:1136(c),
ibus_3002_handler:116c(c)
CODE:1cd3 c8 XCH A,R0
CODE:1cd4 ef MOV A,R7
CODE:1cd5 c8 XCH A,R0
CODE:1cd6 e6 MOV A,@R0
CODE:1cd7 fe MOV R6,A
CODE:1cd8 08 INC R0
CODE:1cd9 e6 MOV A,@R0
CODE:1cda ff MOV R7,A
CODE:1cdb 12 1c ff LCALL FUN_CODE_1cff undefined FUN_CODE_1cff()
CODE:1cde 22 RET
**************************************************************
* FUNCTION *
**************************************************************
undefined FUN_CODE_1cdf()
undefined R7:1
FUN_CODE_1cdf XREF[2]: FUN_CODE_0cd4:0da7(c),
FUN_CODE_1569:158a(c)
CODE:1cdf 12 18 4d LCALL FUN_CODE_184d undefined FUN_CODE_184d()
CODE:1ce2 78 6c MOV R0,#0x6c
CODE:1ce4 e6 MOV A,@R0=>DAT_INTMEM_6c = ??
CODE:1ce5 24 05 ADD A,#0x5
CODE:1ce7 f5 2f MOV DAT_INTMEM_2f,A = ??
CODE:1ce9 22 RET
**************************************************************
* FUNCTION *
**************************************************************
undefined FUN_CODE_1cea()
undefined R7:1
FUN_CODE_1cea XREF[1]: FUN_CODE_165c:16b1(c)
CODE:1cea e5 1d MOV A,BANK3_R5 = ??
CODE:1cec 60 06 JZ LAB_CODE_1cf4
CODE:1cee 12 1c 70 LCALL FUN_CODE_1c70 undefined FUN_CODE_1c70()
CODE:1cf1 12 1b d5 LCALL FUN_CODE_1bd5 undefined FUN_CODE_1bd5()
LAB_CODE_1cf4 XREF[1]: CODE:1cec(j)
CODE:1cf4 22 RET
**************************************************************
* FUNCTION *
**************************************************************
undefined FUN_CODE_1cf5()
undefined R7:1
FUN_CODE_1cf5 XREF[3]: FUN_CODE_0311:0520(c),
FUN_CODE_0311:0611(c),
FUN_CODE_1569:15b2(c)
CODE:1cf5 e5 2d MOV A,DAT_INTMEM_2d = ??
CODE:1cf7 04 INC A
CODE:1cf8 ff MOV R7,A
CODE:1cf9 30 e7 02 JNB ACC.7,LAB_CODE_1cfe = ??
CODE:1cfc 7f 01 MOV R7,#0x1
LAB_CODE_1cfe XREF[1]: CODE:1cf9(j)
CODE:1cfe 22 RET
**************************************************************
* FUNCTION *
**************************************************************
undefined FUN_CODE_1cff()
undefined R7:1
FUN_CODE_1cff XREF[23]: FUN_CODE_0cd4:0cf2(c),
FUN_CODE_0cd4:0d17(c),
FUN_CODE_0cd4:0d23(c),
FUN_CODE_0cd4:0d2f(c),
FUN_CODE_0cd4:0d3b(c),
FUN_CODE_0cd4:0d61(c),
FUN_CODE_0dd4:0e46(c),
FUN_CODE_1068:10d9(c),
FUN_CODE_1068:10f6(c),
ibus_3004_handler:1190(c),
ibus_3004_handler:11b5(c),
ibus_3004_handler:11c1(c),
ibus_3004_handler:11cd(c),
ibus_3004_handler:11d9(c),
FUN_CODE_184d:1861(c),
FUN_CODE_184d:1886(c),
FUN_CODE_1988:1996(c),
FUN_CODE_1988:19b4(c),
FUN_CODE_1a5c:1a63(c),
FUN_CODE_1a5c:1a76(c), [more]
CODE:1cff cd XCH A,R5
CODE:1d00 ef MOV A,R7
CODE:1d01 cd XCH A,R5
CODE:1d02 ed MOV A,R5
CODE:1d03 fa MOV R2,A
CODE:1d04 ee MOV A,R6
CODE:1d05 ff MOV R7,A
CODE:1d06 ea MOV A,R2
CODE:1d07 fe MOV R6,A
CODE:1d08 22 RET
LAB_CODE_1d09 XREF[1]: CODE:1cc2(j)
CODE:1d09 7b 01 MOV R3,#0x1
CODE:1d0b 7a 00 MOV R2,#0x0
CODE:1d0d 02 1b 12 LJMP LAB_CODE_1b12
DAT_CODE_1d10 XREF[1]: start:152a(R)
CODE:1d10 01 undefined1 01h
DAT_CODE_1d11 XREF[1]: start:1538(R)
CODE:1d11 1b undefined1 1Bh
DAT_CODE_1d12 XREF[3]: start:14ed(R), start:14ff(R),
start:1547(R)
CODE:1d12 00 undefined1 00h
DAT_CODE_1d13 XREF[4]: start:14f1(R), start:14ff(R),
start:152a(R), start:154b(R)
CODE:1d13 01 undefined1 01h
DAT_CODE_1d14 XREF[3]: start:14f1(R), start:152a(R),
start:154f(R)
CODE:1d14 1f undefined1 1Fh
DAT_CODE_1d15 XREF[2]: start:152a(R), start:154f(R)
CODE:1d15 08 undefined1 08h
CODE:1d16 00 ?? 00h
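For what it's worth, FUN_CODE_1cd3 in the listing above can be read as "load two bytes from internal RAM at the address in R7, then call FUN_CODE_1cff". Below is a rough sketch of that reading. `Pair`, `model_fun_1cd3` and `pair_as_u16` are hypothetical names, internal RAM is modelled as a plain 256-byte array, and treating R6 as the high byte of a 16-bit value is an assumption that the listing itself does not confirm.

```c
#include <stdint.h>

/* Hypothetical holder for the R6/R7 pair left behind by the function. */
typedef struct {
    uint8_t r6, r7;
} Pair;

/* Models FUN_CODE_1cd3. iram stands in for internal RAM,
   ptr is the value of R7 on entry. */
Pair model_fun_1cd3(const uint8_t iram[256], uint8_t ptr) {
    Pair p;
    uint8_t r0 = ptr;          /* XCH A,R0 / MOV A,R7 / XCH A,R0: R0 = R7 */
    p.r6 = iram[r0];           /* MOV A,@R0 / MOV R6,A */
    r0 = (uint8_t)(r0 + 1);    /* INC R0 */
    p.r7 = iram[r0];           /* MOV A,@R0 / MOV R7,A */

    /* LCALL FUN_CODE_1cff: R6 and R7 trade places. */
    uint8_t t = p.r6;
    p.r6 = p.r7;
    p.r7 = t;
    return p;
}

/* ASSUMPTION: R6 is the high byte and R7 the low byte of a 16-bit value. */
uint16_t pair_as_u16(Pair p) {
    return (uint16_t)((p.r6 << 8) | p.r7);
}
```

Under that assumption, the byte at the lower address ends up as the low byte of the result, so the function would behave like a 16-bit load of a value stored low byte first.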
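EDIT 2: The device in question is a handheld programming device for Brinks alarm panels, the same one seen in my video here.
After some guessing (see the comments), we came to the conclusion that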
FUN_CODE_1cff
1cff XCH A,R5
1d00 MOV A,R7
1d01 XCH A,R5
1d02 MOV A,R5
1d03 MOV R2,A
1d04 MOV A,R6
1d05 MOV R7,A
1d06 MOV A,R2
1d07 MOV R6,A
1d08 RET
is most likely a little-endian to big-endian conversion function generated by some non-optimising compiler. In C it would look like:
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
uint16_t Convert16_BE2LE(uint16_t var)
{
    uint16_t retVal;
    retVal = ((var & 0x00FF) << 8) | ((var & 0xFF00) >> 8);
    return retVal;
}
R5 and R2 are used by the compiler as temporary registers.
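To make the link between the register shuffle and the C reconstruction concrete, here is a small sketch that applies the disassembled moves to a 16-bit value held in R6:R7. It assumes R6 holds the high byte and R7 the low byte, which is a common 8051 compiler convention but is not confirmed for this firmware; `swap16` and `fun_1cff_on_pair` are illustrative names only.

```c
#include <stdint.h>

/* The byte swap from the C reconstruction. */
uint16_t swap16(uint16_t v) {
    return (uint16_t)(((v & 0x00FFu) << 8) | ((v & 0xFF00u) >> 8));
}

/* Effect of FUN_CODE_1cff on a 16-bit value held in R6:R7.
   ASSUMPTION: R6 = high byte, R7 = low byte. */
uint16_t fun_1cff_on_pair(uint16_t v) {
    uint8_t r6 = (uint8_t)(v >> 8);
    uint8_t r7 = (uint8_t)(v & 0xFFu);

    uint8_t r5 = r7;  /* XCH A,R5 / MOV A,R7 / XCH A,R5: R5 = R7 */
    uint8_t r2 = r5;  /* MOV A,R5 / MOV R2,A: R2 = R5 */
    r7 = r6;          /* MOV A,R6 / MOV R7,A: R7 = R6 */
    r6 = r2;          /* MOV A,R2 / MOV R6,A: R6 = R2 */

    return (uint16_t)((r6 << 8) | r7);
}
```

For example, both `swap16(0x1234)` and `fun_1cff_on_pair(0x1234)` give `0x3412`. The two temporaries fall out naturally if the compiler first copies the argument byte into R5 and then builds the result through R2 without ever merging the copies.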
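As I mentioned in one of the comments, back in 2000-2005 I worked with the ccz80 compiler from Wind River, which produced similar results in terms of optimisation. It pretty much just translated particular C constructs into fixed opcode sequences. As a result, the code was correct but looked very sub-optimal to anyone used to writing assembly by hand. On the other hand, this made the compiler cheap, small (~150-200K), and very fast.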