Wireshark Lua string:byte() error

Question

I'm running into a string problem while writing a Lua dissector. My packet looks like this:

0000   00 00 00 69 00 10 00 01 00 00 00 ed 00 00 00 0c
0010   bf a6 5f ...

When debugging, the tvb looks the same.

The byte at offset 0x10 is 0xbf, but in my dissector function I get a different result. Here is my code:

local str = buf(0x10):string()
local x = string.byte(str, 1)

The variable x should be 0xbf, but it is 0xef. Some other offsets also give 0xef:

local str = buf(0x11):string()
local x = string.byte(str, 1) -- also yields 0xef; should be 0xa6

local str = buf(11):string()
local x = string.byte(str, 1) -- also yields 0xef; should be 0xed

It seems that large values always come back as 0xef, for example 0xa6/0xbf/0xed...

Small values come back correctly, for example 0x69/0x5f/0x0c...

I'm using the latest Wireshark 2.0. Is this a bug?

Answer 1

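I don't know much about Wireshark, but I have a pretty good idea of what is going on.

You are using Wireshark's tvbrange:string([encoding]) function. The documentation I found on the Wireshark website says the default encoding is ENC_ASCII. Bytes in the range 0x80-0xFF (the ones you are having trouble with) are not valid ASCII.

What Wireshark is probably doing is converting them to U+FFFD, Unicode's "Replacement Character". This is the standard way to represent an unknown character in a Unicode string.

Then, Wireshark is probably encoding this string as UTF-8 when it returns to Lua. The first byte of the UTF-8 encoding of U+FFFD is 0xEF, so that is what you are seeing.

If you want the raw byte values from the TVB, you could try the tvbrange:bytes([encoding]) function instead. For example: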
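To make the byte arithmetic concrete, here is a small plain-Lua sketch that runs outside Wireshark. The helper names (utf8_3byte, ascii_to_utf8) are made up for illustration, and the substitution step only imitates what Wireshark is presumed to do; it is not Wireshark's actual code.

```lua
-- Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes.
-- Plain arithmetic is used so this runs on Lua 5.1 through 5.4.
local function utf8_3byte(cp)
  local b1 = 0xE0 + math.floor(cp / 0x1000)        -- top 4 bits
  local b2 = 0x80 + math.floor(cp / 0x40) % 0x40   -- middle 6 bits
  local b3 = 0x80 + cp % 0x40                      -- low 6 bits
  return string.char(b1, b2, b3)
end

-- UTF-8 form of U+FFFD, the Unicode replacement character.
local REPLACEMENT = utf8_3byte(0xFFFD)

-- Hypothetical stand-in for the presumed decoding step: every byte
-- outside the ASCII range is replaced by U+FFFD encoded as UTF-8.
local function ascii_to_utf8(raw)
  return (raw:gsub("[\128-\255]", REPLACEMENT))
end

-- The bytes 0xbf 0xa6 0x5f from offset 0x10 in the question's packet.
local decoded = ascii_to_utf8("\191\166\095")
print(string.format("0x%02x", string.byte(decoded, 1)))  -- 0xef
```

Every non-ASCII input byte turns into the same three bytes EF BF BD, which would explain why 0xbf, 0xa6 and 0xed all read back as 0xef while 0x5f is untouched.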

local bytes = buf(0x10):bytes()
local x = bytes:get_index(0) -- maybe 1, I'm not sure if it would be 0 or 1 indexed

There may also be some encoding you can pass to tvbrange:string that does what you want, but I couldn't find a good reference.

Answer 2

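Assuming buf refers to the parameter passed to your dissector routine, it is a Tvb. When you call it (as in buf(0x10)), you create an instance of type TvbRange. Both are documented here: https://www.wireshark.org/docs/wsdg_html_chunked/lua_module_Tvb.html

tehtmi has pointed out why you get the wrong result: tvbrange:string() returns a string decoded as ASCII (because the encoding argument was omitted).

The way to get the raw byte buffer (instead of having it converted to an ASCII or UTF-8 string) is: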

local x = buf:raw(0x10, 1)

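(Using offset 16 and length 1.)

If you were thinking of using buf(0x10):raw() directly, note that for some reason this returns the full data source backing this Tvb. It might be a bug or a feature... Workaround: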
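Since raw() hands back an ordinary Lua string of bytes, the string.byte call from the question should then work as expected. A minimal sketch in plain Lua, where the literal is only a stand-in (an assumption) for what buf:raw(0x10, 3) would return on the packet above:

```lua
-- Stand-in for buf:raw(0x10, 3): the three bytes at offset 0x10
-- in the question's packet (0xbf 0xa6 0x5f), written as decimal escapes.
local raw = "\191\166\095"

-- Lua strings are byte-clean, so string.byte returns each byte unchanged.
local b1, b2, b3 = string.byte(raw, 1, 3)
print(string.format("0x%02x 0x%02x 0x%02x", b1, b2, b3))  -- 0xbf 0xa6 0x5f
```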

local bytes = buf(0x10)
local x = bytes:raw(bytes:offset(), bytes:len())
