Decode an ASCII string with Python 2.7.10

Question

I'm new to Python, so I may still be making a lot of beginner mistakes.

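I'm comparing two seemingly identical strings in Python, but the comparison always returns False. When I checked the representation of the objects, I found that one of the strings is encoded in ASCII.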

The repr of the first string returns:

'\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y\x00_\x00o\x00p\x00t\x00i\x00o\x00n\x00s\x00_\x00s\x00e\x00a\x00r\x00c\x00h\x00_\x00b\x00u\x00t\x00t\x00o\x00n\x00"\x00 \x00=\x00 \x00"\x00L\x00a\x00u\x00n\x00c\x00h\x00 \x00t\x00h\x00e\x00 \x00s\x00e\x00a\x00r\x00c\x00h\x00"\x00;\x00'

While the repr of the second string returns:

"itinerary_options_search_button" = "Launch the search";

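I'm trying to figure out how to decode the first string to get the second one, so that the comparison of the two matches. When I decode the first string with

string.decode('ascii')

I get a unicode object. I'm not sure how to get the decoded string.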

Answer 1

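Something seems to be wrong with your first string. I'm not entirely sure why it contains so many null characters (\x00), but either way, we can write a function to clean them out: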

s_1 = '\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y\x00_\x00o\x00p\x00t\x00i\x00o\x00n\x00s\x00_\x00s\x00e\x00a\x00r\x00c\x00h\x00_\x00b\x00u\x00t\x00t\x00o\x00n\x00"\x00 \x00=\x00 \x00"\x00L\x00a\x00u\x00n\x00c\x00h\x00 \x00t\x00h\x00e\x00 \x00s\x00e\x00a\x00r\x00c\x00h\x00"\x00;\x00'
s_2 = '"itinerary_options_search_button" = "Launch the search";'

def null_cleaner(string):
    # Rebuild the string, skipping every NUL (\x00) character.
    new_string = ""
    for char in string:
        if char != "\x00":
            new_string += char
    return new_string

print(null_cleaner(s_1) == null_cleaner(s_2))
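
As a side note, the same cleanup can probably be done with the built-in str.replace instead of a loop. This is only a sketch: the sample below is a shortened stand-in for the real strings, and the name cleaned is just illustrative.

```python
# Shortened sample (an assumption for illustration, not the full string).
s_1 = '\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y\x00"\x00'
s_2 = '"itinerary"'

# Remove every NUL character in one call.
cleaned = s_1.replace('\x00', '')
print(cleaned == s_2)
```

Stripping NULs like this is only safe when the underlying text is pure ASCII; for anything else the remaining bytes would no longer be meaningful.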

A less robust approach is to simply slice the string so that every other character (each of which happens to be \x00) is dropped:

s_1 = '\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y\x00_\x00o\x00p\x00t\x00i\x00o\x00n\x00s\x00_\x00s\x00e\x00a\x00r\x00c\x00h\x00_\x00b\x00u\x00t\x00t\x00o\x00n\x00"\x00 \x00=\x00 \x00"\x00L\x00a\x00u\x00n\x00c\x00h\x00 \x00t\x00h\x00e\x00 \x00s\x00e\x00a\x00r\x00c\x00h\x00"\x00;\x00'
s_2 = '"itinerary_options_search_button" = "Launch the search";'

print(s_1[1::2] == s_2)

Answer 2

... encoded in ASCII.
[lots of NULs]

Nope.

>>> '\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y'.decode('utf-16be')
u'"itinerary'

Of course, your data has one extra NUL that breaks the decoding. Once you clean that up, you should be able to decode it without any problem.
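
A minimal sketch of that cleanup, assuming the stray NUL is the trailing one (the data has an odd number of bytes) and that the rest really is UTF-16BE. The helper name decode_utf16be is hypothetical, not something from the question.

```python
# Sketch only: assumes UTF-16BE data followed by one stray trailing NUL byte.
raw = b'\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y\x00_\x00o\x00p\x00t\x00i\x00o\x00n\x00s\x00_\x00s\x00e\x00a\x00r\x00c\x00h\x00_\x00b\x00u\x00t\x00t\x00o\x00n\x00"\x00 \x00=\x00 \x00"\x00L\x00a\x00u\x00n\x00c\x00h\x00 \x00t\x00h\x00e\x00 \x00s\x00e\x00a\x00r\x00c\x00h\x00"\x00;\x00'
expected = u'"itinerary_options_search_button" = "Launch the search";'


def decode_utf16be(data):
    """Hypothetical helper: drop a dangling odd byte, then decode as UTF-16BE."""
    if len(data) % 2:  # UTF-16 needs an even number of bytes
        data = data[:-1]
    return data.decode('utf-16-be')


text = decode_utf16be(raw)
print(text == expected)
```

On Python 2 the result is a unicode object; if a plain byte string is needed for the comparison, text.encode('ascii') converts it back.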
