How to read interpreted data strings in Python?

Question

I want to read all strings from a Python file. Example file (/tmp/s.py):

s = '{\x7f5  x'

Now I try to read the strings from my script:

import re
find_str = re.compile(r"'(.+?)'")

for line in open('/tmp/s.py', 'r'):
    all_strings = find_str.findall(line)
    print(all_strings)  # outputs ['{\\x7f5  x']

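But I want the strings (in this case, bytes in escaped hex representation) to be unescaped. I want to process the data as if it were in my /tmp/s.py file, and get a string with an interpreted \x7f byte rather than the literal text \x7f, which is currently represented as \\x7f.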

How can I do this?

Answer 1

You'd use the unicode_escape codec to decode the string the same way Python does when it reads a string literal:

print(*[s.encode('latin1').decode('unicode_escape') for s in all_strings])

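Note that unicode_escape decodes from bytes, not from text, which is why the string is first encoded with latin1. The codec also assumes Latin-1 source code rather than Python's default UTF-8.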

From the Text Encodings section of the Python codecs module documentation:

unicode_escape

Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped. Decodes from Latin-1 source code. Beware that Python source code actually uses UTF-8 by default.
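The Latin-1 limitation quoted above can be illustrated with a small sketch. The sample strings here are made up for illustration and are not from the question; the point is only to show what happens when the text contains characters outside Latin-1, or when UTF-8 is used for the intermediate encoding instead.

```python
# Text that only contains Latin-1 characters round-trips through the codec.
escaped = r'caf\xe9 \x7f'  # literal backslashes, as read from a file
decoded = escaped.encode('latin1').decode('unicode_escape')

# A character outside Latin-1 (here the euro sign) cannot be encoded first.
try:
    r'\x7f €'.encode('latin1')
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True

# Encoding as UTF-8 avoids the error, but the codec then reads each
# UTF-8 byte as a separate Latin-1 character, so non-ASCII text is mangled.
mangled = 'é'.encode('utf-8').decode('unicode_escape')

print(repr(decoded), latin1_failed, repr(mangled))
```

So the approach is safe as long as the extracted strings are ASCII or Latin-1; anything beyond that needs separate handling.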

Demo:

>>> s = r'{\x7f5  x'
>>> s
'{\\x7f5  x'
>>> s.encode('latin1').decode('unicode_escape')
'{\x7f5  x'
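Putting the question's loop and the decoding step together might look like the sketch below. The helper name read_interpreted_strings is hypothetical, and a temporary file stands in for /tmp/s.py so the example is self-contained; the regular expression is the one from the question and shares its limits (single-quoted strings on one line only).

```python
import os
import re
import tempfile

find_str = re.compile(r"'(.+?)'")  # same pattern as in the question

def read_interpreted_strings(path):
    """Hypothetical helper: return the single-quoted strings found in the
    file, with escape sequences such as \\x7f interpreted."""
    results = []
    with open(path, 'r') as source:
        for line in source:
            for raw in find_str.findall(line):
                # Encode to bytes first, since unicode_escape decodes from bytes.
                results.append(raw.encode('latin1').decode('unicode_escape'))
    return results

# Write a stand-in for /tmp/s.py containing a literal backslash sequence.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write("s = '{\\x7f5  x'\n")
    sample_path = tmp.name

strings = read_interpreted_strings(sample_path)
os.remove(sample_path)
print(strings)
```

The resulting string is six characters long, because the four characters \x7f in the file have been replaced by the single DEL character.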

python

hex

byte