Decoding specific escaped characters in a Python string

Question

I have a Python variable (named var) containing a string with the following literal data:

day\r\n\\night

In hex:

64  61  79  5C  72  5C  6E  5C  5C  6E  69  67  68  74  07
d   a   y   \   r   \   n   \   \   n   i   g   h   t   BEL

I need to decode only \\, \r and \n.

Required output (hex):

64  61  79  0D  0A  5C  6E  69  67  68  74  07
d   a   y   CR  LF  \   n   i   g   h   t   BEL

Using decode does not work:

>>> print(var.decode('ascii'))
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?

Using regular expressions to find \\, \r and \n and replace them with their escaped values was unsuccessful, because the \n in \\night gets treated as 0x0A.
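The failure described above can be reproduced with a minimal sketch. The question does not show the actual regex code, so this assumes the escapes are replaced one after another (here with str.replace rather than re, purely for illustration):

```python
# The literal text from the question: day\r\n\\night followed by BEL (0x07).
var = 'day\\r\\n\\\\night\x07'

# Naive approach: handle each escape in a separate pass.
step1 = var.replace('\\\\', '\\')   # \\ -> \   (the text now contains \night)
step2 = step1.replace('\\r', '\r')  # \r -> CR
naive = step2.replace('\\n', '\n')  # \n -> LF  (also hits the \n in \night)

print(naive.encode('ascii').hex(' '))
# 64 61 79 0d 0a 0a 69 67 68 74 07
```

The backslash produced by the first pass is picked up again by a later pass, which is why a single scan over the string seems to be needed.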

Is it possible to specify which characters I want decode to handle, or is there a more suitable module? I am using Python 3.10.2.

Answer 1

Look for similar questions. Based on this one, you can do the following:

var = r"day\r\n\\night"

# This is what you got previously
var.encode('ascii').hex(' ')
# '64 61 79 5c 72 5c 6e 5c 5c 6e 69 67 68 74'

# To get the required output, do this
bytes(var, encoding='ascii').decode('unicode-escape').encode('ascii').hex(' ')
# '64 61 79 0d 0a 5c 6e 69 67 68 74'
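One caveat worth checking before relying on this: as far as I know, unicode-escape decodes every escape sequence Python recognises, not only \\, \r and \n. The sketch below uses a made-up input containing \t to show the difference; the sample string is an assumption for illustration only.

```python
# Hypothetical input: the text day\r\n\\night\tend (all backslashes are literal).
sample = r"day\r\n\\night\tend"

decoded = bytes(sample, encoding='ascii').decode('unicode-escape')

# \t has been turned into a real TAB (0x09) as well,
# which is more than the question asks for.
print(decoded.encode('ascii').hex(' '))
# 64 61 79 0d 0a 5c 6e 69 67 68 74 09 65 6e 64
```

If the input can contain other escapes that must stay untouched, a more selective approach is probably needed.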

Answer 2

Assuming var is a string like this:

6461795C725C6E5C5C6E6967687407 (without spaces)

you should try:

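i = 0
out = ''                                 # strings are immutable, so build a new one
escaped = {'72': '0D', '6E': '0A', '5C': '5C'}
while i < len(var):
   code = var[i:i+2]                     # hex code of the current character
   nxt = var[i+2:i+4]                    # hex code of the next character
   if code == '5C' and nxt in escaped:   # a '\' followed by 'r', 'n' or '\'
      out += escaped[nxt]                # replaces the '5Cxx' by its escaped value
      i += 2                             # skips the escaped character as well
   else:
      out += code                        # anything else is copied unchanged
   i += 2
var = out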

It replaces \r, \n and \\ with the corresponding characters (CR, LF, \).

I will add a converter between day\r\n\\night and 6461795C725C6E5C5C6E6967687407 later.

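Edit: here are the converters! After each conversion, the resulting string is in r.
They work on the result of input(), but for a hard-coded string you have to write:
var = 'day\\r\\n\\\\night'
so that the code reads it as 'day', then '\', then 'r', then '\', then 'n', then '\', then '\', then 'night', rather than 'day', then CR, then LF, then '\', then 'night'. That way
print(var)
will print
day\r\n\\night
rather than

day
\night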
# convert string to hex
# (two uppercase digits per character, matching the '5C' keys used above;
#  assumes every code point is below 0x100)
r = ''
for c in var:
   r += format(ord(c), '02X')

# convert hex to string
r = ''
c = 0
while c < len(var):
   # parses each two-digit hex code as an integer
   # and adds the corresponding character to `r`.
   r += chr(int(var[c:c+2], 16)) # e.g. '5C' becomes '\'
   c += 2
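For comparison, both converters could probably be replaced by standard-library calls. This is only a sketch and not part of the answer above: it assumes the text is encoded as latin-1 so that every character below 0x100 maps to exactly one byte, and the helper names to_hex and from_hex are made up.

```python
def to_hex(text):
    # One byte per character, two uppercase hex digits per byte.
    return text.encode('latin-1').hex().upper()

def from_hex(hex_string):
    # bytes.fromhex accepts both upper- and lowercase digits.
    return bytes.fromhex(hex_string).decode('latin-1')

var = 'day\\r\\n\\\\night\x07'
print(to_hex(var))
# 6461795C725C6E5C5C6E6967687407
```

Characters outside latin-1 would raise UnicodeEncodeError here, so this only fits the ASCII data shown in the question.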

Answer 3

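Many thanks to everyone who provided an answer, but none of them seemed to fully solve my problem. After a lot of research, I found this solution from sahil Kothiya (mirror), which I modified to solve my specific problem: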

import re, codecs

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\[\\nr]  # Single-character escapes: \\, \n and \r only
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        # Decode just the matched escape; the rest of the string is untouched
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)
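A quick way to check the function against the data from the question. The block repeats the definition (with an equivalent non-verbose pattern) so that it runs on its own; the café sample is made up to exercise a non-ASCII character and an escape that should be left alone.

```python
import codecs
import re

# Matches a backslash followed by a backslash, n or r.
ESCAPE_SEQUENCE_RE = re.compile(r'\\[\\nr]')

def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

# The literal text day\r\n\\night followed by BEL (0x07).
var = 'day\\r\\n\\\\night\x07'
print(decode_escapes(var).encode('ascii').hex(' '))
# 64 61 79 0d 0a 5c 6e 69 67 68 74 07

# Other escapes (\t here) and non-ASCII text are left untouched.
print(decode_escapes('caf\u00e9\\t\\n') == 'caf\u00e9\\t\n')
# True
```

Because the pattern consumes the string in a single left-to-right scan, the backslash produced from \\ is never re-examined, so \\night keeps its n.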

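[Screenshots: IDLE demo; special characters as shown in Notepad++; hex dump of the output string]

It even handles Unicode characters (an important part of my script).

[Screenshots: IDLE demo; special characters as shown in Notepad++; hex dump of the output string]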
