How to convert a string containing unicode escapes (\u####) to a UTF-8 string
I have been trying to get this working since this morning.
My sample.txt:
choice = \u9078\u629e
Code:
with open('sample.txt', encoding='utf-8') as f:
    for line in f:
        print(line)
        print("選択" in line)
        print(line.encode('utf-8').decode('utf-8'))
        print(line.encode().decode('utf-8'))
        print(line.encode('utf-8').decode())
        print(line.encode().decode('unicode-escape').encode("latin-1").decode('utf-8'))  # as suggested.
Output:
choice = \u9078\u629e
False
choice = \u9078\u629e
choice = \u9078\u629e
choice = \u9078\u629e
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 9-10: ordinal not in range(256)
When I do this in the ipython qtconsole:
In [29]: "choice = \u9078\u629e"
Out[29]: 'choice = 選択'
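So the question is: how do I read a text file that contains unicode escape sequences such as \u9078\u629e (I don't know exactly what they are called) and convert them to the actual characters, such as 選択?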
If you are reading from a file, you can pass the 'unicode-escape' encoding when opening it:
with open('test.txt', encoding='unicode-escape') as f:
    a = f.read()
print(a)
# choice = 選択
where test.txt contains:
choice = \u9078\u629e
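For anyone who wants to try this without creating the file by hand, here is a minimal self-contained sketch. The temporary directory and the `path` variable are assumptions made for the demo, not part of the original answer. One caveat worth knowing: the unicode-escape codec treats any byte that is not part of an escape as Latin-1, so this approach is only safe when the file is otherwise pure ASCII.

```python
import os
import tempfile

# Hypothetical stand-in for test.txt: a file holding literal backslash-u
# escapes (plain ASCII on disk), created in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")

    # Write the escapes as literal characters: backslash, 'u', four hex digits.
    with open(path, "w", encoding="ascii") as f:
        f.write("choice = \\u9078\\u629e\n")

    # Reading with the unicode_escape codec turns each escape into its character.
    with open(path, encoding="unicode_escape") as f:
        text = f.read()

# Compare instead of printing the characters, so the demo does not depend
# on the console encoding.
print(text.strip() == "choice = 選択")
```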
If your text is already in a string, you can convert it like this:
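# Raw string, so the backslashes stay literal, as they would in text read from a file.
# (A normal "..." literal would already be interpreted as 選択 by Python itself.)
a = r"choice = \u9078\u629e"
a.encode().decode('unicode-escape')
# 'choice = 選択'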
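Note that the encode/decode round trip above assumes the string is pure ASCII apart from the escapes; real non-ASCII characters already present in the string would come out garbled, because unicode-escape reads the UTF-8 bytes as Latin-1. If the text may mix both, one alternative is to replace only the \uXXXX sequences with a regular expression. This is a sketch, not the method from the answer above: `decode_u_escapes` is a hypothetical helper name, and it does not combine surrogate pairs or handle other escapes such as \n or \UXXXXXXXX.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text: str) -> str:
    """Replace each literal \\uXXXX sequence with the character it names.

    Hypothetical helper for illustration; everything else in the string,
    including existing non-ASCII characters, is left untouched.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# A line that mixes a real character with an escaped one.
line = "選 = \\u629e"
print(decode_u_escapes(line) == "選 = 択")
```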