Python unicode: when written to file, writes in different format

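I am using Python 3.4 to write a Unicode string to a file. After the file is written, when I open it and read it back, I get a completely different set of characters.

Code: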

# -*- coding: utf-8 -*-

with open('test.txt', 'w', encoding='utf-8') as f:
    name = 'أبيض'
    name.encode("utf-8")
    f.write(name)
    f.close()    

f = open('test.txt','r')
for line in f.readlines():
    print(line) 

Output:

Ø£Ø¨ÙŠØ¶

Thanks in advance.

You also need to specify the codec to use when reading:

f = open('test.txt','r', encoding='utf8')
for line in f.readlines():
    print(line) 

Otherwise your system default is used; see the open() function documentation:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used.
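To see which codec applies when encoding is omitted, you can ask the locale module directly. This is a minimal sketch; the helper name default_text_encoding is hypothetical, and the exact fallback rules differ between Python versions (for example when UTF-8 mode is enabled), so treat the printed value as a diagnostic rather than a guarantee.

```python
import locale


def default_text_encoding():
    """Return the locale's preferred encoding, the value the quoted
    documentation names as the default for text-mode open()."""
    # Passing False avoids calling setlocale() as a side effect.
    return locale.getpreferredencoding(False)


# On a Western European Windows install this typically prints cp1252;
# on most Linux and macOS systems it prints a UTF-8 variant.
print(default_text_encoding())
```

If this prints anything other than a UTF-8 name, a file written as UTF-8 and read back without encoding= will be decoded with the wrong codec.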

Judging by the output you got, your system defaults to Windows Codepage 1252:

>>> 'أبيض'.encode('utf8').decode('cp1252')
'Ø£Ø¨ÙŠØ¶'

Because the wrong codec was used when reading, you created what is called Mojibake.
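The effect can be reproduced, and in this case reversed, without touching a file at all. The sketch below assumes the wrong codec was cp1252; the function names make_mojibake and undo_mojibake are hypothetical helpers for illustration. Reversal only works when every byte maps to a character in the wrong codec, which happens to hold for this string.

```python
def make_mojibake(text, wrong_codec="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec)


def undo_mojibake(garbled, wrong_codec="cp1252"):
    """Reverse make_mojibake: recover the raw bytes, then decode as UTF-8."""
    return garbled.encode(wrong_codec).decode("utf-8")


original = "أبيض"
garbled = make_mojibake(original)

print(garbled)                             # Ø£Ø¨ÙŠØ¶
print(undo_mojibake(garbled) == original)  # True
```

Each Arabic letter occupies two bytes in UTF-8, and cp1252 turns every byte into its own character, which is why four letters become eight.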

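Note that the name.encode('utf8') line in your example is entirely redundant; the return value of that call is ignored, and the f.write(name) call takes care of the actual encoding. The f.close() call is also entirely redundant, because the with statement already takes care of closing the file. The following produces the correct output: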

with open('test.txt', 'w', encoding='utf-8') as f:
    name = 'أبيض'
    f.write(name)

with open('test.txt', 'r', encoding='utf-8') as f:
    for line in f.readlines():
        print(line)