Change file encoding scheme in Python

I'm trying to open a file encoded in latin-1 so that I can produce a copy of it in a different encoding. I get a NameError saying that unicode is not defined. Here is the code I'm using:

sourceEncoding = "latin-1"
targetEncoding = "utf-8"
source = open(r'C:\Users\chsafouane\Desktop\saf.txt')
target = open(r'C:\Users\chsafouane\Desktop\saf2.txt', "w")

target.write(unicode(source.read(), sourceEncoding).encode(targetEncoding))

I'm not used to working with files at all, so I don't know whether there is a module I should import in order to use "unicode".

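The fact that you see "unicode is not defined" suggests that you are on Python 3, where the unicode built-in no longer exists because str is already Unicode text. Here is a snippet that first generates a latin1-encoded file and then does what you want: it reads the latin1-encoded file and writes out a UTF-8-encoded file: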

# Generate a latin1-encoded file

txt = u'U+00AxNBSP¡¢£¤¥¦§¨©ª«¬SHY­®¯U+00Bx°±²³´µ¶·¸¹º»¼½¾¿U+00CxÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏU+00DxÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßU+00ExàáâãäåæçèéêëìíîïU+00Fxðñòóôõö÷øùúûüýþÿ'

latin1 = txt.encode('latin1')

with open('example-latin1.txt', 'wb') as fid:
    fid.write(latin1)

# Read in the latin1 file

with open('example-latin1.txt', 'r', encoding='latin1') as fid:
    contents = fid.read()
assert contents == latin1.decode('latin1') # sanity check

# Spit out a UTF8-encoded file

with open('converted-utf8.txt', 'w', encoding='utf-8') as fid:
    fid.write(contents)

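If you want the output in an encoding other than UTF-8, pass the encoding argument to open. It is safest to always pass it, because without it open falls back to a platform-dependent default that is not necessarily UTF-8. For example: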

with open('converted-utf_32.txt', 'w', encoding='utf_32') as fid:
    fid.write(contents)

The Python documentation has a list of all supported codecs (see "Standard Encodings" in the codecs module documentation).
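
For the original question's use case, the read and write steps above could be wrapped in a small helper. This is only a sketch: the function name convert_encoding, its parameters, and the chunk size are made up for illustration and are not part of any library. It reads in chunks so that large files need not fit in memory, and passes newline="" so that line endings are copied unchanged.

```python
import os
import tempfile


def convert_encoding(src_path, dst_path, source_encoding="latin-1",
                     target_encoding="utf-8", chunk_size=64 * 1024):
    """Re-encode a text file (hypothetical helper, not a library function)."""
    # newline="" turns off newline translation on both sides, so "\r\n"
    # and "\n" are written out exactly as they were read.
    with open(src_path, "r", encoding=source_encoding, newline="") as source, \
         open(dst_path, "w", encoding=target_encoding, newline="") as target:
        while True:
            chunk = source.read(chunk_size)  # up to chunk_size characters
            if not chunk:
                break
            target.write(chunk)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # File names are placeholders modeled on the question.
        src = os.path.join(tmp, "saf.txt")
        dst = os.path.join(tmp, "saf2.txt")
        with open(src, "wb") as fid:
            fid.write("café £5\r\n".encode("latin-1"))
        convert_encoding(src, dst)
        with open(dst, "rb") as fid:
            assert fid.read() == "café £5\r\n".encode("utf-8")
```

If the data is already in memory as bytes, the Python 3 counterpart of unicode(data, sourceEncoding).encode(targetEncoding) is data.decode(sourceEncoding).encode(targetEncoding).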