UnicodeEncodeError when concatenating text files in Python

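I am a Python beginner. I am trying to add (concatenate) the text from all 8 text files into a single text file to build a corpus. However, I get the error UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7311: character maps to <undefined>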

import glob2

filenames = glob2.glob('Final_Corpus_SOAs/*.txt')  # list of all .txt files in the directory
print(filenames)

Output: ['Final_Corpus_SOAs\1.txt', 'Final_Corpus_SOAs\2.txt', 'Final_Corpus_SOAs\2018 SOA Muir.txt', 'Final_Corpus_SOAs\3.txt', 'Final_Corpus_SOAs\4.txt', 'Final_Corpus_SOAs\5.txt', 'Final_Corpus_SOAs\6.txt', 'Final_Corpus_SOAs\7.txt', 'Final_Corpus_SOAs\8.txt']

with open('output.txt', 'w', encoding="utf-8") as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

Output: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7311: character maps to <undefined>

Thanks for your help.

You should specify the encoding when opening a file. See this link for more information, since this has already been answered here.

Add encoding="utf8" to your code, as shown below:

with open('output.txt', 'w', encoding="utf8") as outfile:
    for fname in filenames:
        # The error is raised while reading, so the input file needs the encoding too.
        with open(fname, encoding="utf8") as infile:
            for line in infile:
                outfile.write(line)
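
For context, here is a small sketch of why the error probably appears. It assumes the input files are really UTF-8 and that Python fell back to cp1252 as the default encoding on Windows; neither is confirmed in the question. Under those assumptions, byte 0x9d is the last byte of the UTF-8 encoding of a right curly quote, and cp1252 has no character assigned to that byte:

```python
# Assumption: the files are UTF-8 and contain a right curly quote (U+201D).
data = "\u201d".encode("utf-8")  # the three bytes e2 80 9d

# Decoding with the matching codec works.
decoded = data.decode("utf-8")

# Decoding with cp1252 (reported as 'charmap' in the traceback) fails on 0x9d.
try:
    data.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    failing_byte = data[exc.start]  # the byte the codec could not map
```

If your files were written in a different encoding, the same failure can show up for other reasons, so treat this as an illustration rather than a diagnosis.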

If you are sure of the encoding, you should declare it whenever you open a file, whether for reading or writing:

encoding = 'utf8'    # or 'latin1' or 'cp1252' or...

with open('output.txt', 'w', encoding=encoding) as outfile:
    for fname in filenames:
        with open(fname, encoding=encoding) as infile:
            for line in infile:
                outfile.write(line)
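
If you are not sure which encoding a given file uses, one rough approach is to try a few candidates in order and keep the first one that decodes the whole file. The helper name `first_working_encoding` and the candidate list below are my own assumptions, not something from the question. Note that a successful decode does not prove the guess is right, and 'latin-1' accepts any byte sequence, so it only makes sense as a last resort:

```python
def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the entire file.

    Hypothetical helper: it only checks that decoding does not fail,
    not that the decoded text is what the author intended.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this codec cannot decode the file, try the next one
        return enc
    raise ValueError(f"none of {candidates} can decode {path!r}")
```

You could then call `open(fname, encoding=first_working_encoding(fname))` inside the loop, so that each input file is read with its own encoding while the output stays UTF-8.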

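If you are unsure of the encoding, or do not want to deal with encodings at all, you can copy the files at the byte level by reading and writing them in binary mode: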

with open('output.txt', 'wb') as outfile:
    for fname in filenames:
        with open(fname, 'rb') as infile:
            for line in infile:
                outfile.write(line)
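
The same byte-level copy can also be sketched with `shutil.copyfileobj` from the standard library, which copies in chunks instead of line by line. The function name `concat_binary` is just an illustrative choice. Keep in mind that a binary copy keeps whatever encodings the inputs had, so if the files differ in encoding the result will be mixed:

```python
import shutil

def concat_binary(filenames, out_path):
    """Append the raw bytes of each input file to out_path, in order."""
    with open(out_path, "wb") as outfile:
        for fname in filenames:
            with open(fname, "rb") as infile:
                # Chunked copy; no decoding happens, so no UnicodeDecodeError.
                shutil.copyfileobj(infile, outfile)
```

For example, `concat_binary(filenames, 'output.txt')` would stand in for the loop above.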