UnicodeEncodeError when concatenating text files in Python
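I am a Python beginner.
I am trying to concatenate the text from all 8 text files into a single text file to build a corpus.
However, I get this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7311: character maps to <undefined>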
filenames = glob2.glob('Final_Corpus_SOAs/*.txt') # list of all .txt files in the directory
print(filenames)
Output:
['Final_Corpus_SOAs\1.txt', 'Final_Corpus_SOAs\2.txt', 'Final_Corpus_SOAs\2018 SOA Muir.txt', 'Final_Corpus_SOAs\3.txt', 'Final_Corpus_SOAs\4.txt', 'Final_Corpus_SOAs\5.txt', 'Final_Corpus_SOAs\6.txt', 'Final_Corpus_SOAs\7.txt', 'Final_Corpus_SOAs\8.txt']
with open('output.txt', 'w', encoding="utf-8") as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
Output:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7311: character maps to <undefined>
Thanks for your help.
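You should specify the encoding when opening the file. See this link for more information, since this has already been answered here.
Add encoding="utf8" to your code, as shown below: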
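with open('output.txt', 'w', encoding="utf8") as outfile:
    for fname in filenames:
        # The decode error is raised while reading, so the input file needs the encoding too
        with open(fname, encoding="utf8") as infile:
            for line in infile:
                outfile.write(line)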
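To see why the read side is the one that fails: when open() is called without an encoding argument, Python uses the locale's preferred encoding, which on many Windows setups is cp1252, and byte 0x9d has no mapping there. Here is a minimal sketch of that failure. It assumes the input files are really UTF-8 and contain a curly closing quote; the question does not confirm either, so treat both as guesses.

```python
# Assumption: the file holds a right double quotation mark (U+201D) stored as UTF-8.
data = "\u201d".encode("utf-8")  # three bytes: 0xe2 0x80 0x9d

# Decoding the bytes as UTF-8 recovers the original character.
decoded = data.decode("utf-8")

# Decoding the same bytes as cp1252 fails on 0x9d, which that code page leaves undefined.
try:
    data.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print(exc)
```

If the printed message matches the one in the question, a mismatch between the files' real encoding and the default encoding is the likely cause.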
If you are sure of the encoding, you should declare it when opening the files, for both reading and writing:
encoding = 'utf8'  # or 'latin1' or 'cp1252' or...
with open('output.txt', 'w', encoding=encoding) as outfile:
    for fname in filenames:
        with open(fname, encoding=encoding) as infile:
            for line in infile:
                outfile.write(line)
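If the files might not all share one encoding, one option is to try a short list of candidates per file and keep the first that decodes without error. This is only a sketch: the helper name read_text_with_fallback and the candidate list are hypothetical, not something from the question. A clean decode also does not prove the guess is right, and 'latin1' accepts any byte sequence, so it would have to come last if used at all.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate encoding that decodes cleanly.

    Hypothetical helper; the candidate list is an assumption and should be
    adjusted to the encodings the files could plausibly be in.
    """
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes, try the next one
    raise ValueError(f"none of {encodings} could decode {path}")
```

Each decoded text can then be written to an output file opened with a single explicit encoding such as "utf-8", so the corpus ends up consistently encoded.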
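If you are unsure of the encoding, or do not want to deal with it at all, you can copy the files at the byte level by reading and writing them in binary mode: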
with open('output.txt', 'wb') as outfile:
    for fname in filenames:
        with open(fname, 'rb') as infile:
            for line in infile:
                outfile.write(line)
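The same byte-level copy can also be written with shutil.copyfileobj from the standard library, which streams each file in chunks instead of line by line. A minimal sketch follows; the function name concat_binary is hypothetical.

```python
import shutil


def concat_binary(filenames, out_path):
    """Append the raw bytes of each input file to out_path, in the given order."""
    with open(out_path, "wb") as outfile:
        for fname in filenames:
            with open(fname, "rb") as infile:
                # Copies the bytes unchanged; nothing is decoded or re-encoded.
                shutil.copyfileobj(infile, outfile)
```

Because nothing is decoded, the output keeps whatever encoding each input had. If the inputs differ in encoding, the result will be mixed. No newline is inserted between files either, so a file that does not end in a newline runs straight into the next one.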