为什么我的 Python 代码在从文本文件中读取时打印出额外的字符“ï»¿”？

Question

try:
    data=open('info.txt')
    for each_line in data:
        try:
            (role,line_spoken)=each_line.split(':',1)
            print(role,end='')
            print(' said: ',end='')
            print(line_spoken,end='')
        except ValueError:
            print(each_line)
    data.close()
except IOError:
     print("File is missing")

逐行打印文件时，代码往往会在前面添加三个不必要的字符，即“ï»¿”。

实际输出：

ï»¿Man said:  Is this the right room for an argument?
Other Man said:  I've told you once.
Man said:  No you haven't!
Other Man said:  Yes I have.

预期输出：

Man said:  Is this the right room for an argument?
Other Man said:  I've told you once.
Man said:  No you haven't!
Other Man said:  Yes I have.

Answer 1

我找不到 Python 3 的副本，它处理编码的方式与 Python 2 不同。所以这是答案：而不是使用默认编码打开文件（是 'utf-8')，使用 'utf-8-sig'，它期望并去除 UTF-8 Byte Order Mark，这就是显示为 ï»¿.

也就是说，而不是

data = open('info.txt')

做

data = open('info.txt', encoding='utf-8-sig')

请注意，如果您使用的是 Python 2，您应该会看到例如Python, Encoding output to UTF-8 and Convert UTF-8 with BOM to UTF-8 with no BOM in Python。你需要用 codecs 或 str.decode 做一些恶作剧才能在 Python 2 中正常工作。但是在 Python 3 中，你需要做的就是设置打开文件时的 encoding= 参数。

Answer 2

我在处理 excel csv 文件时遇到了非常相似的问题。最初我从下拉选项中将我的文件保存为 .csv utf-8（逗号分隔）文件。然后我将它保存为一个 .csv（逗号分隔）文件，一切都很好。 .txt 文件可能存在类似问题

Answer 3

当我发生这种情况时，它只发生在我的 CSV 的第一行，包括读取和写入。对于我正在做的事情，我只是在第一个位置创建了一个 "sacrificial" 条目，这样这些字符就会被添加到我的牺牲条目中，而不是我关心的任何字符。 Definitley 不是一个可靠的解决方案，但速度很快，而且对我的目的有效。

为什么我的 Python 代码在从文本文件中读取时打印出额外的字符“ï»¿”？

Why does my Python code print the extra characters "ï»¿" when reading from a text file?

python

file-handling