Trouble scanning list for duplicates

Question

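Hey, so I want to scan this text file of emails. If the same email shows up twice, I want it printed; if an email appears only once in the list, I don't want it printed.

It worked on a different text file, but now it gives a traceback error?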

#note make sure found.txt and list.txt are in the 'include' for pycharm
from collections import Counter

print("Welcome DADDY")

with open('myheritage-1-million.txt') as f:
    c=Counter(c.strip().lower() for c in f if c.strip()) #for case-insensitive search
    for line in c:
        if c[line] > 1:
            print(line)

Error:

rs/dcaputo/PycharmProjects/searchtoolforrhys/venv/include/search.py
Welcome DADDY
Traceback (most recent call last):
  File "/Users/dcaputo/PycharmProjects/searchtoolforrhys/venv/include/search.py", line 5, in <module>
    c = Counter(c.strip().lower() for c in f if c.strip()) #for case-insensitive search
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/collections/__init__.py", line 566, in __init__
    self.update(*args, **kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/collections/__init__.py", line 653, in update
    _count_elements(self, iterable)
  File "/Users/dcaputo/PycharmProjects/searchtoolforrhys/venv/include/search.py", line 5, in <genexpr>
    c = Counter(c.strip().lower() for c in f if c.strip()) #for case-insensitive search
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 2668: invalid continuation byte

Process finished with exit code 1

What I expect: a list of all emails that appear twice in the whole text file.

Answer 1

The key is the final error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 2668: invalid continuation byte

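This error can occur when a non-text file is read as text. Your file may be corrupted in some way and contain data (at position 2668) that cannot be decoded as text. Another common cause is that the file is valid text saved in an encoding other than UTF-8, such as Latin-1 or Windows-1252, where 0xc5 is an ordinary accented character.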
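One possible workaround, as a minimal sketch: pass an explicit `encoding` and `errors` argument to `open()` so that a stray byte does not abort the whole scan. The helper name `find_duplicate_lines` is hypothetical, and the default `errors="replace"` is an assumption about what you want; it swaps each undecodable byte for the U+FFFD replacement character, so the affected lines are altered slightly rather than skipped.

```python
from collections import Counter


def find_duplicate_lines(path, encoding="utf-8", errors="replace"):
    """Return the lines that occur more than once, compared case-insensitively.

    Hypothetical helper for illustration. With errors="replace", bytes that
    cannot be decoded become U+FFFD instead of raising UnicodeDecodeError.
    """
    with open(path, encoding=encoding, errors=errors) as f:
        # Skip blank lines; normalise whitespace and case before counting.
        counts = Counter(line.strip().lower() for line in f if line.strip())
    # Keep only entries seen at least twice, in first-seen order.
    return [line for line, n in counts.items() if n > 1]
```

Calling `find_duplicate_lines('myheritage-1-million.txt')` and printing each returned item would mirror the loop in the question. If the file turns out to be in a known encoding, passing that instead (for example `encoding='latin-1'`, which is only a guess here) keeps the original characters intact.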


python

traceback