Not all duplicates are deleted from a text file in Python

Question

I am new to Python. I am trying to remove duplicates from my text file by doing the following:

line_seen = set()

f = open('a.txt', 'r')
w = open('out.txt', 'w')

for i in f:
    if i not in line_seen:
        w.write(i)
        line_seen.add(i)

f.close()
w.close()

My input file contains:

hello
world
python
world
hello

In the output file I get:

hello
world
python
hello

So it did not remove the last duplicate. Can anyone help me understand why this happens and how I can fix it?

Answer 1

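The main problem is that a newline character ("\n") appears at the end of every line except the last one, so the final 'hello' does not compare equal to the earlier 'hello\n'. You can combine set, map and join, as follows: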

with open('a.txt', 'r') as f, open('out.txt', 'w') as w:
    # strip the newline from every line, de-duplicate with a set, then re-join
    w.write("\n".join(set(map(str.strip, f))))

out.txt (a set is unordered, so the line order may differ):

python
world
hello
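
The set-based one-liner above does not keep the original line order. If order matters, one possible variation (a sketch, not part of the original answer) is to de-duplicate with dict.fromkeys, since dict keys keep insertion order from Python 3.7 onward. Here io.StringIO stands in for the opened a.txt, and the names unique_lines and text are only illustrative:

```python
import io

# Stand-in for open('a.txt'): the last line has no trailing newline.
f = io.StringIO("hello\nworld\npython\nworld\nhello")

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so the first occurrence of each stripped line is kept, in file order.
unique_lines = list(dict.fromkeys(line.strip() for line in f))

# Re-join with newlines; this is the text that would be written to out.txt.
text = "\n".join(unique_lines) + "\n"
print(text)
```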

If you want to stick with your original approach, you can use:

line_seen = set()

f = open('a.txt', 'r')
w = open('out.txt', 'w')

for i in f:
    i = i.strip()              # drop the trailing newline, if any
    if i not in line_seen:
        w.write(i + "\n")      # add the newline back so each line stays separate
        line_seen.add(i)

f.close()
w.close()

Answer 2

The first line probably contains 'hello\n', while the last line contains only 'hello'. They are not the same string.
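
A quick way to see this mismatch is to print each line with repr(), which makes the newline visible. This is only a small demonstration: io.StringIO simulates a file whose last line has no trailing newline, and raw_lines is an illustrative name.

```python
import io

# Simulate a.txt whose last line is not terminated by a newline.
sample = io.StringIO("hello\nworld\npython\nworld\nhello")
raw_lines = list(sample)  # iterating a file-like object keeps the "\n"

# repr() shows the newline characters explicitly.
for line in raw_lines:
    print(repr(line))

# The first and last lines differ only by the trailing "\n".
print(raw_lines[0] == raw_lines[-1])                  # False
print(raw_lines[0].strip() == raw_lines[-1].strip())  # True
```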

Use:

line_seen = set()

with open('a.txt', 'r') as f, open('out.txt', 'w') as w:

    for i in f:
        i = i.strip()            # remove the \n from line
        if i not in line_seen:
            w.write(i + "\n")
            line_seen.add(i)

Answer 3

# Since we check if the line exists in lines, we can use a list instead of
# a set to preserve order
lines = []

infile = open('a.txt', 'r')
outfile = open('out.txt', 'w')

# Use the readlines method
for line in infile.readlines():
    # Strip whitespace first, so 'hello\n' and 'hello' compare equal
    line = line.strip()
    if line not in lines:
        lines.append(line)

for line in lines:
    # Add the newline back
    outfile.write("{}\n".format(line))

infile.close()
outfile.close()
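
A membership test on a list scans the whole list, which can get slow for large files. One possible compromise, sketched here with a hypothetical helper named dedupe, is to keep a list for the order and a set for the lookups:

```python
def dedupe(lines):
    """Return the stripped lines with duplicates removed, keeping first-seen order."""
    seen = set()     # fast membership checks
    ordered = []     # remembers the order in which lines first appeared
    for line in lines:
        line = line.strip()
        if line not in seen:
            seen.add(line)
            ordered.append(line)
    return ordered


# Same content as a.txt, with the last line missing its newline.
result = dedupe(["hello\n", "world\n", "python\n", "world\n", "hello"])
print(result)
```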

Answer 4

Most likely your last line does not end with a newline character. The line already seen is 'hello\n'; the last one is just 'hello'.

Either fix the input file, or call strip() on each line i as you read it.
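
Note that strip() also removes leading and trailing spaces, so "  hello" and "hello" would be treated as duplicates. If that is not wanted, a possible variation is to remove only the trailing newline with rstrip("\n"). The sketch below assumes a hypothetical helper dedupe_file and uses a temporary directory so it runs without touching real files:

```python
import os
import tempfile


def dedupe_file(src_path, dst_path):
    """Copy src to dst, skipping lines already seen (ignoring only the newline)."""
    seen = set()
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            key = line.rstrip("\n")   # remove only the newline, keep other whitespace
            if key not in seen:
                seen.add(key)
                dst.write(key + "\n")


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "a.txt")
    dst_path = os.path.join(tmp, "out.txt")
    # The last line has no trailing newline, as in the question.
    with open(src_path, "w") as f:
        f.write("hello\n  hello\nworld\nhello")
    dedupe_file(src_path, dst_path)
    with open(dst_path) as f:
        output = f.read()

print(repr(output))
```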
