How do I copy all the duplicate lines of a file to a new file in Python?
I am trying to write code that copies all the duplicate lines of one file to a new file. The program I wrote checks the first 3 characters of each line and compares them with the next line.
f=open(r'C:\Users\xamer\Desktop\file.txt','r')
data=f.readlines()
f.close()
lines=data.copy()
dup=open(r'C:\Users\xamer\Desktop\duplicate.txt','a')
for x in data:
    for y in data:
        if (y[0]==x[0]) and (y[1]==x[1]) and (y[2]==x[2]):
            lines.append(y)
        else:
            lines.remove(y)
dup.write(lines)
dup.close()
I get the following error:
Traceback (most recent call last):
  File "C:\Users\xamer\Desktop\file.py", line 80, in <module>
    lines.remove(y)
ValueError: list.remove(x): x not in list
Any suggestions?
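These snippets should do the job you asked for. At first I wanted to build a duplicated_lines list and write it out at the end. Then I realized I could improve performance by writing each repeated item as soon as it is found, which avoids an extra final loop.
As another user pointed out, it is not very clear whether you want to check only for adjacent duplicate entries or for duplicates regardless of their position.
First case (the duplicate immediately follows the original line), here is the code: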
# opening the source file
with open('hello.txt','r') as f:
    # returns a list containing the original lines
    data=f.readlines()
# creating the file to host the repeated lines
with open('duplicated.txt','a') as f:
    for i in range(0, len(data)-1):
        # stripping to avoid a bug if the last line is a repeated item
        if(data[i].strip('\n') == data[i+1].strip('\n')):
            print("Lines {}: {}".format(i, data[i]))
            print("Lines {}: {}".format(i+1, data[i+1]))
            #duplicated_lines.append(data[i])
            print("Line repeated: " + data[i])
            # strip the original newline so only one is written per line
            f.write("%s\n" % data[i].strip('\n'))
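The adjacent-line comparison can also be wrapped in a small function, which makes it easy to try out without touching the file system. This is only a minimal sketch and not part of the original answer; the name `adjacent_duplicates` is an assumption introduced here for illustration.

```python
def adjacent_duplicates(lines):
    """Return each line that is identical to the line directly before it.

    Hypothetical helper (not from the original answer). Trailing newline
    characters are ignored, so a final line without one still compares equal.
    """
    stripped = [line.rstrip('\n') for line in lines]
    # Pair every line with its successor and keep the ones that match.
    return [cur for prev, cur in zip(stripped, stripped[1:]) if prev == cur]


sample = ["a\n", "a\n", "b\n", "c\n", "c"]
print(adjacent_duplicates(sample))  # prints ['a', 'c']
```

The returned lines carry no trailing newline, so one has to be added back when writing them to a file such as duplicated.txt.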
If you want to check for repeated lines anywhere in the file, the code is:
# opening the source file
with open('hello.txt','r') as f:
    # returns a list containing the original lines
    data=f.readlines()
# creating the file to host the repeated lines
with open('duplicated.txt','a') as f:
    for i in range(0, len(data)-1):
        for j in range(i+1, len(data)):
            # stripping to avoid a bug if the last line is a repeated item
            if(data[i].strip('\n') == data[j].strip('\n')):
                print("Lines {}: {}".format(i, data[i]))
                print("Lines {}: {}".format(j, data[j]))
                #duplicated_lines.append(data[i])
                print("Line repeated: " + data[i])
                # strip the original newline so only one is written per line
                f.write("%s\n" % data[i].strip('\n'))
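The nested loops compare every pair of lines, so the work grows quadratically with the file size, and a line that occurs three times is written once per matching pair. As a possible alternative, here is a sketch that counts occurrences with `collections.Counter` from the standard library and then keeps every line whose count is above one. The function name `find_duplicates` and its `key` parameter are assumptions made for this example; `key` lets you compare only the first 3 characters, as in the question, instead of the whole line.

```python
from collections import Counter


def find_duplicates(lines, key=lambda line: line):
    """Return every line whose key occurs more than once, in file order.

    Hypothetical helper (not from the original answer). `key` decides what
    counts as a duplicate; by default the whole line is compared.
    """
    stripped = [line.rstrip('\n') for line in lines]
    # Count how often each key appears, then keep lines seen more than once.
    counts = Counter(key(line) for line in stripped)
    return [line for line in stripped if counts[key(line)] > 1]


sample = ["abc1\n", "xyz\n", "abc2\n", "xyz\n", "q"]
print(find_duplicates(sample))                       # whole-line comparison
print(find_duplicates(sample, key=lambda s: s[:3]))  # first 3 characters only
```

With the sample above, the first call returns both copies of "xyz", and the second also returns "abc1" and "abc2" because they share the prefix "abc". To store the result, pass `data` from `readlines()` and write each returned line followed by a newline to duplicated.txt.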