Find duplicate values in one column of a .txt file read as CSV
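I have tried the approach below, but it only finds duplicates when the entire row is identical. I want to find duplicates in one specific column (column 1). The values are imported from a .txt file as CSV. The "item entry" and "rating" headers are not in the .txt file, which only holds columns 1 and 2, and the row numbers are not in the file either; it is just CSV, like this: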
```text
996,0
996,1.67
123,0
123,8.13
456,0
456,0.00001
```
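```python
import csv

with open("entry.txt", newline="") as in_file:
    seen_rows = []
    duplicate_rows = []
    for row in csv.reader(in_file):
        # row is the whole line as a list, e.g. ['996', '1.67'],
        # so this only matches when every column is identical
        if row in seen_rows:
            duplicate_rows.append(row)
        else:
            seen_rows.append(row)

if duplicate_rows:
    print("Duplicate entry found in entry.txt file, please correct this issue "
          "then run the program again. Duplicates are as follows:", duplicate_rows)
print(seen_rows)
```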
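The first attempt only catches identical rows because `csv.reader` yields each row as a complete list, e.g. `['996', '1.67']`, so `row in seen_rows` compares whole rows. A minimal sketch of the fix, assuming the item entry is the first field (index 0), is to compare only `row[0]`. It also swaps the list for a `set`, which makes the membership check constant time on average. `find_duplicates_in_column` is a hypothetical helper name, and `io.StringIO` stands in for the opened file.

```python
import csv
import io


def find_duplicates_in_column(in_file, column=0):
    """Return every value in `column` that already appeared in an earlier row."""
    seen_values = set()     # values from `column` seen so far
    duplicate_values = []   # repeated values, in the order they are found
    for row in csv.reader(in_file):
        if not row:         # csv.reader yields [] for a blank line; skip it
            continue
        value = row[column]  # compare one field, not the whole row
        if value in seen_values:
            duplicate_values.append(value)
        else:
            seen_values.add(value)
    return duplicate_values


# sample data from the question, standing in for entry.txt
sample = io.StringIO("996,0\n996,1.67\n123,0\n123,8.13\n456,0\n456,0.00001\n")
duplicates = find_duplicates_in_column(sample)
if duplicates:
    print("Duplicate entry found in entry.txt file, please correct this issue "
          "then run the program again. Duplicates are as follows:", duplicates)
```

With a real file, pass the handle from `open("entry.txt", newline="")` instead of `sample`; the `newline=""` argument is what the `csv` module documentation recommends for files read with `csv.reader`.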
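If anyone could point me in the right direction, that would be great, and please feel free to explain how the solution works. I'm still very new to Python and I'm stuck on this. Thanks in advance.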
I found a way to do it, shown below, for anyone who is wondering:
```python
with open("filename.txt", "r") as in_file:  # open the file you want to check (.txt or .csv)
    seen_rows = []  # empty list to store the column 1 values already seen
    duplicate_rows = []  # empty list to store duplicate values
    for row in in_file:  # for every row in the file do the below
        # strip() removes the trailing newline and surrounding whitespace;
        # split(",") splits the row into columns 0, 1, 2, ...
        columns = row.strip().split(",")
        if columns[0] in seen_rows:  # column 1 value already seen: add it to the duplicate list
            duplicate_rows.append(columns[0])
        else:
            seen_rows.append(columns[0])  # value not seen yet: add it to the seen list

if not duplicate_rows:
    # the list is empty, so `not duplicate_rows` is True: there are no duplicates,
    # do whatever you want to do here
    pass
else:
    # the list is not empty, so `not duplicate_rows` is False and the else branch runs
    print("Duplicate number found in column 1")
```
If anyone has another approach, please feel free to tell me a better way.
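One alternative, offered as a sketch rather than a definitive answer, is to count every value in the first column with `collections.Counter` from the standard library and keep the ones seen more than once. Unlike a seen list, this reports each duplicated value once, together with how many times it occurs. `count_column_values` is a hypothetical name, and the `io.StringIO` sample stands in for the real file.

```python
import csv
import io
from collections import Counter


def count_column_values(in_file, column=0):
    """Count how often each value appears in `column`, skipping blank lines."""
    return Counter(row[column] for row in csv.reader(in_file) if row)


# sample data from the question, standing in for the real file
sample = io.StringIO("996,0\n996,1.67\n123,0\n123,8.13\n456,0\n456,0.00001\n")
counts = count_column_values(sample)
duplicates = {value: n for value, n in counts.items() if n > 1}
print(duplicates)  # {'996': 2, '123': 2, '456': 2}
```

`Counter` is a `dict` subclass, so on Python 3.7+ the values come out in the order they first appear in the file.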
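P.S. This method works without pandas. If I have made any incorrect assumptions, please correct the code. Thanks, and I hope this helps someone out there understand how this works.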