Find duplicate entries in a .txt file read as CSV

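I have tried the approach below, but it only finds duplicates when the entire row is identical. I want duplicates based on a specific column [1]. The values are imported from a .txt file and read as CSV. The "item entry" and "rating" headers are not in the .txt file; it contains only column 1 and column 2. The row numbers shown below are not in the file either; they only number the CSV rows.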

  1. 996,0

  2. 996,1.67

  3. 123,0

  4. 123,8.13

  5. 456,0

  6. 456,0.00001

    import csv

    seen_rows = []
    duplicate_rows = []
    with open("entry.txt", newline="") as in_file:
        for row in csv.reader(in_file):
            if row in seen_rows:  # compares the whole row, so 996,0 and 996,1.67 never match
                duplicate_rows.append(row)
                print("Duplicate entry found in entry.txt file. Please correct this issue, then run the program again. Duplicates are as follows:", duplicate_rows)
            else:
                seen_rows.append(row)
                print(seen_rows)

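If anyone could point me in the right direction, that would be great. Please feel free to explain how the solution works; I am still very new to Python and I am stuck on this problem. Thanks in advance.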

I found a way to do it, shown below, for anyone who is wondering.

    with open("filename.txt", "r") as in_file:       # open the file you want to check (.txt or .csv)
        seen_rows = []                                # column 1 values seen so far
        duplicate_rows = []                           # column 1 values that appear more than once
        for row in in_file:                           # for every line in the file
            columns = row.strip().split(",")          # strip surrounding whitespace, then split on commas: columns[0], columns[1], ...
            if columns[0] in seen_rows:               # column 1 value already seen: add it to the duplicate list
                duplicate_rows.append(columns[0])
            else:                                     # first time this value appears: add it to the seen list
                seen_rows.append(columns[0])
        if not duplicate_rows:
            # the list is empty, so "not duplicate_rows" is True: no duplicates, do whatever you want here
            pass
        else:
            # the list is not empty, so "not duplicate_rows" is False and the else branch runs
            print("Duplicate number found in column 1")
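
A variant of the same idea, as a minimal sketch: it lets `csv.reader` do the splitting and keeps the seen values in a `set`, since membership tests on a set stay fast as the file grows. The function name `find_duplicate_keys` and the `key_column` parameter are made up for illustration, and the sample data is just the six rows from the question standing in for the real file.

```python
import csv
import io


def find_duplicate_keys(lines, key_column=0):
    """Return the values in key_column that occur more than once,
    in the order in which their first repeat is encountered."""
    seen = set()        # values seen so far
    duplicates = []     # values seen at least twice (each listed once)
    for row in csv.reader(lines):
        if not row:     # skip blank lines
            continue
        key = row[key_column].strip()
        if key in seen:
            if key not in duplicates:
                duplicates.append(key)
        else:
            seen.add(key)
    return duplicates


# Hypothetical stand-in for the real file: the six rows from the question
sample = io.StringIO("996,0\n996,1.67\n123,0\n123,8.13\n456,0\n456,0.00001\n")
print(find_duplicate_keys(sample))  # ['996', '123', '456']
```

To run it against a real file, pass the open file object instead of `sample`, for example `with open("entry.txt", newline="") as in_file: find_duplicate_keys(in_file)`.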

If anyone has another approach, please feel free to show me a better way.

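P.S. This method works without pandas. If I have made any incorrect assumptions, please correct the code. Thanks; I hope this helps someone out there understand how this works.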
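
One other pandas-free option, sketched under the same assumptions: if the error message should show which rows clash and not only that a clash exists, the rows can be grouped by their column 1 value with `collections.defaultdict`. The name `rows_by_duplicate_key` is hypothetical, and the last sample row (`777,5`) is made up to show a value that is not duplicated.

```python
import csv
from collections import defaultdict


def rows_by_duplicate_key(lines, key_column=0):
    """Map each duplicated key_column value to the list of rows that share it."""
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if row:  # skip blank lines
            groups[row[key_column].strip()].append(row)
    # keep only the values that appear in more than one row
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# First three rows are from the question; "777,5" is an invented non-duplicate
sample = ["996,0", "996,1.67", "123,0", "777,5"]
for key, rows in rows_by_duplicate_key(sample).items():
    print(key, "appears", len(rows), "times:", rows)
```

An empty result means column 1 has no repeated values, so the same `if not ...:` check used above still works on the returned dictionary.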