Efficiently removing duplicates from a list

Question

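Good evening. I have an Excel file containing zip codes and associated information. The zip codes include many duplicates. I want to find out which zip codes I have by putting them all in a list with no repeats. This code works, but it runs very slowly (over 100 seconds), and I would like to know what I can do to make it more efficient.

I know that having to check the entire list for a duplicate on every iteration is the main cause of the inefficiency, but I'm not sure how to fix that. I also know that looping over every row is probably not the best approach, but I'm still new to this and now I'm stuck.

Thanks in advance.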

import sys
import xlrd

loc = ("locationOfFile")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)

def findUniqueZips():
    zipsInSheet = []
    for i in range(sheet.nrows):
        if str(sheet.cell(i,0).value) in zipsInSheet:
            pass
        else:
            zipsInSheet.append(str(sheet.cell(i,0).value))
    print(zipsInSheet)

findUniqueZips()

Answer 1

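I usually just convert it to a set. Sets are your friend. They are much faster than lists for membership checks. Unless you specifically need or want duplicates, use a set.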

https://docs.python.org/3.7/tutorial/datastructures.html?highlight=intersection#sets
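As a minimal sketch of that conversion (the sample values below are made up and stand in for the first column of the sheet):

```python
# Hypothetical sample data standing in for the zip code column.
zips = ["90210", "10001", "90210", "60601", "10001"]

# Building a set drops duplicates. Membership checks on a set take roughly
# constant time on average, while `in` on a list scans the whole list.
unique_zips = set(zips)

# Sorted only to get a stable display order; the set itself is unordered.
print(sorted(unique_zips))
```

This replaces the repeated `in zipsInSheet` scan from the question, which is what makes the original loop slow down as the list grows.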

Answer 2

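If you want to avoid duplicates, you should definitely consider using sets in Python. See here

What I would do is create a set and simply add all the elements to it; note that a set is an unordered collection of unique items. Once all the data has been added, you can copy the elements of the set into your list. This avoids redundant data.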



import sys
import xlrd

loc = ("locationOfFile")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)

def findUniqueZips():
    # a set ignores values it already contains, so no manual duplicate check is needed
    data = set()

    for i in range(sheet.nrows):
        data.add(str(sheet.cell(i, 0).value))

    # now copy the unique values from the set into a list (order is not preserved)
    zipsInSheet = list(data)
    print(zipsInSheet)

findUniqueZips()
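Because a set is unordered, the result may not come back in sheet order. If first-seen order matters, one possible variation (not part of the answer above) is to use dictionary keys instead, since a dict keeps insertion order in Python 3.7 and later. The helper name `unique_in_order` and the sample data are hypothetical:

```python
def unique_in_order(values):
    """Return the distinct items of `values`, keeping first-seen order."""
    # dict keys are unique and, in Python 3.7+, keep insertion order
    return list(dict.fromkeys(values))


# Hypothetical sample data standing in for the zip code column.
column = ["90210", "10001", "90210", "60601", "10001"]
print(unique_in_order(column))
```

With xlrd, something like `sheet.col_values(0)` should return the whole first column as a list that could be passed to this helper, though that is worth checking against the version you have installed.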

python

performance

zipcode

xlrd