How to eliminate duplicates from an Excel file through openpyxl

Question

I have two Excel columns, name and company, and I'm trying to figure out the easiest way to get a list of (name, company) tuples as output with no duplicates.

The code below looks correct to me, but for some reason it doesn't work, probably because of some silly mistake I can't seem to find.

updated = openpyxl.load_workbook('abc.xlsx')
u_wb = updated.get_sheet_by_name('SP_Table')
u_names = u_wb['F'] #column F is where the names are
u_company = u_wb['C'] #column C is where the company's name are
l=[]

for x in range(len(u_names)-1):
    i=x
    i+=1
    if u_company[x].value==None: #in case a field is missing
        continue
    if i==len(u_names):
        break
    for z in l:
        r=(u_names[x].value, u_names[x].value)
        if r==z:
            continue
    else:
        t=(u_names[x].value, u_company[x].value)
        l.append(t)
print("Number of contacts:", len(l))

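I'm not getting any errors, and the number of contacts does go down, but only because of the u_company[x].value==None clause. Any help or resources would be appreciated.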

Answer 1

The condition you use to skip duplicates is incorrect.

You are appending the pair (u_names[x].value, u_company[x].value) to the list. That is fine and makes sense. The problem is that you are checking whether (u_names[x].value, u_names[x].value) is already in the list.

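On top of that, even when the tuples are equal, nothing happens when you find a duplicate: the continue only moves on to the next z in the inner loop. The else clause after the for will always execute! That is because a for loop's else clause runs whenever the loop finishes without hitting a break statement. So what you want to do is: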

for x in range(len(u_names)):
    if u_company[x].value is None:  # in case a field is missing
        continue

    r = (u_names[x].value, u_company[x].value)
    if r not in l:  # append only pairs that have not been seen yet
        l.append(r)

print("Number of contacts:", len(l))
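
To make the for/else behaviour described above concrete, here is a minimal sketch that needs no workbook. The function name `else_runs` and the sample tuples are made up for illustration; they are not part of the question's code.

```python
def else_runs(seen, candidate, use_break):
    """Return True if the for loop's else clause executed."""
    for item in seen:
        if item == candidate:
            if use_break:
                break     # leaves the loop, so the else clause is skipped
            continue      # only moves on to the next item; else still runs
    else:
        return True
    return False


seen = [("Ann", "Acme"), ("Bob", "Initech")]
dup = ("Ann", "Acme")

print(else_runs(seen, dup, use_break=False))  # True: the duplicate would be appended
print(else_runs(seen, dup, use_break=True))   # False: the duplicate is skipped
```

With `continue`, the else branch runs even though a duplicate was found, which is why the original loop never filtered anything.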

Answer 2

openpyxl has a powerful API that makes this kind of thing easy:

contacts = set()  # sets cannot contain duplicates

# ws is the worksheet (u_wb in the question); columns C..F are 3..6
for row in ws.iter_rows(min_col=3, max_col=6, values_only=True):
    company = row[0]  # column C
    name = row[-1]    # column F
    if company:  # adjust if necessary
        contacts.add((name, company))

print(len(contacts))

Depending on what you want to do with the contacts, you may need a different data structure, such as a dictionary.
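
As one possible illustration of the dictionary idea, the sketch below keeps the contacts in first-seen order, which a set does not guarantee. The helper name `unique_contacts` and the sample rows are assumptions made for this example. The rows are plain tuples shaped like what `ws.iter_rows(min_col=3, max_col=6, values_only=True)` would yield, so the snippet runs without a workbook.

```python
def unique_contacts(rows):
    """Return (name, company) pairs in first-seen order, without duplicates.

    Each row is a tuple whose first item is the company (column C)
    and whose last item is the name (column F).
    """
    contacts = {}  # dict keys are unique and keep insertion order (Python 3.7+)
    for row in rows:
        company, name = row[0], row[-1]
        if company is None:  # skip rows where the company is missing
            continue
        contacts.setdefault((name, company), None)
    return list(contacts)


# Hypothetical sample data standing in for worksheet rows
rows = [
    ("Acme", None, None, "Ann"),
    ("Initech", None, None, "Bob"),
    ("Acme", None, None, "Ann"),  # duplicate of the first row
    (None, None, None, "Cid"),    # missing company
]

print(unique_contacts(rows))  # [('Ann', 'Acme'), ('Bob', 'Initech')]
```

With a real workbook, the same function could be called as `unique_contacts(ws.iter_rows(min_col=3, max_col=6, values_only=True))`.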
