OpenPyXL - How to delete rows from an Excel file based on some condition?
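I want to delete rows from an Excel file, given the values those rows contain. I am using openpyxl.
key_values_list is a list of numbers, all of which are present in a column of the Excel file: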
from openpyxl import load_workbook

wb = load_workbook(src)
sheet = wb['Sheet 1']
# Start at row 2; row 1 is left alone.
for i in range(2, sheet.max_row + 1):
    if sheet.cell(row=i, column=1).value in key_values_list:
        sheet.delete_rows(i, 1)
wb.save(src)
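The code above does not delete all of the matching rows.
Deleting elements directly inside a for loop always runs into problems. Consider a sheet with 12 rows, where each row holds its own row number as its value: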
for i in range(1, sh.max_row + 1):
    print(sh.cell(row=i, column=1).value)
# 1 .. 12
Now look at what happens once you start deleting things:
for i in range(1, sh.max_row + 1):
    # Print before deleting so the output shows the value the check actually saw.
    print(f'i = {i}\tcell value (i, 1) is {sh.cell(row=i, column=1).value}')
    if sh.cell(row=i, column=1).value in [5,6,7]:
        sh.delete_rows(i, 1)
# i = 1 cell value (i, 1) is 1
# i = 2 cell value (i, 1) is 2
# i = 3 cell value (i, 1) is 3
# i = 4 cell value (i, 1) is 4
# i = 5 cell value (i, 1) is 5
# i = 6 cell value (i, 1) is 7
# i = 7 cell value (i, 1) is 9
# i = 8 cell value (i, 1) is 10
# i = 9 cell value (i, 1) is 11
# i = 10 cell value (i, 1) is 12
# i = 11 cell value (i, 1) is None
# i = 12 cell value (i, 1) is None
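You can see that while i is in [5, 6, 7], the rows start shifting from row 6 onward: row 5 has already been deleted, so the original row 6 becomes the new row 5, the original row 7 becomes the new row 6, and so on. In the next iteration, i = 6, the cell therefore refers to the value from row 7 of the original data. You have effectively skipped the original row 6, so it is never checked and never deleted.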
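The shift can be reproduced without a workbook at all. The following is only a sketch that uses a plain Python list as a stand-in for the worksheet column (the names rows, targets and visited are made up for illustration); it mimics how a deleted row pulls everything below it up by one position.

```python
# Plain-list stand-in for the worksheet column: position 0 plays "row 1".
rows = list(range(1, 13))
targets = {5, 6, 7}
visited = []

# As with range(1, sh.max_row + 1), the bounds are fixed before any deletion.
for i in range(1, len(rows) + 1):
    # Past the shrunken end, a worksheet cell would simply read as None.
    value = rows[i - 1] if i <= len(rows) else None
    visited.append(value)
    if value in targets:
        del rows[i - 1]  # everything below shifts up by one position

print(visited)  # [1, 2, 3, 4, 5, 7, 9, 10, 11, 12, None, None]
print(rows)     # [1, 2, 3, 4, 6, 8, 9, 10, 11, 12]
```

The value 6 is never visited, so it survives in rows even though it is one of the targets.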
The simplest answer is to use a while loop instead of a for loop:
i = 1
while i <= sh.max_row:
    print(f'i = {i}\tcell value (i, 1) is {sh.cell(row=i, column=1).value}')
    if sh.cell(row=i, column=1).value in [5,6,7]:
        sh.delete_rows(i, 1)
        # Note the absence of an increment. Because we deleted a row, we stay on
        # the same index: the next row has shifted up into it.
    else:
        i += 1
        # The check failed, so we can safely move on to the next row.
# i = 1 cell value (i, 1) is 1
# i = 2 cell value (i, 1) is 2
# i = 3 cell value (i, 1) is 3
# i = 4 cell value (i, 1) is 4
# i = 5 cell value (i, 1) is 5 # deleted
# i = 5 cell value (i, 1) is 6 # deleted
# i = 5 cell value (i, 1) is 7 # deleted
# i = 5 cell value (i, 1) is 8
# i = 6 cell value (i, 1) is 9
# i = 7 cell value (i, 1) is 10
# i = 8 cell value (i, 1) is 11
# i = 9 cell value (i, 1) is 12
# verify the data has been deleted
for i in range(1, sh.max_row + 1):
    print(sh.cell(row=i, column=1).value)
# 1
# 2
# 3
# 4
# 8
# 9
# 10
# 11
# 12
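You can now see that although i never reaches 12, every row was still visited, because i = 5 was processed four times: three times for the deleted rows and once more for the row holding 8.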
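Applied to the question's key_values_list, the same while loop could be wrapped in a small helper. This is a minimal sketch, not a tested drop-in: the function name delete_rows_with_keys and its column and first_row parameters are hypothetical, and it assumes the sheet argument behaves like an openpyxl worksheet (max_row, cell(row=..., column=...) and delete_rows(idx, amount)).

```python
def delete_rows_with_keys(sheet, keys, column=1, first_row=2):
    """Delete every row whose cell in `column` holds a value found in `keys`.

    Hypothetical helper: `sheet` is assumed to behave like an openpyxl
    worksheet. Returns the number of rows removed.
    """
    keys = set(keys)  # set membership keeps each lookup cheap
    deleted = 0
    i = first_row     # first_row=2 leaves row 1 untouched, as in the question
    while i <= sheet.max_row:
        if sheet.cell(row=i, column=column).value in keys:
            sheet.delete_rows(i, 1)
            deleted += 1
            # No increment: the next row has shifted up into position i.
        else:
            i += 1
    return deleted
```

With the question's workbook, this would be called as delete_rows_with_keys(wb['Sheet 1'], key_values_list) before wb.save(src).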
If for some reason you must iterate with a for loop, you may want to consider alternative methods, such as iterating through a copy or doing it backward.
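One way to read "iterating through a copy" is to record the matching row numbers first, while the sheet is still unchanged, and only then delete them. The sketch below assumes the same worksheet-like interface as openpyxl (max_row, cell, delete_rows); the function name delete_rows_from_snapshot and its parameters are hypothetical.

```python
def delete_rows_from_snapshot(sheet, keys, column=1, first_row=2):
    """Find matching rows first, then delete them from the bottom up.

    Hypothetical helper: `sheet` is assumed to behave like an openpyxl
    worksheet. Returns the original row numbers that were deleted.
    """
    keys = set(keys)
    # Snapshot: collect the matching row numbers before modifying the sheet.
    doomed = [
        i
        for i in range(first_row, sheet.max_row + 1)
        if sheet.cell(row=i, column=column).value in keys
    ]
    # Highest row first, so the recorded numbers of pending rows stay valid.
    for i in reversed(doomed):
        sheet.delete_rows(i, 1)
    return doomed
```

Because every decision is made against the untouched sheet, a plain for loop is safe here; the only requirement is that the deletions run from the highest row number down.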
Another approach is to reverse the loop. Since deleted rows throw off your indices, it makes sense to go through them in reverse order:
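rows = list(sheet.iter_rows(min_row=1, max_row=sheet.max_row))
for row in reversed(rows):
    # Stop at row 1 so the first row is always kept.
    if row[0].row == 1:
        break
    # Column F (index 5) is compared with filterBy; non-matching rows are removed.
    if row[5].value != filterBy:
        sheet.delete_rows(row[0].row, 1)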