openPyXL - assign value to range of cells during unmerge
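I have Excel files, each with several sheets, and I'm writing a script that collects data from a selected sheet, if it exists in the file, and combines it all into one big sheet. Generally it works: it iterates through the files, and if the needed sheet exists, it finds the range of cells containing data and appends it to a DataFrame. What I need to do now is add the header row (column names) to the DataFrame, but in the sheet these are multi-row headers.
To make it look the same in the DataFrame, I need to unmerge the cells in the top header row and copy the value from the first cell to the rest of the cells in the previously merged range.
I'm using openpyxl to access the Excel sheets. My function receives the sheet as its only parameter. It looks like this: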
def checkForMergedCells(sheet):
    merged = ws.merged_cell_ranges
    for mergedCell in merged:
        mc_start, mc_stop = str(mergedCell).split(':')
        cp_value = sheet[mc_start]
        sheet.unmerge_cells(mergedCell)
        cell_range = sheet[mergedCell]
        for cell in cell_range:
            cell.value = cp_value
The problem is that cell_range returns a tuple, which ends up producing this error message:
AttributeError: 'tuple' object has no attribute 'value'
Below is a screenshot taken during debugging, showing the values held by each variable.
[Screenshot: Debugger running]
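Accessing by index will generally return a tuple of tuples, except when you try to get an individual cell or row. For programmatic access you should use iter_rows() or iter_cols().
You might like to spend some time looking at the utils module.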
from openpyxl.utils import range_boundaries

for group in ws.merged_cell_ranges:
    min_col, min_row, max_col, max_row = range_boundaries(group)
    top_left_cell_value = ws.cell(row=min_row, column=min_col).value
    for row in ws.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = top_left_cell_value
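To make the "tuple of tuples" point concrete, here is a minimal sketch on an in-memory workbook. The workbook, the cell contents and the variable names are made up for illustration; only the indexing and iter_rows() calls come from the answer above.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "Date"

single = ws["A1"]      # indexing one coordinate gives a single Cell
block = ws["A1:B2"]    # indexing a range gives a tuple of rows

# Each element of `block` is itself a tuple of cells, so looping over
# `block` yields rows, and `row.value` fails exactly as in the question.
first_row = block[0]

# iter_rows() makes the two nesting levels explicit.
coords = [
    cell.coordinate
    for row in ws.iter_rows(min_row=1, max_row=2, min_col=1, max_col=2)
    for cell in row
]
```

The nested loop (rows first, then cells) is the part the question's function was missing.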
I kept getting errors and a deprecation warning until I did this:
from openpyxl.utils import range_boundaries

for group in sheet.merged_cells.ranges:  # merged_cell_ranges deprecated
    display(range_boundaries(group._get_range_string()))  # expects a string instead of an object
    min_col, min_row, max_col, max_row = range_boundaries(group._get_range_string())
    top_left_cell_value = sheet.cell(row=min_row, column=min_col).value
    for row in sheet.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = top_left_cell_value
None of the previous answers worked for me.
So I elaborated on this one, tested it, and it works for me.
from openpyxl import load_workbook

wb = load_workbook('Example.xlsx')
sheets = wb.sheetnames  ## ['Sheet1', 'Sheet2']
for i, sheet in enumerate(sheets):
    ws = wb[sheets[i]]

    # you need a separate list to iterate on (see explanation #2 below)
    mergedcells = []
    for group in ws.merged_cells.ranges:
        mergedcells.append(group)

    for group in mergedcells:
        min_col, min_row, max_col, max_row = group.bounds
        top_left_cell_value = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(str(group))  # you need to unmerge before writing (see explanation #1 below)
        for irow in range(min_row, max_row + 1):
            for jcol in range(min_col, max_col + 1):
                ws.cell(row=irow, column=jcol, value=top_left_cell_value)
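@Дмитро Олександрович is almost right, but I had to change a few things to fix his answer:
1. You will get an AttributeError: 'MergedCell' object attribute 'value' is read-only error, because you need to unmerge the cells before changing their values. (See here: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1228)
2. You cannot iterate directly over ws.merged_cells.ranges, because iterating over the 'ranges' list object in Python while changing it (with the unmerge_cells function or the pop function) results in only half of the objects being changed (see here: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1085). You need to create a separate list and iterate over that.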
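The first point can be reproduced on a throwaway workbook. This is only a sketch: the sheet contents and variable names are invented, and it assumes a recent openpyxl release in which cells covered by a merge are MergedCell objects, as the error message quoted above suggests.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "Header"
ws.merge_cells("A1:C1")

# B1 is covered by the merge, so writing to it is rejected.
try:
    ws["B1"].value = "Header"
    write_failed = False
except AttributeError:
    write_failed = True

# After unmerging, the same coordinate is an ordinary, writable cell.
ws.unmerge_cells("A1:C1")
ws["B1"].value = "Header"
ws["C1"].value = "Header"

row_values = [ws.cell(row=1, column=col).value for col in (1, 2, 3)]
```

Reading the top-left value before calling unmerge_cells(), and writing only afterwards, is the ordering the corrected code above relies on.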
The following code from http://thequickblog.com/merge-unmerge-cells-openpyxl-in-python/ worked for me.
import openpyxl
from openpyxl.utils import range_boundaries

wbook = openpyxl.load_workbook("openpyxl_merge_unmerge.xlsx")
sheet = wbook["unmerge_sample"]
for cell_group in sheet.merged_cells.ranges:
    min_col, min_row, max_col, max_row = range_boundaries(str(cell_group))
    top_left_cell_value = sheet.cell(row=min_row, column=min_col).value
    sheet.unmerge_cells(str(cell_group))
    for row in sheet.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = top_left_cell_value
wbook.save("openpyxl_merge_unmerge.xlsx")
exit()
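Problem with the other answers
Regarding the other answers that use the code from http://thequickblog.com/merge-unmerge-cells-openpyxl-in-python: you can unmerge the cells more easily, without dealing with range_boundaries and those conversions.
I was also running into problems with the selected answer, where some merged cells would get unmerged but others would not, and some unmerged cells would be filled with the data I wanted but others would not.
The problem is that worksheet.merged_cells.ranges is an iterator, meaning it is lazily evaluated, so when worksheet.unmerge_cells() is called, the object worksheet.merged_cells is mutated and side effects occur when iterating over the merged cell ranges again.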
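The skipping behaviour can be illustrated without openpyxl at all. The sketch below uses a plain Python list of range strings as a stand-in for the worksheet's collection of merged ranges (an assumption made purely for illustration; the real container's type may differ between openpyxl versions), with remove() playing the role of unmerge_cells().

```python
# Mutating the collection while looping over it: every other item is skipped.
ranges = ["A1:B1", "C1:D1", "E1:F1", "G1:H1"]
visited = []
for rng in ranges:
    visited.append(rng)
    ranges.remove(rng)  # stands in for worksheet.unmerge_cells(...)

# Looping over a snapshot instead: every item is visited exactly once.
ranges_snapshot = ["A1:B1", "C1:D1", "E1:F1", "G1:H1"]
visited_snapshot = []
for rng in list(ranges_snapshot):
    visited_snapshot.append(rng)
    ranges_snapshot.remove(rng)
```

In the first loop only "A1:B1" and "E1:F1" are visited and the other two ranges are left behind, which matches the "some cells get unmerged, others do not" symptom. Copying the collection with list() before the loop avoids it.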
A better solution
In my case, I wanted to unmerge cells like this, while also copying the border, font and alignment information:
                    +-------+------+
+-------+------+    | Date  | Time |
| Date  | Time |    +=======+======+
+=======+======+    | Aug 6 | 1:00 |
|       | 1:00 | -> +-------+------+
| Aug 6 | 3:00 |    | Aug 6 | 3:00 |
|       | 6:00 |    +-------+------+
+-------+------+    | Aug 6 | 6:00 |
                    +-------+------+
With the current latest version, openpyxl==3.0.9, I found the following works best for me:
from copy import copy

from openpyxl import load_workbook, Workbook
from openpyxl.cell import Cell
from openpyxl.worksheet.cell_range import CellRange
from openpyxl.worksheet.worksheet import Worksheet


def unmerge_and_fill_cells(worksheet: Worksheet) -> None:
    """
    Unmerges all merged cells in the given ``worksheet`` and copies the content
    and styling of the original cell to the newly unmerged cells.

    :param worksheet: The Excel worksheet containing the merged cells.
    """
    # Must convert iterator to list to eagerly evaluate all merged cell ranges
    # before looping over them - this prevents unintended side-effects of
    # certain cell ranges from being skipped since `worksheet.unmerge_cells()`
    # is destructive.
    all_merged_cell_ranges: list[CellRange] = list(
        worksheet.merged_cells.ranges
    )

    for merged_cell_range in all_merged_cell_ranges:
        merged_cell: Cell = merged_cell_range.start_cell
        worksheet.unmerge_cells(range_string=merged_cell_range.coord)

        # Don't need to convert iterator to list here since `merged_cell_range`
        # is cached
        for row_index, col_index in merged_cell_range.cells:
            cell: Cell = worksheet.cell(row=row_index, column=col_index)
            cell.value = merged_cell.value

            # (Optional) If you want to also copy the original cell styling to
            # the newly unmerged cells, you must use shallow `copy()` since
            # cell style properties are proxy objects which are not hashable.
            #
            # See <https://openpyxl.rtfd.io/en/stable/styles.html#copying-styles>
            cell.alignment = copy(merged_cell.alignment)
            cell.border = copy(merged_cell.border)
            cell.font = copy(merged_cell.font)


# Sample usage
if __name__ == "__main__":
    workbook: Workbook = load_workbook(
        filename="workbook_with_merged_cells.xlsx"
    )
    worksheet: Worksheet = workbook["My Sheet"]

    unmerge_and_fill_cells(worksheet=worksheet)

    workbook.save(filename="workbook_with_unmerged_cells.xlsx")
A concise solution
Here is a shorter version without the comments and without copying styles:
from openpyxl.worksheet.worksheet import Worksheet


def unmerge_and_fill_cells(worksheet: Worksheet) -> None:
    for merged_cell_range in list(worksheet.merged_cells.ranges):
        # Keep a reference to the top-left cell, which holds the value.
        merged_cell = merged_cell_range.start_cell
        worksheet.unmerge_cells(range_string=merged_cell_range.coord)
        for row_col_indices in merged_cell_range.cells:
            worksheet.cell(*row_col_indices).value = merged_cell.value
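To tie this back to the multi-row header in the question, here is a hedged end-to-end sketch on an in-memory workbook. The helper name unmerge_and_fill, the header layout ("Date", "Reading", "Min", "Max") and the idea of zipping the two header rows into per-column tuples are all assumptions made for illustration, not part of the answers above.

```python
from openpyxl import Workbook


def unmerge_and_fill(ws):
    """Unmerge every merged range and copy its top-left value into each cell."""
    for rng in list(ws.merged_cells.ranges):  # snapshot before mutating
        value = ws.cell(row=rng.min_row, column=rng.min_col).value
        ws.unmerge_cells(rng.coord)
        for row, col in rng.cells:
            ws.cell(row=row, column=col).value = value


wb = Workbook()
ws = wb.active
ws.append(["Date", "Reading", None])  # row 1: top-level header
ws.append([None, "Min", "Max"])       # row 2: sub-header
ws.merge_cells("B1:C1")               # "Reading" spans two columns
ws.merge_cells("A1:A2")               # "Date" spans two rows

unmerge_and_fill(ws)

header_rows = [
    [cell.value for cell in row]
    for row in ws.iter_rows(min_row=1, max_row=2)
]
# One tuple per column, combining both header levels.
columns = list(zip(*header_rows))
```

Once the header rows are filled in like this, each column carries its full two-level label, which could then be used as the column names of the DataFrame.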