openPyXL - assign value to range of cells during unmerge

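So I have Excel files, each with several sheets, and I'm writing a script that collects data from a selected sheet, if it exists in the file, and combines everything into one big sheet. Generally it works: it iterates over the files, and if the needed sheet exists, it finds the range of cells containing data and appends it to a DataFrame. What I need to do now is add the header row (column names) to the DataFrame, but in the sheets those are multi-row headers.

To make it look the same in the DataFrame, I need to unmerge the cells in the top header row and copy the value from the first cell to the rest of the cells in the previously merged range.

I'm using openpyxl to access the Excel sheets. My function receives the sheet as its only argument. It looks like this: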

def checkForMergedCells(sheet):
    merged = ws.merged_cell_ranges
    for mergedCell in merged:
        mc_start, mc_stop = str(mergedCell).split(':')
        cp_value = sheet[mc_start]
        sheet.unmerge_cells(mergedCell)
        cell_range = sheet[mergedCell]
        for cell in cell_range:
            cell.value = cp_value

The problem is that cell_range returns a tuple, which ends in this error message:

AttributeError: 'tuple' object has no attribute 'value'

Indexed access usually returns a tuple of tuples, unless you are getting a single cell or a single row. For programmatic access, you should use iter_rows() or iter_cols().
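
As a rough illustration of the shapes involved, here is a minimal sketch on an in-memory workbook. The range A1:B2 and the value "x" are made up for the example and do not come from the question's files.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

single = ws["A1"]      # one coordinate -> a single Cell
block = ws["A1:B2"]    # a range -> a tuple of rows, each row a tuple of cells

# Names of the nested types: outer tuple, inner tuple, then the cell itself
shapes = (
    type(block).__name__,
    type(block[0]).__name__,
    type(block[0][0]).__name__,
)

# iter_rows() yields one tuple of cells per row, so two nested loops
# reach every cell regardless of how large the range is
for row in ws.iter_rows(min_row=1, max_row=2, min_col=1, max_col=2):
    for cell in row:
        cell.value = "x"

values = [[cell.value for cell in row] for row in block]
```

This is why the single loop in the question ends up holding a row (a tuple) rather than a cell.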

You might want to spend some time looking at the utils module.

from openpyxl.utils import range_boundaries

for group in ws.merged_cell_ranges:
     min_col, min_row, max_col, max_row = range_boundaries(group)
     top_left_cell_value = ws.cell(row=min_row, column=min_col).value
     for row in ws.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
         for cell in row:
             cell.value = top_left_cell_value
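
For reference, a small sketch of what range_boundaries returns. The range string "B2:D5" is an arbitrary example, and get_column_letter is another helper from the same utils module.

```python
from openpyxl.utils import get_column_letter, range_boundaries

# range_boundaries() takes a range string and returns 1-based
# (min_col, min_row, max_col, max_row), columns first
min_col, min_row, max_col, max_row = range_boundaries("B2:D5")

# get_column_letter() converts a column index back to its letter
top_left = f"{get_column_letter(min_col)}{min_row}"
bottom_right = f"{get_column_letter(max_col)}{max_row}"
```

Note the order of the tuple: columns come before rows, which is easy to get backwards when passing the values on to iter_rows() or cell().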

I kept getting errors and deprecation warnings until I did this:

from openpyxl.utils import range_boundaries

for group in sheet.merged_cells.ranges: # merged_cell_ranges deprecated
    display(range_boundaries(group._get_range_string())) # expects a string instead of an object
    min_col, min_row, max_col, max_row = range_boundaries(group._get_range_string())
    top_left_cell_value = sheet.cell(row=min_row, column=min_col).value
    for row in sheet.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = top_left_cell_value

None of the previous answers worked, so I elaborated on this one, tested it, and it works for me.

from openpyxl import load_workbook
from openpyxl.utils import range_boundaries
wb = load_workbook('Example.xlsx')

sheets = wb.sheetnames  ##['Sheet1', 'Sheet2']
for i,sheet in enumerate(sheets):
    ws = wb[sheets[i]]
    
    # you need a separate list to iterate on (see explanation #2 below)
    mergedcells =[]  
    for group in ws.merged_cells.ranges:
        mergedcells.append(group)
    
    for group in mergedcells:
        min_col, min_row, max_col, max_row = group.bounds 
        top_left_cell_value = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(str(group))   # you need to unmerge before writing (see explanation #1 below)
        for irow in range(min_row, max_row+1):
            for jcol in range(min_col, max_col+1): 
                ws.cell(row = irow, column = jcol, value = top_left_cell_value)

 

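@Дмитро Олександрович was almost right, but I had to change a few things to fix his answer:

  1. You will hit the error AttributeError: 'MergedCell' object attribute 'value' is read-only, because you need to unmerge the cells before changing their values. (See here: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1228)

  2. You cannot iterate over ws.merged_cells.ranges directly, because iterating over the 'ranges' list object in Python while changing it (with the unmerge_cells function or with pop) results in only half of the objects being changed (see here: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1085). You need to create a separate list and iterate over that.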
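
Point 1 can be reproduced on a throwaway in-memory workbook. This is only a sketch: the "Header" value and the A1:C1 range are invented for the example, and it assumes an openpyxl version recent enough to represent the non-anchor cells of a merge as MergedCell objects (the class named in the error message).

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "Header"
ws.merge_cells("A1:C1")

# While the range is merged, B1 is a read-only placeholder cell
try:
    ws["B1"].value = "Header"
    write_before_unmerge = "ok"
except AttributeError:
    write_before_unmerge = "AttributeError"

# Read the anchor value, unmerge, and only then write to the other cells
top_left_value = ws["A1"].value
ws.unmerge_cells("A1:C1")
for column in range(1, 4):
    ws.cell(row=1, column=column, value=top_left_value)

filled = [ws.cell(row=1, column=column).value for column in range(1, 4)]
```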

The following code from http://thequickblog.com/merge-unmerge-cells-openpyxl-in-python/ worked for me.

import openpyxl 
from openpyxl.utils import range_boundaries
wbook=openpyxl.load_workbook("openpyxl_merge_unmerge.xlsx")
sheet=wbook["unmerge_sample"]
for cell_group in sheet.merged_cells.ranges:
    min_col, min_row, max_col, max_row = range_boundaries(str(cell_group))
    top_left_cell_value = sheet.cell(row=min_row, column=min_col).value
    sheet.unmerge_cells(str(cell_group))
    for row in sheet.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = top_left_cell_value
wbook.save("openpyxl_merge_unmerge.xlsx")
exit()

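Problems with the other answers

Regarding the other answers which use the code from http://thequickblog.com/merge-unmerge-cells-openpyxl-in-python: you can unmerge cells more easily, without dealing with range_boundaries and those conversions.

I also ran into problems with the selected answer: some merged cells would get unmerged while others would not, and some unmerged cells would be filled with the data I wanted while others would not.

The problem is that looping over worksheet.merged_cells.ranges walks the live collection rather than a snapshot of it, so when worksheet.unmerge_cells() is called, worksheet.merged_cells is mutated, and the ongoing iteration over the merged cell ranges suffers side effects: some ranges are skipped.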
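
The skipping can be reproduced without openpyxl at all. In this sketch plain strings stand in for the merged ranges and list.remove() stands in for unmerge_cells(); it assumes the collection behaves like an ordinary Python list, which is what the skipped ranges described above suggest.

```python
# Plain strings stand in for the merged ranges; remove() plays the
# role of unmerge_cells(), which deletes the range from the collection
live = ["A1:B1", "C1:D1", "E1:F1", "G1:H1"]
visited_live = []
for cell_range in live:            # iterating the list being mutated
    visited_live.append(cell_range)
    live.remove(cell_range)        # shifts the remaining items left

snapshot_source = ["A1:B1", "C1:D1", "E1:F1", "G1:H1"]
visited_snapshot = []
for cell_range in list(snapshot_source):   # iterate a copy instead
    visited_snapshot.append(cell_range)
    snapshot_source.remove(cell_range)
```

The first loop visits only "A1:B1" and "E1:F1" and leaves the other two entries behind, while the second loop visits all four and empties the list.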

A better solution

In my case, I wanted to unmerge cells like this, copying the border, font, and alignment information as well:

                    +-------+------+
+-------+------+    | Date  | Time |
| Date  | Time |    +=======+======+
+=======+======+    | Aug 6 | 1:00 |
|       | 1:00 | -> +-------+------+
| Aug 6 | 3:00 |    | Aug 6 | 3:00 |
|       | 6:00 |    +-------+------+
+-------+------+    | Aug 6 | 6:00 |
                    +-------+------+

With the current latest version, openpyxl==3.0.9, I found that the following works best for me:

from copy import copy

from openpyxl import load_workbook, Workbook
from openpyxl.cell import Cell
from openpyxl.worksheet.cell_range import CellRange
from openpyxl.worksheet.worksheet import Worksheet


def unmerge_and_fill_cells(worksheet: Worksheet) -> None:
    """
    Unmerges all merged cells in the given ``worksheet`` and copies the content
    and styling of the original cell to the newly unmerged cells.

    :param worksheet: The Excel worksheet containing the merged cells.
    """

    # Must convert iterator to list to eagerly evaluate all merged cell ranges
    # before looping over them - this prevents unintended side-effects of
    # certain cell ranges from being skipped since `worksheet.unmerge_cells()`
    # is destructive.
    all_merged_cell_ranges: list[CellRange] = list(
        worksheet.merged_cells.ranges
    )

    for merged_cell_range in all_merged_cell_ranges:
        merged_cell: Cell = merged_cell_range.start_cell
        worksheet.unmerge_cells(range_string=merged_cell_range.coord)

        # Don't need to convert iterator to list here since `merged_cell_range`
        # is cached
        for row_index, col_index in merged_cell_range.cells:
            cell: Cell = worksheet.cell(row=row_index, column=col_index)
            cell.value = merged_cell.value

            # (Optional) If you want to also copy the original cell styling to
            # the newly unmerged cells, you must use shallow `copy()` since
            # cell style properties are proxy objects which are not hashable.
            #
            # See <https://openpyxl.rtfd.io/en/stable/styles.html#copying-styles>
            cell.alignment = copy(merged_cell.alignment)
            cell.border = copy(merged_cell.border)
            cell.font = copy(merged_cell.font)


# Sample usage
if __name__ == "__main__":
    workbook: Workbook = load_workbook(
        filename="workbook_with_merged_cells.xlsx"
    )
    worksheet: Worksheet = workbook["My Sheet"]

    unmerge_and_fill_cells(worksheet=worksheet)
    workbook.save(filename="workbook_with_unmerged_cells.xlsx")

Concise solution

Here is a shorter version without comments and without copying the styles:

from openpyxl.worksheet.worksheet import Worksheet

def unmerge_and_fill_cells(worksheet: Worksheet) -> None:
    for merged_cell_range in list(worksheet.merged_cells.ranges):
        merged_cell = merged_cell_range.start_cell
        worksheet.unmerge_cells(range_string=merged_cell_range.coord)

        for row_col_indices in merged_cell_range.cells:
            worksheet.cell(*row_col_indices).value = merged_cell.value