Pandas MultiIndex DataFrame styling error when writing to Excel

Question

I am trying to write a multi-index DataFrame to Excel using pandas styling, but I get an error.

import pandas as pd
import numpy as np

df=pd.DataFrame(np.random.randn(9,4), pd.MultiIndex.from_product([['A', 'B','C'], ['r1', 'r2','r3']]), columns=[['E1','E1','E2','E2'],['d1','d2','d1','d2']])

def highlight_max(s, props=''):
    return np.where(s == np.nanmax(s.values), props, '')

def highlight_all_by_condition (value, condition, props=''):
    return np.where(value >= condition, props, '')

def highlight_max_value_by_condition(value, condition, props=''):
    return np.where(np.nanmax(value) >= condition, props, '')

df_formatted = df.style.set_properties(**{'font-family': 'Arial','font-size': '10pt'})

unique_column_list = list(set(df.columns.get_level_values(0)))
idx = pd.IndexSlice
for each in unique_column_list:
    slice_=idx[idx[each]]
    df_formatted = df_formatted.apply(highlight_max, props='color:black; font-weight: bold', axis=1, subset=slice_)\
                               .apply(highlight_all_by_condition, condition = 0.55, props='color:red;font-weight: bold; background-color: #ffe6e6', axis=1, subset=slice_)\
                               .apply(highlight_max_value_by_condition, condition = 1, props='color:green;font-weight: bold; background-color: #ffff33', axis=1, subset=slice_)

df_formatted.to_excel("test.xlsx", engine = 'openpyxl')

I get the following error:

ValueError: Function <function highlight_max_value_by_condition at 0x000001EE1394E940> returned the wrong shape.
Result has shape: (9,)
Expected shape:   (9, 2)

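The style function highlight_max_value_by_condition is a conditional style: it should highlight the maximum value only when that value meets the condition. If I remove that style function, I don't get any error.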

Any help is much appreciated. Thanks in advance.

Answer 1

Assuming highlight_max_value_by_condition is meant to style the cells that are both the maximum of the subset and satisfy the condition, we can add an & to combine the two conditions:

def highlight_max_value_by_condition(value, condition, props=''):
    return np.where(
        (value == np.nanmax(value)) & (value >= condition),
        props,
        ''
    )
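
To see where the shape error comes from, here is a minimal sketch that compares what the two versions return for a single row. The row values are made up for illustration. In the original function the condition passed to np.where is a single boolean, so the result is a zero-dimensional array instead of one entry per cell. With axis=1 this presumably collapses to one value per row, which would match the (9,) in the error message.

```python
import numpy as np
import pandas as pd

# One row of a two-column subset (illustrative values, not from the question)
row = pd.Series([0.2, 1.5], index=['d1', 'd2'])

# Original logic: the condition is one boolean for the whole row,
# so np.where returns a 0-d array rather than one style per cell
scalar_result = np.where(np.nanmax(row) >= 1, 'color:green', '')
print(scalar_result.shape)  # ()

# Corrected logic: the condition is evaluated per element,
# so np.where returns one style string per cell
mask = (row == np.nanmax(row)) & (row >= 1)
elementwise_result = np.where(mask, 'color:green', '')
print(elementwise_result.shape)     # (2,)
print(elementwise_result.tolist())  # ['', 'color:green']
```

A style function used with Styler.apply has to return something with the same shape as its input, which is why only the element-wise version is accepted.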

Beyond that, however, there is quite a bit we can do to clean up the general approach.

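Styler objects compound naturally, so there is no need to assign the result back. Instead of using list(set(...)) to get the level values, MultiIndex.levels already provides the unique values for each level. Also, since we are working with the top level, we don't need pd.IndexSlice, because accessing columns by a top-level MultiIndex key yields all of its sub-columns.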
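
The two column-related points can be checked on a small frame. This is only a sketch: the zero-filled frame and the variable names are placeholders. It also shows one caveat worth keeping in mind, namely that levels can retain values that are no longer used after a frame has been subset.

```python
import numpy as np
import pandas as pd

# Placeholder frame with the same column layout as in the question
df = pd.DataFrame(
    np.zeros((2, 4)),
    columns=[['E1', 'E1', 'E2', 'E2'], ['d1', 'd2', 'd1', 'd2']]
)

# levels[0] already holds the unique top-level keys, in sorted order
top_level = list(df.columns.levels[0])
print(top_level)  # ['E1', 'E2']

# Selecting by a top-level key returns all of its sub-columns
sub_columns = list(df['E1'].columns)
print(sub_columns)  # ['d1', 'd2']

# Caveat: after subsetting, levels may still list unused keys
trimmed = df[['E1']]
stale_levels = list(trimmed.columns.levels[0])
clean_levels = list(trimmed.columns.remove_unused_levels().levels[0])
print(stale_levels)  # ['E1', 'E2']
print(clean_levels)  # ['E1']
```

If the frame may have been filtered beforehand, df.columns.get_level_values(0).unique() is a safe alternative that only returns keys actually present.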

All of this together means df_formatted can be built like this:

df_formatted = df.style.set_properties(**{
    'font-family': 'Arial',
    'font-size': '10pt'
})

for slice_ in df.columns.levels[0]:
    df_formatted.apply(
        highlight_max,
        props='color:black; font-weight: bold',
        axis=1, subset=slice_
    ).apply(
        highlight_all_by_condition, condition=0.55,
        props='color:red;font-weight: bold; background-color: #ffe6e6',
        axis=1, subset=slice_
    ).apply(
        highlight_max_value_by_condition, condition=1,
        props='color:green;font-weight: bold; background-color: #ffff33',
        axis=1, subset=slice_
    )

Setup, made reproducible with seed(6), along with the modified functions:

import numpy as np
import pandas as pd

np.random.seed(6)
df = pd.DataFrame(
    np.random.randn(9, 4),
    pd.MultiIndex.from_product([['A', 'B', 'C'], ['r1', 'r2', 'r3']]),
    columns=[['E1', 'E1', 'E2', 'E2'], ['d1', 'd2', 'd1', 'd2']]
)


def highlight_max(s, props=''):
    return np.where(s == np.nanmax(s.values), props, '')


def highlight_all_by_condition(value, condition, props=''):
    return np.where(value >= condition, props, '')


def highlight_max_value_by_condition(value, condition, props=''):
    return np.where(
        (value == np.nanmax(value)) & (value >= condition),
        props,
        ''
    )

df:

            E1                  E2          
            d1        d2        d1        d2
A r1 -0.311784  0.729004  0.217821 -0.899092
  r2 -2.486781  0.913252  1.127064 -1.514093
  r3  1.639291 -0.429894  2.631281  0.601822
B r1 -0.335882  1.237738  0.111128  0.129151
  r2  0.076128 -0.155128  0.634225  0.810655
  r3  0.354809  1.812590 -1.356476 -0.463632
C r1  0.824654 -1.176431  1.564490  0.712705
  r2 -0.181007  0.534200 -0.586613 -1.481853
  r3  0.857248  0.943099  0.114441 -0.021957
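
As a quick sanity check that does not require writing the Excel file, the corrected function can be called by hand on a couple of rows. The sketch below uses the E1 values of rows A/r3 and B/r2 copied from the frame printed above. The names sample_rows and checked are only for illustration.

```python
import numpy as np
import pandas as pd


def highlight_max_value_by_condition(value, condition, props=''):
    # Style only the cell that is the row maximum AND meets the condition
    return np.where(
        (value == np.nanmax(value)) & (value >= condition),
        props,
        ''
    )


# E1 sub-columns of two rows, copied from the printed frame
sample_rows = pd.DataFrame(
    {'d1': [1.639291, 0.076128], 'd2': [-0.429894, -0.155128]},
    index=['A r3', 'B r2']
)

# Apply the function row by row, as Styler.apply does with axis=1
checked = {
    label: highlight_max_value_by_condition(
        row, condition=1, props='color:green'
    ).tolist()
    for label, row in sample_rows.iterrows()
}
print(checked['A r3'])  # ['color:green', ''] since the max 1.639291 is >= 1
print(checked['B r2'])  # ['', ''] since the max 0.076128 is below 1
```

Each result has one entry per cell in the row, so the function returns the shape Styler.apply expects.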
