How to extract values from dataframe to use in conditional formatting while applying it on certain select categories or data entries at a time?

Question

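In the image, I am trying to calculate the median and the standard deviation, but it only lets me compute one column at a time. I want to compute all three columns at once and store the results in another dataframe. I would also like to know how to take the values specified for the conditional formatting and apply them only to the corresponding category.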

[Images: Dataframe df9; DF for Median]

Answer 1

IIUC, you want to perform the calculations on the Month1, Month2 and Month3 columns with a single groupby, for example the mean - std and mean + std at the bottom of your code?

You can do it like this:

import numpy as np
import pandas as pd

np.random.seed(87)  # Use this for reproducibility

df9 = pd.DataFrame(np.random.randint(0, 3, (10, 5)),
                   columns=['Month1', 'Month2', 'Month3', 'Revised Category', 'useless column'])

months = ['Month1', 'Month2', 'Month3']
grouped = df9.groupby('Revised Category')[months]
mean, std = grouped.mean(), grouped.std()  # sample std (ddof=1), per category and month

# Lower/upper bounds (mean -/+ std), interleaved as Month1_low, Month1_up, Month2_low, ...
agg = pd.concat([(mean - std).add_suffix('_low'), (mean + std).add_suffix('_up')], axis=1)
agg = agg[[col + suffix for col in months for suffix in ('_low', '_up')]]

The output is:

    Month1_low  Month1_up   Month2_low  Month2_up   Month3_low  Month3_up
0   -0.414214   2.414214    -0.414214   2.414214    -0.207107   1.207107
1   -0.207107   1.207107    -0.207107   1.207107    -0.207107   1.207107
2   0.183475    2.149859    0.105573    1.894427    0.663340    2.336660

The index of this new agg dataframe represents your categories. So if you want to access, say, Month1_up for category 0, just slice: agg.loc[0, 'Month1_up'].
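To check the bound arithmetic on a small, hand-checkable frame, here is a minimal sketch that gets the mean and standard deviation of several month columns from one groupby call via agg(['mean', 'std']). The frame toy and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: two month columns, two categories
toy = pd.DataFrame({'Month1': [1, 3, 2, 2],
                    'Month2': [0, 4, 5, 1],
                    'Revised Category': [0, 0, 1, 1]})
months = ['Month1', 'Month2']

# One groupby computes mean and std for every month column at once;
# the result has MultiIndex columns such as ('Month1', 'mean')
stats = toy.groupby('Revised Category')[months].agg(['mean', 'std'])

bounds = pd.DataFrame(index=stats.index)
for col in months:
    # 'std' is the sample standard deviation (ddof=1), like Series.std()
    bounds[col + '_low'] = stats[(col, 'mean')] - stats[(col, 'std')]
    bounds[col + '_up'] = stats[(col, 'mean')] + stats[(col, 'std')]

print(bounds.loc[0, 'Month1_up'])  # mean 2 + std sqrt(2)
```

Category 0 of Month1 holds 1 and 3, so its mean is 2 and its sample standard deviation is sqrt(2).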

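If you want to compute these bounds over all months pooled together, you can concatenate the month columns of the original dataframe before the groupby, as follows: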

# Stack Month1..Month3 into a single 'Month' column, keeping each row's category
concatenated = pd.concat([df9[[col, 'Revised Category']].rename({col: 'Month'}, axis=1)
                          for col in ['Month1', 'Month2', 'Month3']])
grouped = concatenated.groupby('Revised Category')['Month']
agg2 = pd.DataFrame({'Months_low': grouped.mean() - grouped.std(),
                     'Months_up': grouped.mean() + grouped.std()})

    Months_low  Months_up
0   -0.149859   1.816525
1   -0.047723   1.047723
2   0.344018    2.100426
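The same pooling can also be done with DataFrame.melt, which stacks the listed columns into one value column. This is a swapped-in alternative to the concat approach above, sketched on a made-up frame toy.

```python
import pandas as pd

# Hypothetical toy data: three month columns, two categories
toy = pd.DataFrame({'Month1': [1, 3, 2],
                    'Month2': [2, 2, 4],
                    'Month3': [3, 1, 0],
                    'Revised Category': [0, 0, 1]})

# Stack Month1..Month3 into one 'Month' column, keeping each row's category
long = toy.melt(id_vars='Revised Category',
                value_vars=['Month1', 'Month2', 'Month3'],
                value_name='Month')

grouped = long.groupby('Revised Category')['Month']
agg2 = pd.DataFrame({'Months_low': grouped.mean() - grouped.std(),
                     'Months_up': grouped.mean() + grouped.std()})
print(agg2)
```

Category 1 pools the values 2, 4 and 0 (mean 2, sample std 2), so its bounds are 0 and 4.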

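Edit:

I'm not used to coloring dataframes, so this solution may be heavy and not optimal, but it worked on an example.

First, let's recombine the original data from df9 with the bounds we computed with the groupby (lower and upper):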

months = ['Month1','Month2','Month3']
conc2 = pd.concat([df9.set_index('Revised Category')[[col]].join(\
        agg[[col+'_low',col+'_up']]) for col in months],axis = 1)
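An alternative to joining on the repeated 'Revised Category' index is groupby(...).transform, which broadcasts each category's mean and standard deviation back onto the original rows and keeps the original index. A minimal sketch on a made-up frame toy:

```python
import pandas as pd

# Hypothetical toy data: two month columns, two categories
toy = pd.DataFrame({'Month1': [1, 3, 2, 2],
                    'Month2': [0, 4, 5, 1],
                    'Revised Category': [0, 0, 1, 1]})
months = ['Month1', 'Month2']

grouped = toy.groupby('Revised Category')[months]
# transform returns frames aligned row by row with toy
mean = grouped.transform('mean')
std = grouped.transform('std')

conc2 = toy[months].copy()
for col in months:
    conc2[col + '_low'] = mean[col] - std[col]
    conc2[col + '_up'] = mean[col] + std[col]

# Order columns as Month1, Month1_low, Month1_up, Month2, ...
conc2 = conc2[[c for col in months for c in (col, col + '_low', col + '_up')]]
print(conc2)
```

Because the index stays unique, the later reset_index(drop=True) calls become unnecessary.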

    Month1  Month1_low  Month1_up   Month2  Month2_low  Month2_up   Month3  Month3_low  Month3_up
0   2       -0.414214   2.414214    0       -0.414214   2.414214    0       -0.207107   1.207107
0   0       -0.414214   2.414214    2       -0.414214   2.414214    1       -0.207107   1.207107
1   1       -0.207107   1.207107    0       -0.207107   1.207107    0       -0.207107   1.207107

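We will use it to build a mask marking where a value in one of the Month columns falls below its category's lower bound. (To flag values above the upper bound instead, compare against the matching "_up" column with ">".)

This mask will then be used to apply the desired color through the dataframe's style.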

mask = conc2.apply(lambda x: pd.Series([x[col]<x[col+'_low'] for col in months]),axis = 1)

mask.columns = months  # the column names must match conc2's for the style.apply call below

    Month1  Month2  Month3
0   False   False   False
0   False   False   False
1   False   False   False
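The row-wise apply works, but the same kind of mask can be built with vectorized column comparisons. This sketch assumes a frame with the MonthN, MonthN_low, MonthN_up layout of conc2 and a unique index. It flags values outside either bound, which is an illustrative choice; the code above only checks the lower bound. The frame's values are made up.

```python
import pandas as pd

# Hypothetical frame with the same column layout as conc2 (unique index assumed;
# call conc2.reset_index(drop=True) first if its index repeats categories)
conc2 = pd.DataFrame({'Month1': [2, -1, 5],
                      'Month1_low': [0.0, 0.0, 0.0],
                      'Month1_up': [3.0, 3.0, 3.0],
                      'Month2': [1, 1, 1],
                      'Month2_low': [0.5, 2.0, 0.5],
                      'Month2_up': [1.5, 3.0, 1.5]})
months = ['Month1', 'Month2']

# True where a value is below its lower bound or above its upper bound
mask = pd.DataFrame({col: (conc2[col] < conc2[col + '_low']) |
                          (conc2[col] > conc2[col + '_up'])
                     for col in months})
print(mask)
```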

Now that we have the boolean mask, we need to replace its values with the corresponding color strings that the pandas Styler object understands.

mask = mask.reset_index(drop=True).apply(lambda x: x.map(
       {True: 'background-color: red', False: ''}))  # '' means no style for that cell

Now that the mask holds these strings, we just need to apply it through the dataframe's style to print it in color.

conc2[months].reset_index(drop = True).style.apply(lambda x: mask, axis=None)
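The mask-building and color-mapping steps can also be packaged as one function. Styler.apply(..., axis=None) calls it with the whole frame, and it returns a same-shaped DataFrame of CSS strings. This is a sketch under the assumption that the bounds sit in separate frames with the same index and columns as the data. The names highlight_outside, low and up are hypothetical. The function needs only pandas and NumPy, so its output can be checked without rendering anything.

```python
import numpy as np
import pandas as pd

def highlight_outside(data, low, up, color='red'):
    """Return CSS strings shaped like `data`: a background color where a
    value lies outside [low, up], and '' (no style) elsewhere.
    `low` and `up` must have the same index and columns as `data`."""
    outside = (data < low) | (data > up)
    return pd.DataFrame(np.where(outside, f'background-color: {color}', ''),
                        index=data.index, columns=data.columns)

# Hypothetical data and per-cell bounds
data = pd.DataFrame({'Month1': [2, -1], 'Month2': [1, 4]})
low = pd.DataFrame({'Month1': [0.0, 0.0], 'Month2': [0.5, 0.5]})
up = pd.DataFrame({'Month1': [3.0, 3.0], 'Month2': [1.5, 1.5]})

css = highlight_outside(data, low, up)
print(css)
```

In a notebook, data.style.apply(highlight_outside, axis=None, low=low, up=up) should then render the highlighted table. Styler.apply forwards extra keyword arguments to the function, and rendering needs jinja2 installed.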


Tags: python, dataframe, pandas, xlsxwriter, pandas-groupby