How to group by and average under multiple conditions with Python Pandas

The objective is to compute the average of subsets of columns in a MultiIndex DataFrame, subject to multiple conditions.

The first condition is to group the averages by the first level of the MultiIndex.

The second condition is to average over the channels listed in the values of dict_ref below:

dict_ref = dict(occ=['F2', 'F4'], gr=['Fp1', 'Fpx'])

For example, for the key occ, take the average of F2 and F4.

The code below does the job:

import re

import numpy as np
import pandas as pd

np.random.seed(0)
dict_ref = dict(occ=['F2', 'F4'], gr=['Fp1', 'Fpx'])
_names = ['pow_fr', 'pow_fr', 'pow_fr', 'pow_fr', 'pow_fr', 'pow_fr', 'pow_fr', 'pow_fr',
          'hjor_com', 'hjor_com', 'hjor_com', 'hjor_com']

_idx = ['Fp1_band0', 'Fp1_band1', 'Fpx_band0', 'Fpx_band1', 'F2_band0', 'F2_band1', 'F4_band0', 'F4_band1',
        'Fp1', 'Fpx', 'F2', 'F4']

X = np.random.rand(4, len(_names))
columns = pd.MultiIndex.from_arrays([_names, _idx])
df = pd.DataFrame(data=X, columns=columns)

# Split e.g. 'Fp1_band0' into ('Fp1', 'band0') and pad shorter tuples with ''
# so that every column tuple has the same length.
remove_nan = [(e[0], *re.split('_', e[1])) for e in df.columns]
remove_nan = [t + ('',) * (len(max(remove_nan, key=len)) - len(t)) for t in remove_nan]

df.columns = pd.MultiIndex.from_tuples(remove_nan)

# Long format: one row per (group_feature, ch, feature) combination.
df = df.T.reset_index().rename(columns={"level_0": "group_feature",
                                        "level_1": "ch", "level_2": "feature", "level_3": "region"})

all_df = []
for nref in dict_ref:
    # numeric_only=True skips the string column 'ch'; pandas >= 2.0 raises a TypeError without it.
    df_ch = df[df.ch.isin(dict_ref[nref])].groupby(["group_feature", "feature"]).mean(numeric_only=True).reset_index()
    df_ch['ch'] = nref
    all_df.append(df_ch)

# Append the averaged rows to the originals and go back to wide format.
df1 = pd.concat([df, *all_df]).pivot_table(index=['group_feature', 'ch', 'feature']).transpose()

# Flatten (group_feature, ch, feature) to (group_feature, 'ch' or 'ch_feature').
df1.columns = [(gf[0], f'{gf[1]}' if not gf[-1] else f'{gf[1]}_{gf[-1]}') for gf in df1.columns.values.tolist()]

However, I would like to know whether there is a way to avoid the for-loop:

all_df = []
for nref in dict_ref:
    # numeric_only=True skips the string column 'ch'; pandas >= 2.0 raises a TypeError without it.
    df_ch = df[df.ch.isin(dict_ref[nref])].groupby(["group_feature", "feature"]).mean(numeric_only=True).reset_index()
    df_ch['ch'] = nref
    all_df.append(df_ch)

Less important than the question above, but if there is a way to avoid the following lines entirely, that would be a bonus:

remove_nan = [(e[0], *re.split('_', e[1])) for e in df.columns]
remove_nan = [t + ('',) * (len(max(remove_nan, key=len)) - len(t)) for t in remove_nan]

df.columns = pd.MultiIndex.from_tuples(remove_nan)

df = df.T.reset_index().rename(columns={"level_0": "group_feature",
                                        "level_1": "ch", "level_2": "feature", "level_3": "region"})

Expected output:

   (hjor_com, F2)  (hjor_com, F4)  ...  (pow_fr, occ_band0)  (pow_fr, occ_band1)
0        0.791725        0.528895  ...             0.430621             0.768834
1        0.461479        0.780529  ...             0.399188             0.851316
2        0.018790        0.617635  ...             0.393202             0.594448
3        0.210383        0.128926  ...             0.528570             0.248629

[4 rows x 18 columns]

You can invert dict_ref so that each item in the value lists becomes a key, apply the replacement, and group by the new ch:

mapping = {
    v: key for key, value in dict_ref.items() for v in value
}
all_df = df.replace({"ch": mapping}).groupby(["group_feature", "feature", "ch"]).mean().reset_index()

df1 = pd.concat([df, all_df])...
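
For reference, here is a minimal end-to-end sketch of the loop-free approach, rebuilding the long-format df from the question so that it runs on its own. Two things in it are assumptions rather than part of the snippet above: the level names are passed to MultiIndex.from_tuples directly (instead of renaming level_0, level_1, ... afterwards), and the final concat / pivot_table step is simply copied from the question, since the line above is elided.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dict_ref = dict(occ=['F2', 'F4'], gr=['Fp1', 'Fpx'])
names = ['pow_fr'] * 8 + ['hjor_com'] * 4
idx = ['Fp1_band0', 'Fp1_band1', 'Fpx_band0', 'Fpx_band1',
       'F2_band0', 'F2_band1', 'F4_band0', 'F4_band1',
       'Fp1', 'Fpx', 'F2', 'F4']
wide = pd.DataFrame(np.random.rand(4, len(names)),
                    columns=pd.MultiIndex.from_arrays([names, idx]))

# Split 'Fp1_band0' into ('Fp1', 'band0'); pad channel-only names with ''.
tuples = [(g, *c.split('_')) for g, c in wide.columns]
tuples = [t + ('',) * (3 - len(t)) for t in tuples]
# Assumption: naming the levels here replaces the rename() call in the question.
wide.columns = pd.MultiIndex.from_tuples(
    tuples, names=['group_feature', 'ch', 'feature'])

# Long format: one row per (group_feature, ch, feature).
df = wide.T.reset_index()

# Invert dict_ref: {'F2': 'occ', 'F4': 'occ', 'Fp1': 'gr', 'Fpx': 'gr'}
mapping = {v: key for key, value in dict_ref.items() for v in value}

# Relabel each channel with its region, then average per region in one groupby.
all_df = (df.replace({"ch": mapping})
            .groupby(["group_feature", "feature", "ch"])
            .mean()
            .reset_index())

# Assumption: same final step as in the question (append, then pivot back to wide).
df1 = (pd.concat([df, all_df])
         .pivot_table(index=['group_feature', 'ch', 'feature'])
         .transpose())
```

Because ch is one of the groupby keys here, no string column is left over to be averaged, so numeric_only is not needed. Note that a channel missing from dict_ref would keep its own name after replace and show up as its own group.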