Cumulative sum based on another dataframe column without itertuples?

Question

I can't solve this problem without itertuples.

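I want to average the rows in groups, where each group's cumulative sum adds up to less than 1/3 of the total of a different column.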

Starting dataframe:

import pandas as pd

df = pd.DataFrame({'model_1': [0.15, 0.19, 0.25, 0.54, 0.55, 0.98, 1.12],
                   'model_2': [0.12, 0.13, 0.32, 0.45, 0.6, 0.7, 1.05],
                   'exposure': [0.4, 1, 1.6, 1, 2, 2, 3],
                   'target': [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.1]})

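Here the sum of exposure is 11. My intention is to split the rows into 3 buckets and take the average over all rows whose cumulative exposure is less than or equal to 1/3 of the total exposure.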

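So we can see that the cumulative exposure of the first 4 rows is 4, and I then want to take an exposure-weighted average of these columns.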

This means the first value in aggr_model_1 is:

((0.15 * 0.4) + (0.19 * 1) + (0.25 * 1.6) + (0.54 * 1)) / 4 = 0.2975

The same process is then applied to aggr_model_2 and aggr_target.
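As a quick check of the arithmetic above, here is a minimal sketch that computes sum(value * exposure) / sum(exposure) for all three columns at once. It assumes the first bucket is exactly the first four rows, as in the example; the names `first`, `cols` and `first_bucket` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'model_1': [0.15, 0.19, 0.25, 0.54, 0.55, 0.98, 1.12],
                   'model_2': [0.12, 0.13, 0.32, 0.45, 0.6, 0.7, 1.05],
                   'exposure': [0.4, 1, 1.6, 1, 2, 2, 3],
                   'target': [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.1]})

# Assumption: the first bucket is the first 4 rows (their exposures add up to 4).
first = df.iloc[:4]
cols = ['model_1', 'model_2', 'target']

# Exposure-weighted mean per column: sum(value * exposure) / sum(exposure).
first_bucket = first[cols].mul(first['exposure'], axis=0).sum() / first['exposure'].sum()
print(first_bucket)
```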

Output dataframe:

output_df = pd.DataFrame({'aggr_model_1': [0.2975, 0.765, 1.12],
                          'aggr_model_2': [0.285, 0.65, 1.05],
                          'aggr_exposure': [4, 4, 3],
                          'aggr_target': [0.28, 0.65, 1.1]})
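The expected output does not follow a strict "cumulative exposure <= total / 3" cut: the first bucket's exposure is 4, which is more than 11 / 3 ≈ 3.67. One rule that does reproduce output_df, and which is an assumption here rather than something stated in the question, is to place each row in the bucket where the exposure accumulated *before* that row falls. A minimal vectorized sketch of that rule, without itertuples (`n_buckets`, `start`, `bucket` and `result` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'model_1': [0.15, 0.19, 0.25, 0.54, 0.55, 0.98, 1.12],
                   'model_2': [0.12, 0.13, 0.32, 0.45, 0.6, 0.7, 1.05],
                   'exposure': [0.4, 1, 1.6, 1, 2, 2, 3],
                   'target': [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.1]})

n_buckets = 3
total = df['exposure'].sum()

# Assumed rule: bucket = which third of the total the row *starts* in,
# i.e. based on the exposure accumulated before the row (0 for the first row).
start = df['exposure'].cumsum().shift(fill_value=0)
bucket = (np.floor(start / (total / n_buckets))
          .astype(int)
          .clip(upper=n_buckets - 1)
          .rename('bucket'))

cols = ['model_1', 'model_2', 'target']
# Numerator and denominator of the exposure-weighted mean, per bucket.
weighted_sums = df[cols].mul(df['exposure'], axis=0).groupby(bucket).sum()
bucket_exposure = df['exposure'].groupby(bucket).sum()

# Lay the result out like output_df.
result = weighted_sums.div(bucket_exposure, axis=0).add_prefix('aggr_')
result.insert(2, 'aggr_exposure', bucket_exposure)
print(result)
```

Because each row is assigned by where it starts, a bucket can overshoot its third of the total by the exposure of its last row; binning the cumulative sum itself instead gives a strict upper bound but different buckets.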

Answer 1

Let me give it a try and see whether I understood correctly. The ingredients of this calculation are:

Total exposure, which we can compute as total = df.exposure.sum()
Bins that split the total exposure into 3 equal parts: bins = np.linspace(0, total, 4)
Cumulative exposure: cum_exposure = df.exposure.cumsum()
Binned cumulative exposure: bin_cum_exposure = pd.cut(cum_exposure, bins)
Exposure-weighted observations: w_model_1 = df.exposure * df.model_1
The mean: df.groupby('bin_cum_exposure').w_model_1.mean()
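Evaluating the first four ingredients on the sample frame makes the binning visible. A minimal sketch that only prints the intermediate values (the approximate bin edges in the comment apply to this particular data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'model_1': [0.15, 0.19, 0.25, 0.54, 0.55, 0.98, 1.12],
                   'model_2': [0.12, 0.13, 0.32, 0.45, 0.6, 0.7, 1.05],
                   'exposure': [0.4, 1, 1.6, 1, 2, 2, 3],
                   'target': [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.1]})

total = df.exposure.sum()                       # total exposure
bins = np.linspace(0, total, 4)                 # edges ≈ [0, 3.67, 7.33, 11]
cum_exposure = df.exposure.cumsum()             # 0.4, 1.4, 3.0, 4.0, 6.0, 8.0, 11.0
bin_cum_exposure = pd.cut(cum_exposure, bins)   # right-closed intervals by default

# Integer bin index per row: 0 = first third, 1 = second, 2 = last.
print(bin_cum_exposure.cat.codes.tolist())
```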

Putting it all together:

import numpy as np
import pandas as pd

total = df.exposure.sum()
bins = np.linspace(0, total, 4)

(df.assign(bin_cum_exposure = lambda x: pd.cut(x.exposure.cumsum(), bins),
           w_model_1 = lambda x: x.exposure * x.model_1,
           w_model_2 = lambda x: x.exposure * x.model_2,
           w_target = lambda x: x.exposure * x.target)
   # observed=True: group only over the bins that actually occur
   .groupby('bin_cum_exposure', observed=True)
   .mean()
)

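The result differs from your manual calculation because the first bin has 3 elements rather than the 4 in your example: with right-closed bins of width 11/3 ≈ 3.67, only the cumulative exposures 0.4, 1.4 and 3.0 fall in the first bin, while the fourth row (cumulative exposure 4.0) falls in the second.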
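One further difference: `.mean()` on the weighted columns divides each bin's sum of exposure * value by the number of rows in the bin, whereas the formula in the question divides by the bin's total exposure. A minimal sketch that keeps the answer's `pd.cut` bins but computes the exposure-weighted mean instead (the names `cols`, `weighted_sums`, `bin_exposure` and `weighted_means` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'model_1': [0.15, 0.19, 0.25, 0.54, 0.55, 0.98, 1.12],
                   'model_2': [0.12, 0.13, 0.32, 0.45, 0.6, 0.7, 1.05],
                   'exposure': [0.4, 1, 1.6, 1, 2, 2, 3],
                   'target': [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.1]})

total = df.exposure.sum()
bins = np.linspace(0, total, 4)
bin_cum_exposure = pd.cut(df.exposure.cumsum(), bins)

cols = ['model_1', 'model_2', 'target']
# Numerator: sum of exposure-weighted values per bin.
weighted_sums = (df[cols].mul(df.exposure, axis=0)
                 .groupby(bin_cum_exposure, observed=True).sum())
# Denominator: total exposure per bin (not the row count that .mean() uses).
bin_exposure = df.exposure.groupby(bin_cum_exposure, observed=True).sum()

weighted_means = weighted_sums.div(bin_exposure, axis=0)
print(weighted_means)
```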
