Expanding mean based on boolean column with False as most recent value in Pandas

Suppose I have the following DataFrame:

import numpy as np
import pandas as pd

b = {'user': [1, 1, 1, 1, 2, 2, 2],
 'value': [10, 20, 30, 40, 1, 2, 3],
 'loan': [True, True, True, False, True, False, True]}
temp_df: pd.DataFrame = pd.DataFrame(b)
temp_df['date'] = np.array([23, 24, 25, 26, 27, 28, 29])
print(temp_df)
   user  value   loan  date
0     1     10   True    23
1     1     20   True    24
2     1     30   True    25
3     1     40  False    26
4     2      1   True    27
5     2      2  False    28
6     2      3   True    29

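In a new column, I want to compute a "rolling" (expanding) mean per user that only takes values into account where loan == True. It should be the mean of the rows before the current row, not including the current row. So the desired output looks like this: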

   user  value   loan  date  cummean_value
0     1     10   True    23        0
1     1     20   True    24        10
2     1     30   True    25        15
3     1     40  False    26        20
4     2      1   True    27        0
5     2      2  False    28        1
6     2      3   True    29        1

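Where loan == False, I want the value to be the most recent mean computed so far (over the loan == True values). The first value for each user is essentially NaN and should be replaced with 0 (as in the desired output).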
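
To make the requirement concrete, here is a minimal plain-Python sketch of the intended result. It assumes the rows are already in chronological order within each user; `expected_cummean` and `demo` are hypothetical names used only for illustration, not pandas API. It is meant as a reference to check vectorized solutions against, not as an efficient solution.

```python
import pandas as pd

def expected_cummean(df: pd.DataFrame) -> list:
    """Reference loop: mean of earlier loan == True values per user, 0 if none."""
    totals = {}  # user -> running sum of loan == True values seen so far
    counts = {}  # user -> number of loan == True values seen so far
    out = []
    for user, value, loan in zip(df['user'], df['value'], df['loan']):
        n = counts.get(user, 0)
        # Emit the mean of the previous qualifying rows before updating state,
        # so the current row is never included in its own mean.
        out.append(totals.get(user, 0) / n if n else 0.0)
        if loan:
            totals[user] = totals.get(user, 0) + value
            counts[user] = n + 1
    return out

demo = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2],
                     'value': [10, 20, 30, 40, 1, 2, 3],
                     'loan': [True, True, True, False, True, False, True]})
result = expected_cummean(demo)
print(result)  # [0.0, 10.0, 15.0, 20.0, 0.0, 1.0, 1.0]
```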

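Let's try groupby + cumsum:

# Mask values where loan is False, then take the per-user mean of earlier rows.
# group_keys=False keeps the original index so the result aligns on assignment.
temp_df['new'] = temp_df['value'].where(temp_df['loan']).groupby(temp_df['user'], group_keys=False)\
      .apply(lambda x : (x.shift().cumsum()/x.shift().notna().cumsum()).ffill().fillna(0))

The right-hand side evaluates to: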
0     0.0
1    10.0
2    15.0
3    20.0
4     0.0
5     1.0
6     1.0
Name: value, dtype: float64
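
The one-liner above packs several steps into the lambda. As a rough sketch, here are the intermediate values for user 2 only, where the ffill step actually matters. The variable names (`prev`, `running_sum`, `running_count`, `mean_so_far`) are made up for this illustration.

```python
import numpy as np
import pandas as pd

# User 2's values with the loan == False row already masked to NaN.
x = pd.Series([1, np.nan, 3])

prev = x.shift()                       # NaN, 1, NaN: the current row is excluded
running_sum = prev.cumsum()            # NaN, 1, NaN: cumsum leaves NaN positions as NaN
running_count = prev.notna().cumsum()  # 0, 1, 1: number of loan == True rows seen
ratio = running_sum / running_count    # NaN, 1, NaN

# ffill carries the last known mean over the gap; fillna(0) handles the first row.
mean_so_far = ratio.ffill().fillna(0)
print(mean_so_far.tolist())  # [0.0, 1.0, 1.0]
```

Without the ffill, the last row would fall back to 0 instead of keeping the mean of 1 from the earlier loan == True row.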

Try:

# supplementary columns:
temp_df['value2'] = np.where(temp_df['loan'], temp_df['value'], 0)
temp_df['x'] = np.where(temp_df['loan'], 1, 0)

# cumulative mean per user up to and including the current row
# (rows with loan == False add 0 to both the sum and the count)
temp_df['cummean_value'] = temp_df.groupby('user')['value2'].cumsum() \
    .div(temp_df.groupby('user')['x'].cumsum())

# exclude the current row: shift down by one row within each user,
# then replace the NaN this leaves on each user's first row with 0
temp_df['cummean_value'] = temp_df.groupby('user')['cummean_value'].shift().fillna(0)

# clean-up
temp_df.drop(['x', 'value2'], axis=1, inplace=True)

Output:

   user  value   loan  date  cummean_value
0     1     10   True    23            0.0
1     1     20   True    24           10.0
2     1     30   True    25           15.0
3     1     40  False    26           20.0
4     2      1   True    27            0.0
5     2      2  False    28            1.0
6     2      3   True    29            1.0
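
Since the title asks for an expanding mean, a variant built on `expanding().mean()` may also be worth trying. This is a sketch rather than a tested drop-in: it assumes rows are sorted chronologically within each user and relies on expanding aggregations skipping NaN; `masked` is a name introduced here for illustration.

```python
import pandas as pd

temp_df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2],
                        'value': [10, 20, 30, 40, 1, 2, 3],
                        'loan': [True, True, True, False, True, False, True]})

# Mask rows where loan is False so they do not contribute to the mean.
masked = temp_df['value'].where(temp_df['loan'])

# Per user: expanding mean (NaN rows are skipped), shifted down one row so the
# current row is excluded; the NaN on each user's first row becomes 0.
temp_df['cummean_value'] = masked.groupby(temp_df['user']).transform(
    lambda s: s.expanding().mean().shift().fillna(0)
)
print(temp_df['cummean_value'].tolist())  # [0.0, 10.0, 15.0, 20.0, 0.0, 1.0, 1.0]
```

Because the expanding mean already carries the last value across masked rows, no separate ffill or helper columns are needed here.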