Calculating cumsum in python across two dimensions

Question

I have a dataset that looks like this:

Subject Session Trial   Choice
1          1       1    A
1          1       2    B
1          1       3    B
1          2       1    B
1          2       2    B
2          1       1    A
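To follow along, here is a minimal sketch that rebuilds the sample table as a pandas DataFrame. It assumes the column names and values shown above. Note that Subject is stored as integers here, which matters for the error described below.

```python
import pandas as pd

# Sample data from the table above, one row per trial.
df = pd.DataFrame({
    'Subject': [1, 1, 1, 1, 1, 2],
    'Session': [1, 1, 1, 2, 2, 1],
    'Trial':   [1, 2, 3, 1, 2, 1],
    'Choice':  ['A', 'B', 'B', 'B', 'B', 'A'],
})
print(df.dtypes)  # Subject, Session and Trial are integer columns
```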

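I want to generate two more columns. One should hold a score based on the value of "Choice", and the other should track the cumulative sum of those scores for each subject within each session. I'd like the output to look like this: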

Subject Session Trial   Choice Score    Cum Score
1          1       1    A        1       1
1          1       2    B       -1       0
1          1       3    B       -1      -1
1          2       1    B       -1      -1
1          2       2    B       -1      -2
2          1       1    A        1       1

Based on answers to similar questions, I tried the following:

def change_score(c):
 if c['Chosen'] == A:
   return 1.0
 elif c['Chosen'] == B:
   return -1.0
 else:
   return ''
df1['change_score'] = df1.apply(change_score, axis=1)


df1['Session']=df1['Subject'].apply(lambda x: x[:7])
df1['cumulative_score']=df1.groupby(['Session'])['change_score'].cumsum()

This results in the following error: TypeError: 'int' object is not subscriptable

I'm (obviously) new to python, and any help would be much appreciated.
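For comparison, here is a sketch of the row-wise approach above with its bugs fixed. It assumes the column is called Choice as in the sample table (the code refers to Chosen). The letters need quotes ('A', 'B'), a non-match should return NaN rather than '', and the cumulative sum should be grouped on both Subject and Session instead of on a slice of Subject. Subject holds integers, and slicing an int with x[:7] is what raises TypeError: 'int' object is not subscriptable. Row-wise apply works, but it is generally slower than vectorized alternatives such as np.where.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Subject': [1, 1, 1, 1, 1, 2],
    'Session': [1, 1, 1, 2, 2, 1],
    'Trial':   [1, 2, 3, 1, 2, 1],
    'Choice':  ['A', 'B', 'B', 'B', 'B', 'A'],
})

def change_score(row):
    # Compare against string literals; bare A and B would be undefined names.
    if row['Choice'] == 'A':
        return 1.0
    elif row['Choice'] == 'B':
        return -1.0
    # NaN instead of '' keeps the column numeric.
    return float('nan')

df1['change_score'] = df1.apply(change_score, axis=1)
# Restart the running total for every (Subject, Session) pair.
df1['cumulative_score'] = df1.groupby(['Subject', 'Session'])['change_score'].cumsum()
print(df1)
```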

Answer 1

Do this in two steps. First, create the Score column with np.where:

import numpy as np

df['Score'] = np.where(df.Choice == 'A', 1, -1)
df

   Subject  Session  Trial Choice  Score
0        1        1      1      A      1
1        1        1      2      B     -1
2        1        1      3      B     -1
3        1        2      1      B     -1
4        1        2      2      B     -1
5        2        1      1      A      1

Alternatively, to handle more options, use a nested where (any value other than A or B becomes NaN):

df['Score'] = np.where(df.Choice == 'A', 1,
                       np.where(df.Choice == 'B', -1, np.nan))

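Note that if you care about performance, you should not mix strings and numbers in a single column (don't return ''). A mixed column falls back to the object dtype instead of a fast numeric one.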
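Here is a small illustration of that point. It uses a hypothetical Series of choices that includes an unexpected value 'C'. Returning '' for the unmatched value leaves the result with object dtype, while the nested np.where with NaN keeps it numeric (float64).

```python
import numpy as np
import pandas as pd

# Hypothetical choices, including one value that is neither A nor B.
choice = pd.Series(['A', 'B', 'C'])

# Mixing floats and '' forces the object dtype.
mixed = choice.map(lambda c: 1.0 if c == 'A' else (-1.0 if c == 'B' else ''))

# Using NaN for the unmatched value keeps a numeric float64 column.
clean = pd.Series(np.where(choice == 'A', 1, np.where(choice == 'B', -1, np.nan)))

print(mixed.dtype)  # object
print(clean.dtype)  # float64
```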

Or use np.select (rows matching neither condition get the default value, 0):

df['Score'] = np.select([df.Choice == 'A', df.Choice == 'B'], [1, -1])
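Another option, not part of the original answer, is Series.map with a dictionary. Values missing from the dictionary become NaN, so the column stays numeric. Here is a sketch on a made-up Choice column with an extra 'C' entry:

```python
import pandas as pd

df = pd.DataFrame({'Choice': ['A', 'B', 'B', 'C']})

# Map each choice to its score; 'C' is not in the dict, so it becomes NaN.
df['Score'] = df['Choice'].map({'A': 1, 'B': -1})
print(df)
```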

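Now, generate the CumScore column with groupby. Group on both Subject and Session so the running total restarts for each subject's session:

df['CumScore'] = df.groupby(['Subject', 'Session']).Score.cumsum()
df

   Subject  Session  Trial Choice  Score  CumScore
0        1        1      1      A      1         1
1        1        1      2      B     -1         0
2        1        1      3      B     -1        -1
3        1        2      1      B     -1        -1
4        1        2      2      B     -1        -2
5        2        1      1      A      1         1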
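The choice of grouping keys matters. Grouping on Session alone pools session 1 of Subject 1 and session 1 of Subject 2 into one running total, so Subject 2's first trial continues from Subject 1's score. Grouping on both Subject and Session restarts the total for each pair, which matches the Cum Score column in the question. Here is a sketch comparing the two on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Subject': [1, 1, 1, 1, 1, 2],
    'Session': [1, 1, 1, 2, 2, 1],
    'Trial':   [1, 2, 3, 1, 2, 1],
    'Choice':  ['A', 'B', 'B', 'B', 'B', 'A'],
})
df['Score'] = np.where(df['Choice'] == 'A', 1, -1)

# Session only: Subject 2's session 1 continues Subject 1's session 1 total.
session_only = df.groupby('Session')['Score'].cumsum()
# Subject and Session: the total restarts for every (Subject, Session) pair.
per_subject_session = df.groupby(['Subject', 'Session'])['Score'].cumsum()

print(session_only.tolist())         # [1, 0, -1, -1, -2, 0]
print(per_subject_session.tolist())  # [1, 0, -1, -1, -2, 1]
```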
