pd conditional cumsum and percentage

I have a DataFrame of the following shape:

import pandas as pd

df = pd.DataFrame()
df["trial"] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
df["correct"] = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
df["responding_subject"] = ["one", "two", "two", "two", "one", "two", "one", "one", "one", "two"]

I want to add 2 new columns that give each subject's accuracy on their own trials so far (i.e. the proportion correct).

For example, df["acc_one"] at index 4 would be computed over all previous (trials 0-3) own (but not the partner's!) trials

IIUC, you can use groupby:

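# compute the expanding (running) accuracy within each subject
g = df.groupby('responding_subject')['correct']
# cumcount() numbers each subject's rows from 0, so +1 is the number of trials so far
df['accuracy'] = g.cumsum()/(g.cumcount()+1)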

Output:

   trial  correct responding_subject  accuracy
0      0        1                one  1.000000
1      1        0                two  0.000000
2      2        1                two  0.500000
3      3        1                two  0.666667
4      4        0                one  0.500000
5      5        0                two  0.500000
6      6        0                one  0.333333
7      7        1                one  0.500000
8      8        0                one  0.400000
9      9        1                two  0.600000
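The ratio g.cumsum() / (g.cumcount() + 1) is simply the running mean of correct within each subject. As a cross-check, the sketch below computes the same values with an expanding mean applied per group; this is an alternative formulation for comparison, not the answer's method, and the names ratio and running_mean are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "trial": range(10),
    "correct": [1, 0, 1, 1, 0, 0, 0, 1, 0, 1],
    "responding_subject": ["one", "two", "two", "two", "one", "two", "one", "one", "one", "two"],
})
g = df.groupby("responding_subject")["correct"]

# the answer's formula: correct answers so far / trials so far, per subject
ratio = g.cumsum() / (g.cumcount() + 1)

# same quantity as an expanding (running) mean inside each subject's group;
# transform keeps the original row index, so the result lines up with df
running_mean = g.transform(lambda s: s.expanding().mean())

print(pd.DataFrame({"ratio": ratio, "running_mean": running_mean}))
```

The cumsum/cumcount version stays fully vectorized, whereas transform with a lambda calls Python once per subject, so the former is the more natural choice when there are many subjects.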

There is no need to split the accuracy into separate columns, but if you really need that, add a pivot step:

df.join(df.pivot(columns='responding_subject', values='accuracy').add_prefix('acc_'))

Output:

   trial  correct responding_subject  accuracy   acc_one   acc_two
0      0        1                one  1.000000  1.000000       NaN
1      1        0                two  0.000000       NaN  0.000000
2      2        1                two  0.500000       NaN  0.500000
3      3        1                two  0.666667       NaN  0.666667
4      4        0                one  0.500000  0.500000       NaN
5      5        0                two  0.500000       NaN  0.500000
6      6        0                one  0.333333  0.333333       NaN
7      7        1                one  0.500000  0.500000       NaN
8      8        0                one  0.400000  0.400000       NaN
9      9        1                two  0.600000       NaN  0.600000
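Note that both outputs count the current trial in its own accuracy: at index 4, subject one's 0.5 is (1 + 0) / 2 over trials 0 and 4. If the question's example is meant literally (only earlier trials, 0-3, should count), a minimal sketch under that assumption subtracts the current trial before dividing. The column name prev_accuracy is hypothetical, and each subject's first trial comes out as NaN because there is no history yet.

```python
import pandas as pd

df = pd.DataFrame({
    "trial": range(10),
    "correct": [1, 0, 1, 1, 0, 0, 0, 1, 0, 1],
    "responding_subject": ["one", "two", "two", "two", "one", "two", "one", "one", "one", "two"],
})
g = df.groupby("responding_subject")["correct"]

# correct answers on this subject's earlier trials (current trial excluded)
prev_correct = g.cumsum() - df["correct"]
# number of this subject's earlier trials; 0 on the subject's first trial
prev_count = g.cumcount()
# hypothetical column: 0 / 0 on a first trial yields NaN
df["prev_accuracy"] = prev_correct / prev_count

print(df)
```

With this reading, index 4 gets 1.0 (subject one's only earlier trial, trial 0, was correct). The same pivot and join step shown above can then be applied to prev_accuracy instead of accuracy.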