pandas conditional cumsum and percentage
I have a dataframe of the following shape:
import pandas as pd
df = pd.DataFrame()
df["trial"] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
df["correct"] = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
df["responding_subject"] = ["one", "two", "two", "two", "one", "two", "one", "one", "one", "two"]
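I would like to add 2 new columns that represent each subject's accuracy (i.e., proportion correct) on their own trials so far.
For example, df["acc_one"] at index 4 would take into account all previous (trials 0-3) own (but not the partner's!) trials.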
IIUC, you can use groupby to compute the expanding accuracy per subject (each value includes the current trial):
g = df.groupby('responding_subject')['correct']
df['accuracy'] = g.cumsum()/(g.cumcount()+1)
Output:
   trial  correct responding_subject  accuracy
0      0        1                one  1.000000
1      1        0                two  0.000000
2      2        1                two  0.500000
3      3        1                two  0.666667
4      4        0                one  0.500000
5      5        0                two  0.500000
6      6        0                one  0.333333
7      7        1                one  0.500000
8      8        0                one  0.400000
9      9        1                two  0.600000
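The question asks for accuracy over previous own trials only, whereas the output above counts the current trial as well. If the current trial should be excluded, one possible variant is sketched below. The column name prior_accuracy is an assumption made for illustration, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "trial": range(10),
    "correct": [1, 0, 1, 1, 0, 0, 0, 1, 0, 1],
    "responding_subject": ["one", "two", "two", "two", "one",
                           "two", "one", "one", "one", "two"],
})

g = df.groupby("responding_subject")["correct"]

# Correct answers and trial counts strictly before the current row, per subject.
prior_correct = g.cumsum() - df["correct"]
prior_trials = g.cumcount()  # 0 on each subject's first trial

# 0 / 0 gives NaN, so a subject's first trial has no accuracy yet.
# "prior_accuracy" is a hypothetical column name.
df["prior_accuracy"] = prior_correct / prior_trials
```

With this variant, index 4 (subject "one") only reflects that subject's earlier trial at index 0, and each subject's first trial is NaN because there is no history to average.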
There is no need to split the accuracy into separate columns, but if you really need to, add a pivot step:
df.join(df.pivot(columns='responding_subject', values='accuracy').add_prefix('acc_'))
Output:
   trial  correct responding_subject  accuracy   acc_one   acc_two
0      0        1                one  1.000000  1.000000       NaN
1      1        0                two  0.000000       NaN  0.000000
2      2        1                two  0.500000       NaN  0.500000
3      3        1                two  0.666667       NaN  0.666667
4      4        0                one  0.500000  0.500000       NaN
5      5        0                two  0.500000       NaN  0.500000
6      6        0                one  0.333333  0.333333       NaN
7      7        1                one  0.500000  0.500000       NaN
8      8        0                one  0.400000  0.400000       NaN
9      9        1                two  0.600000       NaN  0.600000
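The pivoted columns are NaN on the partner's rows. If every row should instead carry each subject's most recent accuracy, a forward fill on the pivoted frame is one option. This is only a sketch; the variable names wide and out are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "trial": range(10),
    "correct": [1, 0, 1, 1, 0, 0, 0, 1, 0, 1],
    "responding_subject": ["one", "two", "two", "two", "one",
                           "two", "one", "one", "one", "two"],
})

g = df.groupby("responding_subject")["correct"]
df["accuracy"] = g.cumsum() / (g.cumcount() + 1)

# One column per subject, indexed like df; NaN where the other subject responded.
wide = df.pivot(columns="responding_subject", values="accuracy").add_prefix("acc_")

# Carry each subject's latest accuracy forward onto the partner's rows.
# Rows before a subject's first trial stay NaN.
out = df.join(wide.ffill())
```

Here acc_two is still NaN at index 0, since subject "two" has not responded yet at that point.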