Groupby 2 categorical variables
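I have a dataframe that looks like this:

ID | memory confidence | Test (1 = correct, 0 = incorrect) | Experiment |
---|---|---|---|
1 | 56 | 1 | Experiment 1 |
1 | 78 | 0 | Experiment 1 |
1 | 98 | 0 | Experiment 1 |
1 | 24 | 1 | Experiment 2 |
2 | 45 | 0 | Experiment 2 |
2 | 87 | 1 | Experiment 2 |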
I want to see whether a person's average confidence correlates with their performance on the test. So I wrote the code below, which gives each person's mean memory confidence and mean score:
df3 = df.groupby(['PID'])[['accuracy', 'memory_confidence']].mean()
i = sns.lmplot(x='memory_confidence', y='accuracy', data=df3)
What I want to do now is compute separate correlations/lmplots for Experiment 1 and Experiment 2.
Adding 'source' does not work, because I get KeyError: "['source'] not in index"
df3 = df.groupby(['PID', 'source'])[['accuracy', 'memory_confidence']].mean()
i = sns.lmplot(x='memory_confidence', y='accuracy', hue='source', data=df3)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Rebuild the example data from the question
df = pd.DataFrame([
    [1, 56, 1, 'Experiment 1'],
    [1, 78, 0, 'Experiment 1'],
    [1, 98, 0, 'Experiment 1'],
    [1, 24, 1, 'Experiment 2'],
    [2, 45, 0, 'Experiment 2'],
    [2, 87, 1, 'Experiment 2']
], columns=['ID', 'memory_confidence', 'accuracy', 'Experiment'])

# One regression line per experiment
sns.lmplot(x='memory_confidence', y='accuracy', hue='Experiment', data=df)
plt.show()

# Correlation within each experiment; Series.corr skips the non-numeric 'Experiment' column
exp1 = df[df['Experiment'] == 'Experiment 1']
exp1_corr = exp1['memory_confidence'].corr(exp1['accuracy'])
exp2 = df[df['Experiment'] == 'Experiment 2']
exp2_corr = exp2['memory_confidence'].corr(exp2['accuracy'])
print(exp1_corr, exp2_corr)
This produces the following:
-0.8794395358869003 0.18898223650461368
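A likely cause of the KeyError in the question is that `groupby` moves the grouping keys into the index, so a plotting call that looks for them among the columns cannot find them. The sketch below keeps the keys as ordinary columns with `as_index=False` and then computes one correlation per experiment in a single pass. It assumes the toy column names used above (`ID`, `Experiment`); the asker's real data apparently uses `PID` and `source`, so those names would need to be swapped in. The variable names `means` and `per_experiment_corr` are made up for illustration.

```python
import pandas as pd

# Toy data copied from the example above (column names are the toy ones,
# not the asker's real 'PID' / 'source' columns).
df = pd.DataFrame(
    [
        [1, 56, 1, "Experiment 1"],
        [1, 78, 0, "Experiment 1"],
        [1, 98, 0, "Experiment 1"],
        [1, 24, 1, "Experiment 2"],
        [2, 45, 0, "Experiment 2"],
        [2, 87, 1, "Experiment 2"],
    ],
    columns=["ID", "memory_confidence", "accuracy", "Experiment"],
)

# Per-person, per-experiment means. as_index=False keeps 'ID' and
# 'Experiment' as regular columns instead of moving them into the index.
means = df.groupby(["ID", "Experiment"], as_index=False)[
    ["accuracy", "memory_confidence"]
].mean()

# One Pearson correlation per experiment, computed on the raw rows.
per_experiment_corr = {
    name: group["memory_confidence"].corr(group["accuracy"])
    for name, group in df.groupby("Experiment")
}

print(means)
print(per_experiment_corr)
```

Because `Experiment` is still a column of `means`, it can be passed as `hue='Experiment'` with `data=means` to `sns.lmplot`. Calling `.reset_index()` on the grouped result is an equivalent alternative to `as_index=False`.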