Creating a column variable taking the mean of a variable conditional on two other variables

Question

I have a data frame showing the mean 'dwdime' for each given set of conditions:

DIMExCand_means = DIMExCand.groupby(['cycle', 'coded_state', 'party.orig', 'comtype']).mean()

I created a pivot table from DIMExCand_means using the following command:

DIMExCand_master = pd.pivot_table(DIMExCand_means,index=["Cycle","State"])

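However, some data is lost in the process. I would like to add columns to the 'DIMExCand_master' data frame containing the mean 'dwdime' score for every possible combination of 'party.orig' and 'comptype', since this would let me have one entry per 'cycle'-'coded_state'.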

Answer 1

Let's try:

DIMExCand_means = DIMExCand_means.reset_index()
DIMExCand_master = DIMExCand_master.reset_index()

pd.merge(DIMExCand_means, DIMExCand_master, left_on=['cycle','coded_state'], right_on=['Cycle','State'])
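To show how this merge lines up differently named key columns, here is a minimal sketch on toy data. The frames `means` and `master`, their values, and the column `overall` are all invented for illustration and are not taken from the asker's data.

```python
import pandas as pd

# Toy stand-in for DIMExCand_means after reset_index(); values are invented.
means = pd.DataFrame({
    "cycle": [2010, 2010, 2012],
    "coded_state": ["CA", "CA", "TX"],
    "party.orig": ["D", "R", "D"],
    "dwdime": [-0.5, 0.7, -0.3],
})

# Toy stand-in for DIMExCand_master after reset_index();
# 'overall' is a hypothetical column, not one from the question.
master = pd.DataFrame({
    "Cycle": [2010, 2012],
    "State": ["CA", "TX"],
    "overall": [0.1, -0.3],
})

# Default inner join; the key columns have different names on each side,
# so both pairs of key columns are kept in the result.
merged = pd.merge(
    means,
    master,
    left_on=["cycle", "coded_state"],
    right_on=["Cycle", "State"],
)
print(merged)
```

Note that the result has one row per matching row of the left frame, so a 'cycle'-'coded_state' pair with several party/committee combinations still appears on several rows.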

Answer 2

Thanks!

I ended up going with:

DIMExCand_dime = pd.pivot_table(DIMExCand, values='dwdime', index=["Cycle", "State"], columns='ID', aggfunc=np.mean)
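As a hedged variation on the same idea, the sketch below passes the two conditioning variables directly as `columns`, which gives one row per Cycle/State and one column per party/committee-type combination. The toy frame `df` and its values are invented, and the column name `comtype` follows the groupby call in the question; this is an illustration under those assumptions, not the asker's actual code.

```python
import pandas as pd

# Toy data; all values are invented for illustration.
df = pd.DataFrame({
    "Cycle": [2010, 2010, 2010, 2012, 2012],
    "State": ["CA", "CA", "CA", "TX", "TX"],
    "party.orig": ["D", "D", "R", "D", "R"],
    "comtype": ["H", "H", "S", "H", "H"],
    "dwdime": [-0.4, -0.6, 0.8, -0.2, 0.5],
})

# One row per (Cycle, State); one column per (party.orig, comtype) pair
# that occurs in the data. Missing combinations show up as NaN.
wide = pd.pivot_table(
    df,
    values="dwdime",
    index=["Cycle", "State"],
    columns=["party.orig", "comtype"],
    aggfunc="mean",
)

# Flatten the two-level column index into names like "D_H".
wide.columns = ["_".join(col) for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

The string "mean" is used for `aggfunc` here because recent pandas versions warn when a NumPy function such as `np.mean` is passed; either form computes the same averages.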

python

pivot

numpy

pandas