Python: Add aggregated columns to a DataFrame based on a key and additional conditions
I have two dataframes, shown below.
The dogs dataframe is:
DogID PuppyName1 PuppyName2 PuppyName3 PuppyName4 DogWeight
Dog1 Nick NaN NaN NaN 12.7
Dog2 Jack Fox Rex NaN 15.5
Dog3 Snack NaN NaN NaN 10.2
Dog4 Yosee Petty NaN NaN 16.9
The puppyWeights dataframe is:
PuppyName Jan17 Jun18 Dec18 April19
Nick 0.8 1.7 3.7 4.6
Jack 0.6 1.3 2.8 3.5
Fox 0.9 1.7 3.4 4.3
Rex 1.0 2.3 3.0 4.2
Snack 0.8 1.7 2.8 4.4
Yosee 0.6 1.2 3.1 4.3
Petty 0.5 1.3 2.8 3.5
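I need to add the puppies' monthly weight information to the dogs dataframe, based on the puppyWeights dataframe. If a dog has more than one child (for example Dog2 and Dog4), I need to take the mean of its puppies' weights for each month. For example, Dog2 should be the mean of Jack and Fox from the puppyWeights table: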
DogID Jan17 Jun18 Dec18 April19 DogWeight
Dog2 0.75 1.5 3.1 3.9 15.5
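I tried using the melt function to turn the ['PuppyName1', 'PuppyName2', 'PuppyName3', 'PuppyName4'] columns into rows. However, I don't know how to add the monthly information to the dogs dataframe with aggregation when a dog has more than one child.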
df2 = dogs.melt(id_vars=['DogID','DogWeight'], var_name="Puppies", value_name='PuppyName')
The expected output is:
DogID Jan17 Jun18 Dec18 April19 DogWeight
Dog1 0.8 1.7 3.7 4.6 12.7
Dog2 0.75 1.5 3.1 3.9 15.5
Dog3 0.8 1.7 2.8 4.4 10.2
Dog4 0.55 1.25 2.95 3.9 16.9
How can I add the monthly weight information to the dogs dataframe?
I would appreciate any ideas. Thanks!
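Here is one way: melt dogs, then merge with puppyWeights and groupby. Note that Dog2 has three puppies (Jack, Fox and Rex), so its means are taken over all three and differ from the Jack-and-Fox figures in the question.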
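# Reshape the PuppyName1..PuppyName4 columns into rows and drop the empty slots
df2 = dogs.melt(id_vars=['DogID','DogWeight'], var_name="Puppies", value_name='PuppyName').dropna()
# Attach each puppy's monthly weights, then average per dog (numeric columns only)
df2.merge(puppyWeights,on='PuppyName',how='left').groupby('DogID').mean(numeric_only=True)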
Out[423]:
DogWeight Jan17 Jun18 Dec18 April19
DogID
Dog1 12.7 0.800000 1.700000 3.700000 4.6
Dog2 15.5 0.833333 1.766667 3.066667 4.0
Dog3 10.2 0.800000 1.700000 2.800000 4.4
Dog4 16.9 0.550000 1.250000 2.950000 3.9
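For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same melt, merge and groupby idea. It rebuilds both frames from the question's tables; the names months, long_df, monthly and result are only illustrative, and the final merge is one possible way to get the column order shown in the expected output.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the tables in the question
dogs = pd.DataFrame({
    "DogID": ["Dog1", "Dog2", "Dog3", "Dog4"],
    "PuppyName1": ["Nick", "Jack", "Snack", "Yosee"],
    "PuppyName2": [np.nan, "Fox", np.nan, "Petty"],
    "PuppyName3": [np.nan, "Rex", np.nan, np.nan],
    "PuppyName4": [np.nan, np.nan, np.nan, np.nan],
    "DogWeight": [12.7, 15.5, 10.2, 16.9],
})
puppyWeights = pd.DataFrame({
    "PuppyName": ["Nick", "Jack", "Fox", "Rex", "Snack", "Yosee", "Petty"],
    "Jan17": [0.8, 0.6, 0.9, 1.0, 0.8, 0.6, 0.5],
    "Jun18": [1.7, 1.3, 1.7, 2.3, 1.7, 1.2, 1.3],
    "Dec18": [3.7, 2.8, 3.4, 3.0, 2.8, 3.1, 2.8],
    "April19": [4.6, 3.5, 4.3, 4.2, 4.4, 4.3, 3.5],
})

months = ["Jan17", "Jun18", "Dec18", "April19"]  # illustrative helper list

# Long format: one row per (dog, puppy); empty puppy slots are dropped
long_df = dogs.melt(
    id_vars=["DogID", "DogWeight"], var_name="Puppies", value_name="PuppyName"
).dropna(subset=["PuppyName"])

# Look up each puppy's weights and average them per dog
monthly = (
    long_df.merge(puppyWeights, on="PuppyName", how="left")
    .groupby("DogID")[months]
    .mean()
    .reset_index()
)

# Reattach DogWeight so the columns follow the order asked for in the question
result = monthly.merge(dogs[["DogID", "DogWeight"]], on="DogID")
print(result)
```

Selecting the month columns before calling mean() keeps the text columns (Puppies, PuppyName) out of the aggregation, so this version does not rely on numeric_only.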