How to attach a groupby column to the main dataframe without a merge?
I have a dataframe like the one shown below:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'stud_id' : [101, 101, 101, 101,
                  101, 101, 101, 101],
     'sub_code' : ['CSE01', 'CSE01', 'CSE01',
                   'CSE01', 'CSE02', 'CSE02',
                   'CSE02', 'CSE02'],
     'ques_date' : ['13/11/2020', '10/1/2018', '11/11/2017', '27/03/2016',
                    '13/05/2010', '10/11/2008', '11/1/2007', '27/02/2006'],
     'resp_date' : [np.nan, '11/1/2018', '14/11/2017', '29/03/2016',
                    np.nan, np.nan, np.nan, '28/02/2006'],
     'marks' : [77, 86, 55, 90,
                65, 90, 80, 67]}
)
# Parse the day-first date strings, then order rows within each student/subject
df['ques_date'] = pd.to_datetime(df['ques_date'], dayfirst=True)
df.sort_values(['stud_id','sub_code','ques_date'], inplace=True)
For each stud_id and sub_code, I would like to compute the mean difference between consecutive ques_date values and store it in a new column.
So I tried the following:
# Next question date within each stud_id/sub_code group
df['next_ques_date'] = df.groupby(['stud_id','sub_code'])['ques_date'].shift(-1)
# Days from each question to the next one (both columns are already datetime64)
df['backlog_wish_req_diff'] = (df['next_ques_date'] - df['ques_date']).dt.days
# Per-group mean in a separate frame, then merged back onto df
tdf = (df.groupby(['stud_id','sub_code'], as_index=False)['backlog_wish_req_diff']
         .mean()
         .rename(columns={'backlog_wish_req_diff': 'backlog_wish_req_mean_days'}))
df.merge(tdf, on=['stud_id','sub_code'])
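Although the output is correct, I would like to attach the backlog_wish_req_mean_days column to df directly after the groupby. I would rather not merge df with tdf just to link the result back.
Is there an efficient and elegant way to do this without a merge?
I expect my output to look like the one shown below.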
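Use GroupBy.transform, which returns a result aligned with the original rows, so it can be assigned straight to a new column:
df['backlog_wish_req_mean_days'] = (df.groupby(['stud_id','sub_code'])['backlog_wish_req_diff']
                                      .transform('mean'))
print(df)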
   stud_id sub_code  ques_date   resp_date  marks next_ques_date  \
3      101    CSE01 2016-03-27  29/03/2016     90     2017-11-11
2      101    CSE01 2017-11-11  14/11/2017     55     2018-01-10
1      101    CSE01 2018-01-10   11/1/2018     86     2020-11-13
0      101    CSE01 2020-11-13         NaN     77            NaT
7      101    CSE02 2006-02-27  28/02/2006     67     2007-01-11
6      101    CSE02 2007-01-11         NaN     80     2008-11-10
5      101    CSE02 2008-11-10         NaN     90     2010-05-13
4      101    CSE02 2010-05-13         NaN     65            NaT

   backlog_wish_req_diff  backlog_wish_req_mean_days
3                  594.0                       564.0
2                   60.0                       564.0
1                 1038.0                       564.0
0                    NaN                       564.0
7                  318.0                       512.0
6                  669.0                       512.0
5                  549.0                       512.0
4                    NaN                       512.0
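If the intermediate next_ques_date column is not needed, the same per-group mean can probably be reached a little more directly. The following is only a sketch on a trimmed copy of the sample data: it swaps shift(-1) for a grouped diff(), and the column name gap_days is a hypothetical name chosen for illustration, not something from the question.

```python
import pandas as pd

# Trimmed copy of the sample data (only the columns the calculation needs).
df = pd.DataFrame({
    'stud_id': [101] * 8,
    'sub_code': ['CSE01'] * 4 + ['CSE02'] * 4,
    'ques_date': ['13/11/2020', '10/1/2018', '11/11/2017', '27/03/2016',
                  '13/05/2010', '10/11/2008', '11/1/2007', '27/02/2006'],
})
df['ques_date'] = pd.to_datetime(df['ques_date'], dayfirst=True)
df = df.sort_values(['stud_id', 'sub_code', 'ques_date'])

keys = ['stud_id', 'sub_code']

# Days since the previous question within each group.
# The first row of every group has no predecessor, so it is NaN.
# 'gap_days' is a hypothetical column name.
df['gap_days'] = df.groupby(keys)['ques_date'].diff().dt.days

# transform('mean') computes one mean per group (NaN is skipped) and
# broadcasts it back to every row of that group, keeping df's index,
# so a plain assignment lines up without any merge.
df['backlog_wish_req_mean_days'] = df.groupby(keys)['gap_days'].transform('mean')

print(df)
```

With diff() each gap sits on the later row of a pair instead of the earlier one, but the set of gaps per group is the same, so the group means match the shift(-1) version.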