Simple Split Apply Combine, custom function
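I am using the split-apply-combine pattern in pandas to create a new column that measures the difference between two timestamps.
Below is a simplified example of my problem.
Say I have this df: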
import pandas as pd
df = pd.DataFrame({'ssn_start_utc': pd.date_range('1/1/2011', periods=6, freq='D'), 'fld_id': [100,100,100,101,101,101], 'task_name': ['sowing','fungicide','insecticide','combine','combine','sowing']})
df
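I want to group by fld_id and apply a function that creates a column measuring, for each row, the difference between two timestamps. Something like this: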
def pasttime(group):
    val = group['ssn_start_utc'] - group['ssn_start_utc'][0]
    # Why group['ssn_start_utc'][0]?
    # Because it measures the time difference of each row relative to the first row of its group, i.e. the *sowing* entry of that group. I have moved all *sowing* entries to the first row of each group.
    return val
df["PastTime"] = df.groupby('fld_id', group_keys=False).apply(pasttime)
The resulting df should look like this:
df_new = pd.DataFrame({'ssn_start_utc': pd.date_range('1/1/2011', periods=6, freq='D'), 'fld_id': [100,100,100,101,101,101], 'task_name': ['sowing','fungicide','insecticide','combine','combine','sowing'], 'pasttime': pd.to_timedelta([0, 1, 2, 3, -1, 0], unit='D')})
df_new
I get the error KeyError: 0.
I also tried using groupby with transform:
df['pasttime'] = df.groupby(['fld_id'])['ssn_start_utc'].transform( df['ssn_start_utc'] - df.loc[df['name']=='sowing','ssn_start_utc'].values[0])
How can I apply a custom group function and get the desired df?
In your function, you can select the first value by position with Series.iat:
def pasttime(group):
    val = group['ssn_start_utc'] - group['ssn_start_utc'].iat[0]
    return val
df["PastTime"] = df.groupby('fld_id', group_keys=False).apply(pasttime)
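To see why the original `[0]` raises KeyError: 0, here is a minimal sketch. It assumes a standalone Series (the name `s` and its values are made up for illustration) that carries the same index labels 3, 4, 5 as the second group. With an integer index, `s[0]` looks for the label 0, which that group does not have, while `.iat[0]` always takes the first element by position.

```python
import pandas as pd

# Hypothetical stand-in for the second group: its index labels are 3, 4, 5.
s = pd.Series(pd.date_range("2011-01-04", periods=3, freq="D"), index=[3, 4, 5])

# Label-based lookup: there is no label 0 in this index, so this raises KeyError.
try:
    s[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup: always the first element, whatever the labels are.
first_by_position = s.iat[0]

print(label_lookup_failed, first_by_position)
```

The first group happens to work with `[0]` only because its index starts at label 0.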
A faster alternative is to use GroupBy.first with GroupBy.transform:
s = df.groupby('fld_id')['ssn_start_utc'].transform('first')
df['pasttime'] = df['ssn_start_utc'].sub(s)
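As a self-contained sketch of the same idea (the variable name `first_per_group` is only illustrative), the point is that `transform('first')` returns a Series aligned to the original index, with each group's first timestamp repeated on every row of that group, so a plain subtraction works without `apply`:

```python
import pandas as pd

df = pd.DataFrame({
    "ssn_start_utc": pd.date_range("2011-01-01", periods=6, freq="D"),
    "fld_id": [100, 100, 100, 101, 101, 101],
    "task_name": ["sowing", "fungicide", "insecticide", "combine", "combine", "sowing"],
})

# One value per original row: the first timestamp of that row's group.
first_per_group = df.groupby("fld_id")["ssn_start_utc"].transform("first")

# Vectorized subtraction, aligned on the original index.
df["pasttime"] = df["ssn_start_utc"] - first_per_group

print(df[["fld_id", "ssn_start_utc", "pasttime"]])
```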
If you need to subtract the sowing row of each group, use the same solution as above, but first replace the non-matching datetimes with missing values using Series.where:
m = df['task_name']=='sowing'
s = df['ssn_start_utc'].where(m).groupby(df['fld_id']).transform('first')
df['pasttime1'] = df['ssn_start_utc'].sub(s)
print(df)
  ssn_start_utc  fld_id    task_name PastTime pasttime pasttime1
0    2011-01-01     100       sowing   0 days   0 days    0 days
1    2011-01-02     100    fungicide   1 days   1 days    1 days
2    2011-01-03     100  insecticide   2 days   2 days    2 days
3    2011-01-04     101      combine   0 days   0 days   -2 days
4    2011-01-05     101      combine   1 days   1 days   -1 days
5    2011-01-06     101       sowing   2 days   2 days    0 days
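One thing that may be worth checking with the Series.where approach is what happens when a group has no sowing row at all. The sketch below uses made-up data: field 102 is hypothetical and not part of the question. In that case every datetime in the group is masked, so the group's "first" value is missing and the resulting differences are NaT rather than an error.

```python
import pandas as pd

# Made-up data: field 101 has a sowing row, hypothetical field 102 has none.
df = pd.DataFrame({
    "ssn_start_utc": pd.date_range("2011-01-01", periods=5, freq="D"),
    "fld_id": [101, 101, 101, 102, 102],
    "task_name": ["combine", "combine", "sowing", "fungicide", "combine"],
})

is_sowing = df["task_name"].eq("sowing")

# Keep only sowing timestamps (others become NaT), then broadcast the first
# non-missing one to every row of the same field.
sowing_date = (
    df["ssn_start_utc"].where(is_sowing).groupby(df["fld_id"]).transform("first")
)

df["pasttime1"] = df["ssn_start_utc"] - sowing_date
print(df)
```

If a field has several sowing rows, this takes the first one in row order, which may or may not be what you want.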