Find date differences since inception within group
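Is there an efficient way to compute, within each group of a DataFrame, the difference in days between each date and the group's first date?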
import pandas as pd
x = pd.DataFrame(
{'grp':['A','A','A','B','B','B'],
'dt':pd.DatetimeIndex(['1/1/00 00:00:00','1/2/00','1/3/00','1/2/01','1/3/01','1/5/01'])})
x
Out[1]:
dt grp
0 2000-01-01 00:00:00 A
1 2000-01-02 00:00:00 A
2 2000-01-03 00:00:00 A
3 2001-01-02 00:00:00 B
4 2001-01-03 00:00:00 B
5 2001-01-05 00:00:00 B
So that the result looks something like:
grp days_since_start
A 0
A 1
A 2
B 0
B 1
B 3
Sure. Group by the group name, take the earliest time in each group, and subtract it from every row of that group:
x.set_index('grp') - x.groupby('grp').min()
# dt
#grp
#A 0 days
#A 1 days
#A 2 days
#B 0 days
#B 1 days
#B 3 days
#Name: dt, dtype: timedelta64[ns]
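The subtraction above yields timedeltas, while the question asks for plain day counts in a `days_since_start` column. One way to get there, as a minimal sketch rather than the only approach, is to broadcast each group's minimum back onto the rows with `groupby(...).transform('min')` and then read the `.dt.days` accessor. The name `start` is just an illustrative helper variable, and the dates are written in ISO form here for clarity:

```python
import pandas as pd

# Same data as in the question, with the dates spelled out in ISO format.
x = pd.DataFrame({
    "grp": ["A", "A", "A", "B", "B", "B"],
    "dt": pd.to_datetime([
        "2000-01-01", "2000-01-02", "2000-01-03",
        "2001-01-02", "2001-01-03", "2001-01-05",
    ]),
})

# transform("min") returns a Series aligned with the original rows,
# holding the earliest date of each row's group.
start = x.groupby("grp")["dt"].transform("min")

# Subtracting gives a timedelta64 Series; .dt.days extracts whole days as integers.
x["days_since_start"] = (x["dt"] - start).dt.days

print(x[["grp", "days_since_start"]])
```

This keeps the original row index and avoids `set_index`, which can be convenient if the frame has other columns you want to leave untouched.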