比较和减去日期

Comparing and Subtracting Dates

我正在寻找一种方法来确定列中的时间是否在同一列中另一个日期的 7 天内。

说这是我的数据框-

dic = {'firstname':['Rick','Rick','Rick','John','John','John','David',
                    'David','David','Steve','Steve','Steve','Jim','Jim',
                    'Jim'],
       'lastname':['Smith','Smith','Smith','Jones','Jones','Jones',
                   'Wilson','Wilson','Wilson','Johnson','Johnson',
                   'Johnson','Miller','Miller','Miller'],
       'company':['CFA','CFA','CFA','WND','WND','WND','INO','INO','INO',
                  'CHP','CHP','CHP','MCD','MCD','MCD'],
       'faveday':['2020-03-16','2020-03-11','2020-03-25','2020-04-30',
                  '2020-05-22','2020-05-03','2020-01-31','2020-01-13',
                  '2020-01-10','2020-10-22','2020-10-28','2020-10-22',
                  '2020-10-13','2020-10-28','2020-10-20']}
df = pd.DataFrame(dic)
df['faveday'] = pd.to_datetime(df['faveday'])
print(df)

有输出-

   firstname lastname company    faveday
0       Rick    Smith     CFA 2020-03-16
1       Rick    Smith     CFA 2020-03-11
2       Rick    Smith     CFA 2020-03-25
3       John    Jones     WND 2020-04-30
4       John    Jones     WND 2020-05-22
5       John    Jones     WND 2020-05-03
6      David   Wilson     INO 2020-01-31
7      David   Wilson     INO 2020-01-13
8      David   Wilson     INO 2020-01-10
9      Steve  Johnson     CHP 2020-10-22
10     Steve  Johnson     CHP 2020-10-28
11     Steve  Johnson     CHP 2020-10-22
12       Jim   Miller     MCD 2020-10-13
13       Jim   Miller     MCD 2020-10-28
14       Jim   Miller     MCD 2020-10-20

然后我用-

对数据进行排序
df = df.sort_values(['firstname','lastname','company','faveday'])
print(df)

获得-

   firstname lastname company    faveday
8      David   Wilson     INO 2020-01-10
7      David   Wilson     INO 2020-01-13
6      David   Wilson     INO 2020-01-31
12       Jim   Miller     MCD 2020-10-13
14       Jim   Miller     MCD 2020-10-20
13       Jim   Miller     MCD 2020-10-28
3       John    Jones     WND 2020-04-30
5       John    Jones     WND 2020-05-03
4       John    Jones     WND 2020-05-22
1       Rick    Smith     CFA 2020-03-11
0       Rick    Smith     CFA 2020-03-16
2       Rick    Smith     CFA 2020-03-25
9      Steve  Johnson     CHP 2020-10-22
11     Steve  Johnson     CHP 2020-10-22
10     Steve  Johnson     CHP 2020-10-28

假设我想知道当前顺序(索引 8,然后是 7、6、12 等)某个日期是否在另一个日期的 7 天内。 (所以索引 8 和 7 都会产生 true 但索引 6 不会)

但我也想将其按名称分组。 (因此索引 12 和 14 为真,13 在 Jim Miller 组中不为真,但索引 9、11 和 10 在 Steve Johnson 组中都为真)

有没有办法减去每个组中的日期,然后创建一个列来表示 TRUE 或 FALSE,具体取决于它在另一天的 7 天内?

我正在寻找这样的输出-

   firstname lastname company    faveday seven_days
8      David   Wilson     INO 2020-01-10       TRUE
7      David   Wilson     INO 2020-01-13       TRUE
6      David   Wilson     INO 2020-01-31      FALSE
12       Jim   Miller     MCD 2020-10-13       TRUE
14       Jim   Miller     MCD 2020-10-20       TRUE
13       Jim   Miller     MCD 2020-10-28      FALSE
3       John    Jones     WND 2020-04-30       TRUE
5       John    Jones     WND 2020-05-03       TRUE
4       John    Jones     WND 2020-05-22      FALSE
1       Rick    Smith     CFA 2020-03-11       TRUE
0       Rick    Smith     CFA 2020-03-16       TRUE
2       Rick    Smith     CFA 2020-03-25      FALSE
9      Steve  Johnson     CHP 2020-10-22       TRUE
11     Steve  Johnson     CHP 2020-10-22       TRUE
10     Steve  Johnson     CHP 2020-10-28       TRUE

让我们尝试使用 numpy 广播

自定义一个函数
def sefd (x): 
    return np.sum((np.abs(x.values-x.values[:,None])/np.timedelta64(1, 'D'))<=7,axis=1)>=2
s=df.groupby(['firstname', 'lastname', 'company'])['faveday'].transform(sefd)
Out[301]: 
0      True
1      True
2     False
3      True
4     False
5      True
6     False
7      True
8      True
9      True
10     True
11     True
12     True
13    False
14     True
Name: faveday, dtype: bool
df['seven_days']=s

你可以试试这个。

from datetime import timedelta

m = (df.groupby(['firstname','lastname']).
        apply(lambda x: x['faveday'].sub(x['faveday'].shift()).bfill()).
        reset_index(level=[0,1],drop=True))
df['seven_days'] = m.le(timedelta(days=7))

   firstname lastname company    faveday  seven_days
8      David   Wilson     INO 2020-01-10        True
7      David   Wilson     INO 2020-01-13        True
6      David   Wilson     INO 2020-01-31       False
12       Jim   Miller     MCD 2020-10-13        True
14       Jim   Miller     MCD 2020-10-20        True
13       Jim   Miller     MCD 2020-10-28       False
3       John    Jones     WND 2020-04-30        True
5       John    Jones     WND 2020-05-03        True
4       John    Jones     WND 2020-05-22       False
1       Rick    Smith     CFA 2020-03-11        True
0       Rick    Smith     CFA 2020-03-16        True
2       Rick    Smith     CFA 2020-03-25       False
9      Steve  Johnson     CHP 2020-10-22        True
11     Steve  Johnson     CHP 2020-10-22        True
10     Steve  Johnson     CHP 2020-10-28        True