Comparing and Subtracting Dates
I'm looking for a way to determine whether a date in a column falls within 7 days of another date in the same column.
Say this is my DataFrame:
import pandas as pd

dic = {'firstname': ['Rick', 'Rick', 'Rick', 'John', 'John', 'John', 'David',
                     'David', 'David', 'Steve', 'Steve', 'Steve', 'Jim', 'Jim',
                     'Jim'],
       'lastname': ['Smith', 'Smith', 'Smith', 'Jones', 'Jones', 'Jones',
                    'Wilson', 'Wilson', 'Wilson', 'Johnson', 'Johnson',
                    'Johnson', 'Miller', 'Miller', 'Miller'],
       'company': ['CFA', 'CFA', 'CFA', 'WND', 'WND', 'WND', 'INO', 'INO', 'INO',
                   'CHP', 'CHP', 'CHP', 'MCD', 'MCD', 'MCD'],
       'faveday': ['2020-03-16', '2020-03-11', '2020-03-25', '2020-04-30',
                   '2020-05-22', '2020-05-03', '2020-01-31', '2020-01-13',
                   '2020-01-10', '2020-10-22', '2020-10-28', '2020-10-22',
                   '2020-10-13', '2020-10-28', '2020-10-20']}
df = pd.DataFrame(dic)
df['faveday'] = pd.to_datetime(df['faveday'])
print(df)
This outputs:
firstname lastname company faveday
0 Rick Smith CFA 2020-03-16
1 Rick Smith CFA 2020-03-11
2 Rick Smith CFA 2020-03-25
3 John Jones WND 2020-04-30
4 John Jones WND 2020-05-22
5 John Jones WND 2020-05-03
6 David Wilson INO 2020-01-31
7 David Wilson INO 2020-01-13
8 David Wilson INO 2020-01-10
9 Steve Johnson CHP 2020-10-22
10 Steve Johnson CHP 2020-10-28
11 Steve Johnson CHP 2020-10-22
12 Jim Miller MCD 2020-10-13
13 Jim Miller MCD 2020-10-28
14 Jim Miller MCD 2020-10-20
Then I sort the data with:
df = df.sort_values(['firstname','lastname','company','faveday'])
print(df)
which gives:
firstname lastname company faveday
8 David Wilson INO 2020-01-10
7 David Wilson INO 2020-01-13
6 David Wilson INO 2020-01-31
12 Jim Miller MCD 2020-10-13
14 Jim Miller MCD 2020-10-20
13 Jim Miller MCD 2020-10-28
3 John Jones WND 2020-04-30
5 John Jones WND 2020-05-03
4 John Jones WND 2020-05-22
1 Rick Smith CFA 2020-03-11
0 Rick Smith CFA 2020-03-16
2 Rick Smith CFA 2020-03-25
9 Steve Johnson CHP 2020-10-22
11 Steve Johnson CHP 2020-10-22
10 Steve Johnson CHP 2020-10-28
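Suppose I want to know, in the current order (index 8, then 7, 6, 12, etc.), whether each date is within 7 days of another date. (So indexes 8 and 7 would both be True, but index 6 would not.)
But I also want this grouped by name. (So within the Jim Miller group, indexes 12 and 14 are True and 13 is not, while within the Steve Johnson group, indexes 9, 11 and 10 are all True.)
Is there a way to subtract the dates within each group and then create a column that says TRUE or FALSE depending on whether the date is within 7 days of another date in that group?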
I'm looking for output like this:
firstname lastname company faveday seven_days
8 David Wilson INO 2020-01-10 TRUE
7 David Wilson INO 2020-01-13 TRUE
6 David Wilson INO 2020-01-31 FALSE
12 Jim Miller MCD 2020-10-13 TRUE
14 Jim Miller MCD 2020-10-20 TRUE
13 Jim Miller MCD 2020-10-28 FALSE
3 John Jones WND 2020-04-30 TRUE
5 John Jones WND 2020-05-03 TRUE
4 John Jones WND 2020-05-22 FALSE
1 Rick Smith CFA 2020-03-11 TRUE
0 Rick Smith CFA 2020-03-16 TRUE
2 Rick Smith CFA 2020-03-25 FALSE
9 Steve Johnson CHP 2020-10-22 TRUE
11 Steve Johnson CHP 2020-10-22 TRUE
10 Steve Johnson CHP 2020-10-28 TRUE
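As a quick check of the boundary, "within 7 days" in the expected output is inclusive: in the Jim Miller group, 2020-10-13 and 2020-10-20 are exactly 7 days apart and both are TRUE, while 2020-10-28 is 8 days from its nearest date. A minimal sketch computing those gaps, using the Jim Miller dates copied from the table above:

```python
import pandas as pd

# Jim Miller's dates from the example, already sorted
dates = pd.to_datetime(pd.Series(['2020-10-13', '2020-10-20', '2020-10-28']))

# Whole-day gaps between consecutive dates (the first date has no predecessor, so drop it)
gaps = dates.diff().dt.days.dropna().astype(int).tolist()
print(gaps)  # [7, 8]

# The test is inclusive: a gap of exactly 7 days counts as "within 7 days"
print([g <= 7 for g in gaps])  # [True, False]
```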
Let's try NumPy broadcasting with a custom function:
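import numpy as np
def sefd(x):
    # |date_i - date_j| in days for every pair; each date matches itself, so >= 2 means another date is within 7 days
    return np.sum((np.abs(x.values - x.values[:, None]) / np.timedelta64(1, 'D')) <= 7, axis=1) >= 2
s = df.groupby(['firstname', 'lastname', 'company'])['faveday'].transform(sefd)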
Out[301]:
0 True
1 True
2 False
3 True
4 False
5 True
6 False
7 True
8 True
9 True
10 True
11 True
12 True
13 False
14 True
Name: faveday, dtype: bool
df['seven_days']=s
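To see what the broadcasting inside sefd does, here is a minimal sketch that runs the same expression on the Jim Miller group alone (dates copied from the example). Because every date is compared with every other date in its group, this approach does not depend on the rows being sorted.

```python
import numpy as np
import pandas as pd

# Jim Miller's group, as sefd would receive it
x = pd.to_datetime(pd.Series(['2020-10-13', '2020-10-20', '2020-10-28']))

# x.values[:, None] is a column vector, so the subtraction broadcasts to a 3x3 matrix
# whose entry [i, j] is |date_j - date_i| converted to days
diff_days = np.abs(x.values - x.values[:, None]) / np.timedelta64(1, 'D')
print(diff_days)

# Row i counts the dates within 7 days of date i, including date i itself (diagonal is 0)
counts = np.sum(diff_days <= 7, axis=1)
print(counts)       # [2 2 1]
print(counts >= 2)  # [ True  True False]
```

The cost is quadratic in group size, which is fine for small groups like these.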
You can try this:
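from datetime import timedelta
# Assumes df is already sorted by name and faveday, as above
g = df.groupby(['firstname', 'lastname'])['faveday']
# Gap to the previous date in the group; the first date in each group takes the gap to the next date instead
m = g.diff().fillna(g.diff(-1).abs())
df['seven_days'] = m.le(timedelta(days=7))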
firstname lastname company faveday seven_days
8 David Wilson INO 2020-01-10 True
7 David Wilson INO 2020-01-13 True
6 David Wilson INO 2020-01-31 False
12 Jim Miller MCD 2020-10-13 True
14 Jim Miller MCD 2020-10-20 True
13 Jim Miller MCD 2020-10-28 False
3 John Jones WND 2020-04-30 True
5 John Jones WND 2020-05-03 True
4 John Jones WND 2020-05-22 False
1 Rick Smith CFA 2020-03-11 True
0 Rick Smith CFA 2020-03-16 True
2 Rick Smith CFA 2020-03-25 False
9 Steve Johnson CHP 2020-10-22 True
11 Steve Johnson CHP 2020-10-22 True
10 Steve Johnson CHP 2020-10-28 True
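One caveat with this shift/diff approach: each date is compared only with the date before it (and the first date in each group with the date after it), so a date that is more than 7 days after its predecessor but within 7 days of its successor comes out False. It happens not to matter for this sample data. A minimal sketch that checks both neighbours instead, assuming the frame is sorted as in the question; the helper name `within_days_of_neighbour` is hypothetical:

```python
import pandas as pd

def within_days_of_neighbour(df, keys, col, days=7):
    """Hypothetical helper: True where a date is within `days` days (inclusive)
    of at least one other date in its group. Assumes df is sorted by keys + [col]."""
    g = df.groupby(keys)[col]
    limit = pd.Timedelta(days=days)
    prev_gap = g.diff()          # gap to the previous date in the group (NaT for the first)
    next_gap = g.diff(-1).abs()  # gap to the next date in the group (NaT for the last)
    # Comparisons with NaT are False, so a group with a single date comes out False
    return prev_gap.le(limit) | next_gap.le(limit)

# A group where the middle date is far from its predecessor but close to its successor
demo = pd.DataFrame({
    'name': ['A', 'A', 'A'],
    'faveday': pd.to_datetime(['2020-01-01', '2020-01-20', '2020-01-22']),
})
print(within_days_of_neighbour(demo, ['name'], 'faveday').tolist())  # [False, True, True]
```

On the question's sorted frame it would be called as `within_days_of_neighbour(df, ['firstname', 'lastname', 'company'], 'faveday')`. In a sorted group the nearest other date is always an immediate neighbour, so checking the previous and next dates is enough.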