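How to get the rows within a time limit using Python?
I read a table of sales transactions from Excel, and I want to know the number of sales made within 1 hour of the first sale of each item at each location. Let A be the sales report; I want to create B.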
A=
item Location time
X Canada 10:03:18
X Canada 10:08:38
X Canada 10:24:46
X Canada 11:16:35
X US 10:00:16
X US 11:52:12
Y Canada 2:08:38
Y Canada 4:01:48
Y US 13:32:02
Y US 14:07:03
B=
item location first sale count
X Canada 10:03:18 3
X US 10:00:16 1
Y Canada 2:08:38 1
Y US 13:32:02 2
This is what I did:
A= A.sort('time', ascending=True).reset_index()
sale_loc= pd.DataFrame(A.groupby(['item', 'Location'], sort = False).first()).reset_index()
for i in sale_loc.index:
    sale_cutoff = (A.time[i] + dt.timedelta(hours=1)).time
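But the time-manipulation part raises an error. I tried different functions, and I also tried adding a new column to A (time + 1 hour) instead of looping, but ran into a similar problem...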
import numpy as np
import pandas as pd
df = pd.DataFrame({'Location': ['Canada', 'Canada', 'Canada', 'Canada', 'US', 'US', 'Canada', 'Canada', 'US', 'US'], 'item': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'], 'time': ['10:03:18', '10:08:38', '10:24:46', '11:16:35', '10:00:16', '11:52:12', '2:08:38', '4:01:48', '13:32:02', '14:07:03']})
df['start'] = pd.to_datetime(df['time'])
grouped = df.groupby(['item', 'Location'])
df['end'] = (grouped['start'].transform(lambda grp: grp.min()+pd.Timedelta(hours=1)))
df['mask'] = (df['start'] < df['end'])
result = grouped['mask'].sum()
print(result)
yields
item  Location
X     Canada      3.0
      US          1.0
Y     Canada      1.0
      US          2.0
Name: mask, dtype: float64
The main idea is to group by item and Location, find the earliest start time in each group, and then add 1 hour:
df['end'] = (grouped['start'].transform(lambda grp: grp.min()+pd.Timedelta(hours=1)))
transform returns a Series of the same length as df, so every row gets a value:
In [319]: df
Out[319]:
Location item time start end
0 Canada X 10:03:18 2016-05-06 10:03:18 2016-05-06 11:03:18
1 Canada X 10:08:38 2016-05-06 10:08:38 2016-05-06 11:03:18
2 Canada X 10:24:46 2016-05-06 10:24:46 2016-05-06 11:03:18
3 Canada X 11:16:35 2016-05-06 11:16:35 2016-05-06 11:03:18
4 US X 10:00:16 2016-05-06 10:00:16 2016-05-06 11:00:16
5 US X 11:52:12 2016-05-06 11:52:12 2016-05-06 11:00:16
6 Canada Y 2:08:38 2016-05-06 02:08:38 2016-05-06 03:08:38
7 Canada Y 4:01:48 2016-05-06 04:01:48 2016-05-06 03:08:38
8 US Y 13:32:02 2016-05-06 13:32:02 2016-05-06 14:32:02
9 US Y 14:07:03 2016-05-06 14:07:03 2016-05-06 14:32:02
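To make the difference between transform and a plain aggregation concrete, here is a minimal sketch on a three-row frame. The tiny frame and the names per_group and per_row are made up for illustration; only the groupby, min and transform calls come from the answer's approach.

```python
import pandas as pd

# Small illustrative frame (not the full sales table)
df = pd.DataFrame({
    'item': ['X', 'X', 'Y'],
    'start': pd.to_datetime(['2016-05-06 10:03:18',
                             '2016-05-06 10:08:38',
                             '2016-05-06 02:08:38']),
})
grouped = df.groupby('item')['start']

# Aggregation: one value per group, indexed by the group key
per_group = grouped.min()

# transform: the group minimum broadcast back onto every original row
per_row = grouped.transform('min')

print(per_group)
print(per_row)
```

Because per_row shares df's index, it can be assigned straight back as a new column, which is what makes the row-wise comparison in the next step possible.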
Now you can easily identify the rows of interest. They are the ones where start is less than end:
In [320]: df['mask'] = (df['start'] < df['end'])
In [321]: df
Out[321]:
Location item time start end mask
0 Canada X 10:03:18 2016-05-06 10:03:18 2016-05-06 11:03:18 True
1 Canada X 10:08:38 2016-05-06 10:08:38 2016-05-06 11:03:18 True
2 Canada X 10:24:46 2016-05-06 10:24:46 2016-05-06 11:03:18 True
3 Canada X 11:16:35 2016-05-06 11:16:35 2016-05-06 11:03:18 False
4 US X 10:00:16 2016-05-06 10:00:16 2016-05-06 11:00:16 True
5 US X 11:52:12 2016-05-06 11:52:12 2016-05-06 11:00:16 False
6 Canada Y 2:08:38 2016-05-06 02:08:38 2016-05-06 03:08:38 True
7 Canada Y 4:01:48 2016-05-06 04:01:48 2016-05-06 03:08:38 False
8 US Y 13:32:02 2016-05-06 13:32:02 2016-05-06 14:32:02 True
9 US Y 14:07:03 2016-05-06 14:07:03 2016-05-06 14:32:02 True
Group by item and Location again, and get the desired result by summing the number of times mask is True in each group:
result = grouped['mask'].sum()
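The result above holds only the counts, while the table B in the question also lists the first sale time. One possible way to get both columns in a single pass is sketched below. It assumes a pandas version that supports named aggregation (0.25 or later), and the column names first_sale and count are chosen here for illustration. Parsing with an explicit format is also an assumption: it places every time on the dummy date 1900-01-01, which does not matter because only differences within a group are used.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
    'Location': ['Canada', 'Canada', 'Canada', 'Canada', 'US', 'US',
                 'Canada', 'Canada', 'US', 'US'],
    'time': ['10:03:18', '10:08:38', '10:24:46', '11:16:35', '10:00:16',
             '11:52:12', '2:08:38', '4:01:48', '13:32:02', '14:07:03'],
})

# Parse the time-of-day strings; the date part is a placeholder
df['start'] = pd.to_datetime(df['time'], format='%H:%M:%S')

# Earliest sale per (item, Location), broadcast to every row
first = df.groupby(['item', 'Location'])['start'].transform('min')
df['mask'] = df['start'] < first + pd.Timedelta(hours=1)

# One row per group: first sale time and number of sales inside the window
B = (df.groupby(['item', 'Location'])
       .agg(first_sale=('start', 'min'), count=('mask', 'sum'))
       .reset_index())
B['first_sale'] = B['first_sale'].dt.strftime('%H:%M:%S')
print(B)
```

The counts agree with the table B given in the question (3, 1, 1, 2).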
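Rather than writing the whole program, I focused on the part you said raises the error. Here is a working example of adding one hour to the times you listed: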
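import datetime

sale_time = ['10:03:18', '10:08:38', '11:16:35', '10:00:16']
for i in sale_time:
    # Split on ':' so that single-digit hours such as '2:08:38' also parse
    hour, minute, second = (int(part) for part in i.split(':'))
    sale_time1 = datetime.time(hour=hour, minute=minute, second=second)
    print(sale_time1)
    # Note: hour + 1 raises ValueError for times from 23:00:00 onward
    sale_cutoff = datetime.time(sale_time1.hour + 1, sale_time1.minute, sale_time1.second)
    print(sale_cutoff)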
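The loop above builds the cutoff by incrementing the hour field, which cannot go past 23. An alternative sketch, assuming the times are strings in H:MM:SS form, parses each one into a full datetime so that timedelta arithmetic is available. The helper name add_one_hour is made up for this example.

```python
import datetime as dt

def add_one_hour(time_str):
    """Return the time of day one hour after time_str, wrapping past midnight."""
    # strptime attaches a placeholder date, which makes timedelta arithmetic legal
    parsed = dt.datetime.strptime(time_str, '%H:%M:%S')
    return (parsed + dt.timedelta(hours=1)).time()

for s in ['10:03:18', '2:08:38', '23:30:00']:
    print(s, '->', add_one_hour(s))
```

This also hints at why the original attempt failed: datetime.time objects do not support addition with a timedelta, whereas datetime.datetime objects do.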