Calculate the queue for orders based on creation and delivery date, by product group
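I have a Pandas DataFrame containing records for many orders, one record per order. Each record has order_id, category_id, created_at and picked_at. I need to calculate the queue length for each order at the moment it was created. That is, for each record current_order I need to count the rows that satisfy all of the following conditions:
- the row has the same category_id as current_order
- the row was created before current_order's created_at
- the row was picked after current_order's created_at
The DataFrame is very large, so computing this with a loop is too slow.
How can I do this faster?
Any help would be appreciated.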
Edited
Sample DataFrame:
id category_id created_at picked_at
0 123228779 69558 2021-05-22 00:08:46 2021-05-22 00:22:45
1 123228972 69558 2021-05-22 00:12:39 2021-05-22 00:17:00
2 123229120 6725 2021-05-22 00:15:47 2021-05-22 00:42:50
3 123229210 41358 2021-05-22 00:17:44 2021-05-22 00:35:34
4 123229152 6725 2021-05-22 00:16:29 2021-05-22 01:05:43
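For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample above and spells out the three conditions as the slow loop-based baseline. The helper name naive_queue is hypothetical (it is not from the original post), and the sketch assumes both time columns are parsed as datetimes.

```python
import pandas as pd

# Sample rows copied from the table above; timestamps parsed as datetimes.
df = pd.DataFrame({
    "id": [123228779, 123228972, 123229120, 123229210, 123229152],
    "category_id": [69558, 69558, 6725, 41358, 6725],
    "created_at": pd.to_datetime([
        "2021-05-22 00:08:46", "2021-05-22 00:12:39", "2021-05-22 00:15:47",
        "2021-05-22 00:17:44", "2021-05-22 00:16:29",
    ]),
    "picked_at": pd.to_datetime([
        "2021-05-22 00:22:45", "2021-05-22 00:17:00", "2021-05-22 00:42:50",
        "2021-05-22 00:35:34", "2021-05-22 01:05:43",
    ]),
})


def naive_queue(frame):
    """Hypothetical baseline: one full scan of the frame per order (quadratic)."""
    counts = []
    for row in frame.itertuples(index=False):
        mask = (
            (frame["category_id"] == row.category_id)   # same category
            & (frame["created_at"] < row.created_at)    # created earlier
            & (frame["picked_at"] > row.created_at)     # not yet picked
        )
        counts.append(int(mask.sum()))
    return pd.Series(counts, index=frame.index)


naive = naive_queue(df)
```

This scans the whole frame once per order, which is why it does not scale, but it is handy as a reference for checking a faster solution on a small slice of the data.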
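Let's start by reshaping the DataFrame so that created_at and picked_at end up in the same column. Each creation then counts as +1 and each pick as -1, and the running total of those steps within a category is the queue length.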
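df2 = (df.melt(id_vars=['id', 'category_id'],
               var_name='type',
               value_name='time')
         .sort_values(by=['category_id', 'time'])  # not required to sort by "category_id",
                                                   # but done here for clarity
      )
# +1 when an order is created, -1 when it is picked;
# grouping by category keeps each category's running total separate
df2['queue'] = (df2['type'].map({'created_at': 1, 'picked_at': -1})
                           .groupby(df2['category_id'])
                           .cumsum()
               )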
>>> df2
id category_id type time queue
2 123229120 6725 created_at 2021-05-22 00:15:47 1
4 123229152 6725 created_at 2021-05-22 00:16:29 2
7 123229120 6725 picked_at 2021-05-22 00:42:50 1
9 123229152 6725 picked_at 2021-05-22 01:05:43 0
3 123229210 41358 created_at 2021-05-22 00:17:44 1
8 123229210 41358 picked_at 2021-05-22 00:35:34 0
0 123228779 69558 created_at 2021-05-22 00:08:46 1
1 123228972 69558 created_at 2021-05-22 00:12:39 2
6 123228972 69558 picked_at 2021-05-22 00:17:00 1
5 123228779 69558 picked_at 2021-05-22 00:22:45 0
Finally, we reshape the queue values back onto the original DataFrame:
df['queue'] = (df2.pivot(columns=['type'],
                         values=['queue'])
                  .loc[:, ('queue', 'created_at')]
                  .dropna()
                  .astype(int)
              )
Output:
id category_id created_at picked_at queue
0 123228779 69558 2021-05-22 00:08:46 2021-05-22 00:22:45 1
1 123228972 69558 2021-05-22 00:12:39 2021-05-22 00:17:00 2
2 123229120 6725 2021-05-22 00:15:47 2021-05-22 00:42:50 1
3 123229210 41358 2021-05-22 00:17:44 2021-05-22 00:35:34 1
4 123229152 6725 2021-05-22 00:16:29 2021-05-22 01:05:43 2
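Note: this gives the queue length per category_id just after each order is created, so the count includes the order itself. Subtract 1 to count only the orders already waiting ahead of it.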
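The steps above can also be wrapped into a single function. The following is only a sketch under some assumptions: the name queue_at_creation and the include_self flag are hypothetical, every order is assumed to have a picked_at value, and timestamps within a category are assumed to be distinct (ties between a creation and a pick would need an explicit ordering rule). It keeps the original row labels through melt instead of pivoting back.

```python
import pandas as pd


def queue_at_creation(frame, include_self=True):
    """Hypothetical helper: per-category queue length at each order's creation."""
    events = frame.melt(
        id_vars=["id", "category_id"],
        value_vars=["created_at", "picked_at"],
        var_name="type",
        value_name="time",
        ignore_index=False,  # keep the original row labels on both events
    )
    events = events.sort_values("time")
    # +1 when an order enters the queue, -1 when it leaves.
    events["step"] = events["type"].map({"created_at": 1, "picked_at": -1})
    running = events.groupby("category_id")["step"].cumsum()
    # Keep only the running total observed at each creation event.
    created = (events["type"] == "created_at").to_numpy()
    queue = running[created]
    if not include_self:
        queue = queue - 1  # drop the order itself from its own count
    return queue.reindex(frame.index).astype(int)


# Sample rows from the question.
df = pd.DataFrame({
    "id": [123228779, 123228972, 123229120, 123229210, 123229152],
    "category_id": [69558, 69558, 6725, 41358, 6725],
    "created_at": pd.to_datetime([
        "2021-05-22 00:08:46", "2021-05-22 00:12:39", "2021-05-22 00:15:47",
        "2021-05-22 00:17:44", "2021-05-22 00:16:29",
    ]),
    "picked_at": pd.to_datetime([
        "2021-05-22 00:22:45", "2021-05-22 00:17:00", "2021-05-22 00:42:50",
        "2021-05-22 00:35:34", "2021-05-22 01:05:43",
    ]),
})

df["queue"] = queue_at_creation(df)
df["queue_ahead"] = queue_at_creation(df, include_self=False)
```

With include_self=False the result matches the three conditions in the question exactly, since the order being created is no longer counted in its own queue.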