Calculate the queue for orders based on creation and delivery date, by product group

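I have a Pandas dataframe that contains records for many orders, one record per order. Each record has order_id, category_id, created_at and picked_at. I need to compute the queue length for each order at the moment it was created. That is, for each record current_order I need to count the rows in the same category that had already been created but had not yet been picked at that moment.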

The dataframe is very large, so computing this with a loop takes too long. How can I do it faster?

Any help would be appreciated.

Edit

Example dataframe:

          id  category_id          created_at           picked_at
0  123228779        69558 2021-05-22 00:08:46 2021-05-22 00:22:45
1  123228972        69558 2021-05-22 00:12:39 2021-05-22 00:17:00
2  123229120         6725 2021-05-22 00:15:47 2021-05-22 00:42:50
3  123229210        41358 2021-05-22 00:17:44 2021-05-22 00:35:34
4  123229152         6725 2021-05-22 00:16:29 2021-05-22 01:05:43
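
To reproduce the example, the frame above can be rebuilt roughly as follows. This is a minimal sketch: it assumes both time columns are meant to be datetime64 values, as the printed output suggests.

```python
import pandas as pd

# Rebuild the five sample rows shown above.
df = pd.DataFrame({
    'id': [123228779, 123228972, 123229120, 123229210, 123229152],
    'category_id': [69558, 69558, 6725, 41358, 6725],
    'created_at': ['2021-05-22 00:08:46', '2021-05-22 00:12:39',
                   '2021-05-22 00:15:47', '2021-05-22 00:17:44',
                   '2021-05-22 00:16:29'],
    'picked_at': ['2021-05-22 00:22:45', '2021-05-22 00:17:00',
                  '2021-05-22 00:42:50', '2021-05-22 00:35:34',
                  '2021-05-22 01:05:43'],
})

# Parse both timestamp columns so they sort chronologically, not as strings.
for col in ['created_at', 'picked_at']:
    df[col] = pd.to_datetime(df[col])
```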

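Let's start by reshaping the dataframe so that created_at and picked_at end up in the same column. Then we compute the queue values: each creation adds one order to the queue and each pick removes one.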

df2 = (df.melt(id_vars=['id', 'category_id'],
               var_name='type',
               value_name='time')
         .sort_values(by=['category_id', 'time']) # not required to sort by "category_id",
                                                  # but done here for clarity
      )

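# +1 for each creation, -1 for each pick, accumulated separately per category
df2['queue'] = (df2['type'].map({'created_at': 1, 'picked_at': -1})
                           .groupby(df2['category_id'])
                           .cumsum()
               )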
>>> df2
          id  category_id        type                time  queue
2  123229120         6725  created_at 2021-05-22 00:15:47      1
4  123229152         6725  created_at 2021-05-22 00:16:29      2
7  123229120         6725   picked_at 2021-05-22 00:42:50      1
9  123229152         6725   picked_at 2021-05-22 01:05:43      0
3  123229210        41358  created_at 2021-05-22 00:17:44      1
8  123229210        41358   picked_at 2021-05-22 00:35:34      0
0  123228779        69558  created_at 2021-05-22 00:08:46      1
1  123228972        69558  created_at 2021-05-22 00:12:39      2
6  123228972        69558   picked_at 2021-05-22 00:17:00      1
5  123228779        69558   picked_at 2021-05-22 00:22:45      0

Finally, we reshape the queue back onto the original dataframe:

df['queue'] = (df2.pivot(columns=['type'],
                         values=['queue'])
                  .loc[:, ('queue', 'created_at')]
                  .dropna()
                  .astype(int)
              )

Output:

          id  category_id          created_at           picked_at  queue
0  123228779        69558 2021-05-22 00:08:46 2021-05-22 00:22:45      1
1  123228972        69558 2021-05-22 00:12:39 2021-05-22 00:17:00      2
2  123229120         6725 2021-05-22 00:15:47 2021-05-22 00:42:50      1
3  123229210        41358 2021-05-22 00:17:44 2021-05-22 00:35:34      1
4  123229152         6725 2021-05-22 00:16:29 2021-05-22 01:05:43      2
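
As a possible alternative to the pivot step, the creation events can be selected from the long frame with a boolean mask and assigned back by index. This is only a sketch: it relies on melt giving the created_at rows the labels 0..n-1, which matches df only when df has a default RangeIndex. To stay self-contained it rebuilds just the two category 69558 rows from the example.

```python
import pandas as pd

# Two sample rows (category 69558) taken from the example above.
df = pd.DataFrame({
    'id': [123228779, 123228972],
    'category_id': [69558, 69558],
    'created_at': pd.to_datetime(['2021-05-22 00:08:46', '2021-05-22 00:12:39']),
    'picked_at': pd.to_datetime(['2021-05-22 00:22:45', '2021-05-22 00:17:00']),
})

# Long format: one row per event, in chronological order within each category.
df2 = (df.melt(id_vars=['id', 'category_id'], var_name='type', value_name='time')
         .sort_values(by=['category_id', 'time']))

# +1 when an order is created, -1 when it is picked, accumulated per category.
df2['queue'] = (df2['type'].map({'created_at': 1, 'picked_at': -1})
                           .groupby(df2['category_id'])
                           .cumsum())

# Keep only the creation events; their index labels line up with df's rows.
df['queue'] = df2.loc[df2['type'].eq('created_at'), 'queue']
```

If df carries a custom index, resetting it first (or joining on id instead) would be the safer route.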

NB. This gives us the queue length right after each order is created, per category_id, so the order itself is counted.
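
To sanity-check the result on a small sample, it can be compared against a direct count. The sketch below assumes that the queue of an order means the same-category orders created at or before it (itself included) and not yet picked at that moment. The helper name `queue_bruteforce` is hypothetical, and the loop is quadratic, so it is only suitable for spot checks. The two approaches may disagree when timestamps tie exactly.

```python
import pandas as pd

def queue_bruteforce(df):
    """For each order, count same-category orders still open at its creation."""
    counts = []
    for row in df.itertuples(index=False):
        open_orders = ((df['category_id'] == row.category_id)
                       & (df['created_at'] <= row.created_at)
                       & (df['picked_at'] > row.created_at))
        counts.append(int(open_orders.sum()))
    return pd.Series(counts, index=df.index)

# Three sample rows (categories 6725 and 41358) from the example above.
df = pd.DataFrame({
    'id': [123229120, 123229210, 123229152],
    'category_id': [6725, 41358, 6725],
    'created_at': pd.to_datetime(['2021-05-22 00:15:47', '2021-05-22 00:17:44',
                                  '2021-05-22 00:16:29']),
    'picked_at': pd.to_datetime(['2021-05-22 00:42:50', '2021-05-22 00:35:34',
                                 '2021-05-22 01:05:43']),
})

expected = queue_bruteforce(df)
```

On these three rows the count comes out as 1, 1, 2, which agrees with the queue column in the output above.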