Need to expand an inventory journal (log) pandas dataframe to include all dates per product id

I have an inventory log that records products with their resulting stock quantity (resulting_qty) and the loss/gain (delta_qty) each time stock is added or removed.

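The problem is that the log is not updated every day; it is only updated when the stock changes. This makes it hard to get the total stock of all items on a given date. Some items have no entry on that date even though they do have stock available, because their last recorded resulting_qty was greater than 0. Logically, such an item had no change in quantity for a number of days equal to the days between the maximum date and its last recorded date.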

My data looks like this, but in reality there are thousands of product IDs:

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2017-03-06 | 2017-03-06 12:24:22 | A   | 0         | 0.0           |
| 2017-03-31 | 2017-03-31 02:43:11 | A   | 3         | 3.0           |
| 2017-04-08 | 2017-04-08 22:04:35 | A   | -1        | 2.0           |
| 2017-04-12 | 2017-04-12 18:26:39 | A   | -1        | 1.0           |
| 2017-04-19 | 2017-04-19 09:15:38 | A   | -1        | 0.0           |
| 2019-01-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-05 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-22 | 2019-04-22 11:06:33 | B   | -1        | 1.0           |
| 2019-04-23 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-09 | 2019-05-09 16:25:41 | C   | 2         | 2.0           |
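Each row records the stock level right after a change. So a product's stock on any day is its most recent `resulting_qty` on or before that day. Below is a minimal sketch of that as-of lookup for a single date. It rebuilds the sample above inline, and `stock_on` is a hypothetical helper name, not an existing API.

```python
import pandas as pd

# the sample log from the table above
stamps = pd.to_datetime([
    '2017-03-06 12:24:22', '2017-03-31 02:43:11', '2017-04-08 22:04:35',
    '2017-04-12 18:26:39', '2017-04-19 09:15:38', '2019-01-16 23:37:17',
    '2019-01-19 09:40:38', '2019-04-05 16:40:32', '2019-04-22 11:06:33',
    '2019-04-23 13:23:17', '2019-05-09 16:25:41',
])
df = pd.DataFrame({
    'date': stamps.normalize(),
    'timestamp': stamps,
    'pid': list('AAAAABCBBBC'),
    'delta_qty': [0, 3, -1, -1, -1, 0, 0, 2, -1, -1, 2],
    'resulting_qty': [0.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0],
})


def stock_on(log, day):
    """Total stock over all products at the end of `day` (hypothetical helper)."""
    upto = log[log['date'] <= pd.Timestamp(day)].sort_values('timestamp')
    # a product's level on `day` is the last level logged on or before it
    return upto.groupby('pid')['resulting_qty'].last().sum()
```

For example, `stock_on(df, '2019-04-10')` gives 2.0: B holds 2 since 2019-04-05 and C still holds 0. This answers one date at a time. An expanded daily table is still handy when every date is needed.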

Essentially, I need the data to look more like this. Then I can simply pick a date and get the total stock for that date by grouping on date (e.g. `df.groupby('date').resulting_qty.sum()`):

Note: because of the character limit I removed the rows for PID = A, but I hope you get the idea:

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2019-01-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-29 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-30 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-31 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-05 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-06 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-07 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-08 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-09 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-10 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-11 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-12 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-13 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-14 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-15 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-05 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-06 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-07 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-08 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-09 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-10 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-11 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-12 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-13 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-14 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-15 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-29 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-30 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-31 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-05 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-06 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-07 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-08 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-09 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-10 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-11 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-12 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-13 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-14 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-15 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-16 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-17 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-18 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-19 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-20 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-21 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-22 | 2019-04-22 11:06:33 | B   | -1        | 1.0           |
| 2019-04-23 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-24 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-25 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-26 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-27 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-28 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-29 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-30 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-01 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-02 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-03 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-04 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-05 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-06 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-07 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-08 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-09 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-10 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-01-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-29 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-30 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-31 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-18 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-18 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-29 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-30 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-31 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |

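What I have done so far is write a series of loops that generate a date range from the first date of a product's life cycle to the maximum date across all products. For each new date that has no log entry, I append the last recorded row's values as a new row with the new date. I collect these in lists and then build a new DataFrame from the updated lists. The code is very slow and takes more than 2 hours to run on the whole dataset: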

```python
import pandas as pd

date_list = []
pid_list = []
time_stamp_list = []
delta_qty_list = []
resulting_qty_list = []

# `test` is the log, indexed by date, with a product_id column
timer = len(test.product_id.unique().tolist())
counter = 0
for product in test.product_id.unique().tolist():
    counter += 1
    print((counter / timer) * 100)  # progress in percent
    # @product also works for string IDs, unlike an f-string without quotes
    temp_df = test.query('product_id == @product', engine='python')
    # every day from this product's first entry to the last date in the log
    for idx, date in enumerate(pd.date_range(temp_df.index.min(), test.index.max()).tolist()):
        min_date = temp_df.index.min()
        if date.date() == min_date:
            # first day of the product: copy the logged row
            date2 = min_date
            pid = temp_df.loc[date2]['product_id']
            timestamp = temp_df.loc[date2]['timestamp']
            delta_qty = temp_df.loc[date2]['delta_qty']
            resulting_qty = temp_df.loc[date2]['resulting_qty']
            date_list.append(date2)
            pid_list.append(pid)
            delta_qty_list.append(delta_qty)
            time_stamp_list.append(timestamp)
            resulting_qty_list.append(resulting_qty)
        else:
            if date.date() in temp_df.index:
                # a logged day: copy that row
                date2 = date.date()
                pid = temp_df.loc[date2]['product_id']
                timestamp = temp_df.loc[date2]['timestamp']
                delta_qty = temp_df.loc[date2]['delta_qty']
                resulting_qty = temp_df.loc[date2]['resulting_qty']
                date_list.append(date2)
                pid_list.append(pid)
                delta_qty_list.append(delta_qty)
                time_stamp_list.append(timestamp)
                resulting_qty_list.append(resulting_qty)
            elif date.date() > date2:
                # a gap day: repeat the last logged values with the new date
                date_list.append(date.date())
                pid_list.append(pid)
                time_stamp_list.append(timestamp)
                delta_qty_list.append(delta_qty)
                resulting_qty_list.append(resulting_qty)
            else:
                pass
```

Can someone help me understand the right way to approach this? I'm 100% sure this is not the best way.

Thanks

The idea here is to reindex the DataFrame to fill in your gaps.

Set up a DataFrame from your example:

```python
from io import StringIO

import pandas as pd

buffer = StringIO('''\
date|timestamp|pid|delta_qty|resulting_qty
2017-03-06|2017-03-06 12:24:22|A|0|0.0
2017-03-31|2017-03-31 02:43:11|A|3|3.0
2017-04-08|2017-04-08 22:04:35|A|-1|2.0
2017-04-12|2017-04-12 18:26:39|A|-1|1.0
2017-04-19|2017-04-19 09:15:38|A|-1|0.0
2019-01-16|2019-01-16 23:37:17|B|0|0.0
2019-01-19|2019-01-19 09:40:38|C|0|0.0
2019-04-05|2019-04-05 16:40:32|B|2|2.0
2019-04-22|2019-04-22 11:06:33|B|-1|1.0
2019-04-23|2019-04-23 13:23:17|B|-1|0.0
2019-05-09|2019-05-09 16:25:41|C|2|2.0
''')

df = pd.read_csv(buffer, sep='|', parse_dates=['date', 'timestamp'])
```
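`reindex` fails if the `(pid, date)` labels are not unique. A real log may record several changes to the same product on one day, although the sample data happens not to. The sketch below collapses the log to one row per product per day before setting the index. It assumes you want the net daily change and the end-of-day level. The second 2019-04-22 update is made up for illustration.

```python
import pandas as pd

# hypothetical log: product B with a made-up second update on 2019-04-22
log = pd.DataFrame({
    'timestamp': pd.to_datetime(['2019-04-22 11:06:33', '2019-04-22 17:00:00',
                                 '2019-04-23 13:23:17']),
    'pid': ['B', 'B', 'B'],
    'delta_qty': [-1, 1, -1],
    'resulting_qty': [1.0, 2.0, 1.0],
})
log['date'] = log['timestamp'].dt.normalize()

# one row per product per day: net change and end-of-day level
daily = (log.sort_values('timestamp')
            .groupby(['pid', 'date'], as_index=False)
            .agg(timestamp=('timestamp', 'last'),
                 delta_qty=('delta_qty', 'sum'),
                 resulting_qty=('resulting_qty', 'last')))
```

The remaining steps can then run on `daily` instead of `df`.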

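First, we generate a new gap-less index running from each product's minimum date to its maximum date. Going by your example, this means a product has no rows after its last recorded update. This step is easy to adapt to your exact requirements, though. For example, if you want the dates to run from a product's first entry up to today, you can set `start` and `end` manually.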

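```python
from itertools import chain, repeat

import pandas as pd

# first and last logged date of every product
date_ranges = df.groupby('pid')['date'].agg(['min', 'max'])

# one (pid, day) pair for every calendar day in each product's range
pairs = (zip(repeat(pid), pd.date_range(start, end))
         for pid, start, end in date_ranges.itertuples())
new_index = pd.MultiIndex.from_tuples(list(chain.from_iterable(pairs)),
                                      names=['pid', 'date'])
```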
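The ranges above stop at each product's own last update. The loop in the question instead carries every product forward to the last date in the whole log. A minimal sketch of that variant only changes the end of each range. It uses the B and C rows of the sample, and `global_end` and `expanded` are illustrative names.

```python
from itertools import chain, repeat

import pandas as pd

# the B and C rows of the sample log
df = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-16', '2019-01-19', '2019-04-05',
                            '2019-04-22', '2019-04-23', '2019-05-09']),
    'pid': ['B', 'C', 'B', 'B', 'B', 'C'],
    'resulting_qty': [0.0, 0.0, 2.0, 1.0, 0.0, 2.0],
})

global_end = df['date'].max()  # last date anywhere in the log
starts = df.groupby('pid')['date'].min()

# every product now runs from its first entry to the common end date
pairs = (zip(repeat(pid), pd.date_range(start, global_end))
         for pid, start in starts.items())
new_index = pd.MultiIndex.from_tuples(list(chain.from_iterable(pairs)),
                                      names=['pid', 'date'])

# each product's first day has a real row, so ffill never crosses products
expanded = df.set_index(['pid', 'date']).reindex(new_index).ffill()
```

For "up to today", `pd.Timestamp.today().normalize()` could serve as `global_end`.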

Then we apply the new index. There are two options here:

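  1. Following your example, keep filling with exactly the values of the last update.
  2. Fill `delta_qty` with 0 and forward-fill the remaining columns from the last update (this deviates from what you asked for, but it looks more logical and is only a small change).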

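In either case, the two key pieces are the `.reindex` and `.fillna` methods. `reindex` expands the dense DataFrame so that it contains every date, with the newly added rows empty (NaN). We then fill those NaNs with the correct data. Since we are forward-padding from the last update, the `fillna` docs call for `method='ffill'`. Recent pandas versions deprecate that argument in favor of the equivalent `.ffill()`, which the code below uses.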
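The effect of the two steps is easiest to see on a single series. Here is a minimal sketch using two of product B's logged levels from the question.

```python
import pandas as pd

# product B's level on two of its logged days
s = pd.Series([2.0, 1.0], index=pd.to_datetime(['2019-04-05', '2019-04-22']))

# reindex: one label per calendar day; the new days hold NaN
dense = s.reindex(pd.date_range('2019-04-05', '2019-04-22'))

# forward fill: each NaN takes the last value seen before it
filled = dense.ffill()
```

`dense` has 18 daily rows, 16 of them NaN. In `filled`, every day up to 2019-04-21 holds 2.0, and 2019-04-22 holds 1.0.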

Method 1:

```python
# Method 1: repeat the last update on every filled-in day
results = (df.set_index(['pid', 'date'])
             .reindex(new_index)
             .reset_index())
results = results.ffill()
```

For `pid == 'A'`, this returns:

```
    pid       date           timestamp  delta_qty  resulting_qty
0     A 2017-03-06 2017-03-06 12:24:22        0.0            0.0
1     A 2017-03-07 2017-03-06 12:24:22        0.0            0.0
2     A 2017-03-08 2017-03-06 12:24:22        0.0            0.0
3     A 2017-03-09 2017-03-06 12:24:22        0.0            0.0
..   ..        ...                 ...        ...            ...
24    A 2017-03-30 2017-03-06 12:24:22        0.0            0.0
25    A 2017-03-31 2017-03-31 02:43:11        3.0            3.0
..   ..        ...                 ...        ...            ...
29    A 2017-04-04 2017-03-31 02:43:11        3.0            3.0
```

Method 2:

```python
# Method 2: zero delta on filled-in days, carry the other columns forward
results = (df.set_index(['pid', 'date'])
             .reindex(new_index)
             .reset_index())
results['delta_qty'] = results['delta_qty'].fillna(0)
results = results.ffill()
```

This returns:

```
    pid       date           timestamp  delta_qty  resulting_qty
0     A 2017-03-06 2017-03-06 12:24:22        0.0            0.0
1     A 2017-03-07 2017-03-06 12:24:22        0.0            0.0
2     A 2017-03-08 2017-03-06 12:24:22        0.0            0.0
3     A 2017-03-09 2017-03-06 12:24:22        0.0            0.0
..   ..        ...                 ...        ...            ...
24    A 2017-03-30 2017-03-06 12:24:22        0.0            0.0
25    A 2017-03-31 2017-03-31 02:43:11        3.0            3.0
..   ..        ...                 ...        ...            ...
29    A 2017-04-04 2017-03-31 02:43:11        0.0            3.0
```
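The stated goal was a per-date total via `groupby`. Below is a minimal end-to-end sketch that rebuilds the sample log inline, applies Method 2, and then sums per day. With per-product ranges, a product drops out of the totals after its last update. If products can end with stock on hand, the ranges would need a common end date first.

```python
from itertools import chain, repeat

import pandas as pd

stamps = pd.to_datetime([
    '2017-03-06 12:24:22', '2017-03-31 02:43:11', '2017-04-08 22:04:35',
    '2017-04-12 18:26:39', '2017-04-19 09:15:38', '2019-01-16 23:37:17',
    '2019-01-19 09:40:38', '2019-04-05 16:40:32', '2019-04-22 11:06:33',
    '2019-04-23 13:23:17', '2019-05-09 16:25:41',
])
df = pd.DataFrame({
    'date': stamps.normalize(),
    'timestamp': stamps,
    'pid': list('AAAAABCBBBC'),
    'delta_qty': [0, 3, -1, -1, -1, 0, 0, 2, -1, -1, 2],
    'resulting_qty': [0.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0],
})

# gap-less (pid, date) index over each product's own range
date_ranges = df.groupby('pid')['date'].agg(['min', 'max'])
pairs = (zip(repeat(pid), pd.date_range(start, end))
         for pid, start, end in date_ranges.itertuples())
new_index = pd.MultiIndex.from_tuples(list(chain.from_iterable(pairs)),
                                      names=['pid', 'date'])

# Method 2: zero delta on filled-in days, carry the rest forward
results = df.set_index(['pid', 'date']).reindex(new_index).reset_index()
results['delta_qty'] = results['delta_qty'].fillna(0)
results = results.ffill()

# total stock held across all products on each day
daily_total = results.groupby('date')['resulting_qty'].sum()
```

On 2019-04-10, for instance, `daily_total` is 2.0: B has held 2 since 2019-04-05 and C still holds 0.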