How to calculate the difference for every unique column value based on the next date in a DataFrame for that value?

I have a df like this:

date       | prod_number | prod_count | prod_factor
2018-01-01 | 1           | 5          | 3
2018-02-01 | 1           | 20         | 3
2018-04-01 | 1           | 10         | 3
2019-09-01 | 2           | 8          | 5
2018-09-02 | 2           | 7          | 5
2018-10-03 | 2           | 10         | 5

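For each "prod_number", I want to get the change in prod_count since the previous date, and then multiply it by prod_factor.

The first entry for each "prod_number" has nothing to compute a difference against, so it can be NONE or 0, whichever is easier.

Like this: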

date       | prod_number | prod_count | prod_factor | change      | prod_factor*change
2018-01-01 | 1           | 5          | 3           | NONE/0      | NONE/0
2018-02-01 | 1           | 20         | 3           | 15 # 20-5   | 45  # 3*15
2018-04-01 | 1           | 10         | 3           | -10 # 10-20 | -30 # 3*-10

2019-09-01 | 2           | 8          | 5           | NONE/0      | NONE/0
2018-09-02 | 2           | 7          | 5           | -1 # 7-8    | -5  # 5*-1
2018-10-03 | 2           | 10         | 5           | 3 # 10-7    | 15  # 5*3

How can I achieve this with pandas?

Use groupby.diff, then multiply the two columns:

df['change'] = df.groupby('prod_number')['prod_count'].diff()
df['prod_factor*change'] = df['change'] * df['prod_factor']

         date  prod_number  prod_count  prod_factor  change  prod_factor*change
0  2018-01-01            1           5            3     NaN                 NaN
1  2018-02-01            1          20            3    15.0                45.0
2  2018-04-01            1          10            3   -10.0               -30.0
3  2019-09-01            2           8            5     NaN                 NaN
4  2018-09-02            2           7            5    -1.0                -5.0
5  2018-10-03            2          10            5     3.0                15.0
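
The question allows 0 instead of NONE for the first entry of each prod_number. A minimal sketch of that variant, assuming the same data as in the question and simply filling the NaN values produced by diff() with 0:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'date': ['2018-01-01', '2018-02-01', '2018-04-01',
             '2019-09-01', '2018-09-02', '2018-10-03'],
    'prod_number': [1, 1, 1, 2, 2, 2],
    'prod_count': [5, 20, 10, 8, 7, 10],
    'prod_factor': [3, 3, 3, 5, 5, 5],
})

# diff() follows the existing row order within each group; sort first
# if the rows are not already in the order you want to compare.
# The first row of each group has no predecessor, so diff() gives NaN there;
# fillna(0) replaces it with 0.
df['change'] = df.groupby('prod_number')['prod_count'].diff().fillna(0)
df['prod_factor*change'] = df['change'] * df['prod_factor']

print(df)
```

Keeping NaN may still be preferable if you need to tell "no previous entry" apart from "no change".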

You can also use np.where with diff(). This relies on rows with the same prod_number being adjacent:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['2018-01-01', 1, 5, 3], ['2018-02-01', 1, 20, 3], ['2018-04-01', 1, 10, 3],
     ['2019-09-01', 2, 8, 5], ['2018-09-02', 2, 7, 5], ['2018-10-03', 2, 10, 5]],
    columns=['date', 'prod_number', 'prod_count', 'prod_factor'])

df['change'] = np.where(
    df['prod_number'].diff() == 0,  # True when prod_number matches the previous row
    df['prod_count'].diff(),        # value if true: change in prod_count
    0                               # otherwise (first row of each prod_number): 0
)

         date  prod_number  prod_count  prod_factor  change
0  2018-01-01            1           5            3     0.0
1  2018-02-01            1          20            3    15.0
2  2018-04-01            1          10            3   -10.0
3  2019-09-01            2           8            5     0.0
4  2018-09-02            2           7            5    -1.0
5  2018-10-03            2          10            5     3.0
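
The np.where approach compares each row with the one directly above it, so it only gives per-product changes when each product's rows sit next to each other. A rough sketch of one way to handle that, using hypothetical data where the question's rows are interleaved (this ordering is an assumption, not part of the question), and also adding the multiplied column:

```python
import numpy as np
import pandas as pd

# Hypothetical ordering: same values as in the question, rows interleaved.
df = pd.DataFrame({
    'date': ['2018-01-01', '2019-09-01', '2018-02-01',
             '2018-09-02', '2018-04-01', '2018-10-03'],
    'prod_number': [1, 2, 1, 2, 1, 2],
    'prod_count': [5, 8, 20, 7, 10, 10],
    'prod_factor': [3, 5, 3, 5, 3, 5],
})

# Stable sort: groups rows by prod_number while keeping each product's
# rows in their original relative order.
df = df.sort_values('prod_number', kind='mergesort').reset_index(drop=True)

df['change'] = np.where(
    df['prod_number'].diff() == 0,  # same product as the previous row
    df['prod_count'].diff(),        # change in prod_count
    0                               # first row of each product
)
df['prod_factor*change'] = df['change'] * df['prod_factor']

print(df)
```

Without the sort, every row here would follow a row of a different product and change would be 0 everywhere. The groupby-based answer does not need this step because it takes differences within each group.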