How to calculate the difference for every unique column value based on the next date in a dataframe for that unique column value?
I have a df like this:
date | prod_number | prod_count | prod_factor
2018-01-01 | 1 | 5 | 3
2018-02-01 | 1 | 20 | 3
2018-04-01 | 1 | 10 | 3
2019-09-01 | 2 | 8 | 5
2018-09-02 | 2 | 7 | 5
2018-10-03 | 2 | 10 | 5
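For each "prod_number", I want to get the change in prod_count since the previous date, and then multiply it by prod_factor.
The first entry for each "prod_number" has nothing to compute a difference against, so it can be NONE or 0, whichever is easier.
Like this: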
date | prod_number | prod_count | prod_factor | change | prod_factor*change
2018-01-01 | 1 | 5 | 3 | NONE/0 | NONE/0
2018-02-01 | 1 | 20 | 3 | 15 # 20-5 | 45 # 3*15
2018-04-01 | 1 | 10 | 3 | -10 # 10-20 | -30 # 3*-10
2019-09-01 | 2 | 8 | 5 | NONE/0 | NONE/0
2018-09-02 | 2 | 7 | 5 | -1 # 7-8 | -5 # 5*-1
2018-10-03 | 2 | 10 | 5 | 3 # 10-7 | 15 # 5*3
How can I achieve this with pandas?
Use groupby.diff to get the change within each prod_number, then multiply the two columns:
df['change'] = df.groupby('prod_number')['prod_count'].diff()
df['prod_factor*change'] = df['change'] * df['prod_factor']
date prod_number prod_count prod_factor change prod_factor*change
0 2018-01-01 1 5 3 NaN NaN
1 2018-02-01 1 20 3 15.0 45.0
2 2018-04-01 1 10 3 -10.0 -30.0
3 2019-09-01 2 8 5 NaN NaN
4 2018-09-02 2 7 5 -1.0 -5.0
5 2018-10-03 2 10 5 3.0 15.0
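The question allows 0 in place of NONE for the first row of each product, and its title mentions ordering by date. A minimal sketch covering both is below. The helper name `add_change` and its `sort_by_date` flag are hypothetical, introduced only for illustration; the column names come from the question.

```python
import pandas as pd


def add_change(df, sort_by_date=False):
    """Hypothetical helper: per-product change in prod_count, scaled by prod_factor."""
    out = df.copy()
    if sort_by_date:
        # Order rows chronologically inside each product before differencing.
        out["date"] = pd.to_datetime(out["date"])
        out = out.sort_values(["prod_number", "date"]).reset_index(drop=True)
    # diff() leaves NaN on the first row of each product; replace it with 0.
    out["change"] = out.groupby("prod_number")["prod_count"].diff().fillna(0)
    out["prod_factor*change"] = out["change"] * out["prod_factor"]
    return out


df = pd.DataFrame({
    "date": ["2018-01-01", "2018-02-01", "2018-04-01",
             "2019-09-01", "2018-09-02", "2018-10-03"],
    "prod_number": [1, 1, 1, 2, 2, 2],
    "prod_count": [5, 20, 10, 8, 7, 10],
    "prod_factor": [3, 3, 3, 5, 5, 5],
})

result = add_change(df)                              # keeps the given row order
sorted_result = add_change(df, sort_by_date=True)    # orders by date first
print(result)
```

With the default row order this reproduces the table from the question, with 0 instead of NaN. With `sort_by_date=True` the 2019-09-01 row moves to the end of prod_number 2, so the differences for that product change accordingly. Note that `fillna(0)` would also hide NaN values caused by missing prod_count entries.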
You can use np.where and diff():
import pandas as pd
import numpy as np

df = pd.DataFrame(
    [['2018-01-01', 1, 5, 3], ['2018-02-01', 1, 20, 3], ['2018-04-01', 1, 10, 3],
     ['2019-09-01', 2, 8, 5], ['2018-09-02', 2, 7, 5], ['2018-10-03', 2, 10, 5]],
    columns=['date', 'prod_number', 'prod_count', 'prod_factor'])
df['change'] = np.where(
    df['prod_number'].diff() == 0,  # condition: same prod_number as the previous row
    df['prod_count'].diff(),        # value if true: change in prod_count
    0                               # otherwise 0 (first row of each prod_number)
)
date prod_number prod_count prod_factor change
0 2018-01-01 1 5 3 0.0
1 2018-02-01 1 20 3 15.0
2 2018-04-01 1 10 3 -10.0
3 2019-09-01 2 8 5 0.0
4 2018-09-02 2 7 5 -1.0
5 2018-10-03 2 10 5 3.0
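The np.where approach compares each row with the one directly above it, so it only works when all rows of a prod_number are adjacent, and `diff() == 0` needs a numeric prod_number. A sketch of a slightly more defensive variant follows. The stable sort and the `shift()` comparison are additions not present in the answer, and the `same_product` name is only illustrative. It also adds the prod_factor*change column the question asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["2018-01-01", 1, 5, 3], ["2018-02-01", 1, 20, 3], ["2018-04-01", 1, 10, 3],
     ["2019-09-01", 2, 8, 5], ["2018-09-02", 2, 7, 5], ["2018-10-03", 2, 10, 5]],
    columns=["date", "prod_number", "prod_count", "prod_factor"],
)

# Assumption: group rows of the same product together first. mergesort is
# stable, so the existing order inside each product is preserved.
df = df.sort_values("prod_number", kind="mergesort").reset_index(drop=True)

# True where the previous row belongs to the same product; comparing against
# shift() also works for non-numeric product identifiers.
same_product = df["prod_number"].eq(df["prod_number"].shift())

df["change"] = np.where(same_product, df["prod_count"].diff(), 0)
df["prod_factor*change"] = df["change"] * df["prod_factor"]
print(df)
```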