Create a new column with the weekly change in rank, per ID
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/test_df.csv')
                       id  rank        date
1991513  FCWFKZVFAHFK7WP4    32  2021-06-01
1991514  FCWEUHFSM2BSQY2N    33  2021-06-01
1991515  FCWFV6T2GGPM8T2P    34  2021-06-01
1991516  FCWEQ8B4QDJJUNEH    35  2021-06-01
1991517  FCWFAUSPJFGDUBRG    36  2021-06-01
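I have one month of data like the above, and I want to create a new column delta_rank_7 that tells me, for each ID, how its rank changed over the last 7 days (NaN for 2021-06-01 through 2021-06-07).
I can do something like what is described in "Calculating difference between two rows in Python / Pandas":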
df.set_index('date').diff(periods=7)
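But I have multiple entries for each date, and I want to do this separately for each id.
If the ids are repeated (each id appears on several dates), group by id and take the difference 7 rows back: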
df = df.set_index('date')
df['delta_rank_7'] = df.groupby('id')['rank'].diff(periods=7)
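One thing worth checking before relying on the snippet above: diff(periods=n) counts rows within each group, not calendar days, so it only equals an n-day change when every id has exactly one row per day with no gaps. A minimal sketch on made-up data (the toy frame and the column name row_diff_2 are hypothetical, and periods=2 is used only to keep the example short):

```python
import pandas as pd

# Hypothetical toy data: id "A" has no row for 2021-06-04.
toy = pd.DataFrame({
    "id": ["A", "A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime([
        "2021-06-01", "2021-06-02", "2021-06-03", "2021-06-05",
        "2021-06-01", "2021-06-02", "2021-06-03",
    ]),
    "rank": [10, 12, 15, 11, 3, 4, 8],
})

# Sort so that "previous row" means "previous date" inside each id.
toy = toy.sort_values(["id", "date"]).reset_index(drop=True)

# Each row minus the row two positions earlier in the same id.
toy["row_diff_2"] = toy.groupby("id")["rank"].diff(periods=2)
print(toy)
```

The row for A on 2021-06-05 is compared with 2021-06-02 (two rows back, but three days back), which is why the date-based variant matters when some days are missing.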
If you need the difference over 7 calendar days rather than 7 rows, use DataFrameGroupBy.shift with a frequency and subtract:
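file = 'https://raw.githubusercontent.com/amanaroratc/hello-world/master/test_df.csv'
df = pd.read_csv(file, parse_dates=['date'])
df = df.sort_values(['id', 'date'])
# rank of the same id 7 days earlier, moved onto the current date
prev = df.set_index('date').groupby('id')['rank'].shift(7, freq='D')
# subtract on the (id, date) index, then merge the result back as a new column
delta = df.set_index(['id', 'date'])['rank'].sub(prev).reset_index(name='delta_rank_7')
df = df.merge(delta)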
print(df)
                     id  rank        date  delta_rank_7
0      CBKFGPBZMG48K5SF     2  2021-06-15           NaN
1      CBKFGPBZMG48K5SF    19  2021-06-19           NaN
2      CBKFGPBZMG48K5SF     2  2021-06-21           NaN
3      CBKFGPBZMG48K5SF     2  2021-06-22           0.0
4      CBKFGPBZMG48K5SF    48  2021-06-24           NaN
...                 ...   ...         ...           ...
10010  FRNEUJZRVQGT94SP   112  2021-06-23          38.0
10011  FRNEUJZRVQGT94SP   109  2021-06-24          35.0
10012  FRNEUJZRVQGT94SP    68  2021-06-27         -73.0
10013  FRNEUJZRVQGT94SP    85  2021-06-28           NaN
10014  FRNEUJZRVQGT94SP   133  2021-06-30          21.0
[10015 rows x 4 columns]
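If the shift-with-frequency step feels opaque, the same 7-day difference can also be sketched as a plain self-merge: copy the frame, push every date 7 days forward, and join it back on id and date. This is an alternative, not the method used above. It assumes each (id, date) pair occurs at most once, and the toy data and the names prev and rank_prev are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data with gaps between dates.
toy = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime([
        "2021-06-01", "2021-06-08", "2021-06-10",
        "2021-06-02", "2021-06-09",
    ]),
    "rank": [10, 4, 7, 20, 25],
})

# Move each row's date 7 days forward, so its rank lines up with
# the row of the same id one week later.
prev = (
    toy.assign(date=toy["date"] + pd.Timedelta(days=7))
       .rename(columns={"rank": "rank_prev"})
)

# Left merge keeps every original row; rows with no match 7 days
# earlier get NaN in rank_prev and therefore NaN in delta_rank_7.
out = toy.merge(prev, on=["id", "date"], how="left")
out["delta_rank_7"] = out["rank"] - out["rank_prev"]
print(out)
```

Only rows that have a record for the same id exactly 7 days earlier get a value, which matches the pattern of NaN values in the output above.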