Pandas - Cross referencing with DatetimeIndex - Groupby
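I have monthly (month-end) data for many companies. For each company (using groupby), I want to create a new column, new_col, such that:
new_col from July of a given year through June of the next year takes the old_col value from December of the previous year.
- For example, new_col from Jul-2000 to Jun-2001 equals the old_col value of Dec-1999.
You can download the sample data here: https://www.dropbox.com/s/oz1ltblh6u0chzt/tem_20220506.csv?dl=0
I have been trying the following code, without success: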
df = pd.read_csv('tem_20220506.csv', parse_dates=['date'])
df.drop(columns=['new_col', 'Note'], inplace=True)
df = df.set_index('date').rename_axis(None)
df['new_col'] = df.groupby('comp').apply(lambda g: --- ) # ← I am now stuck here
Expected output:
comp old_col new_col
2000-01-31 a 1 NaN
2000-02-29 a 2 NaN
2000-03-31 a 3 NaN
2000-04-30 a 4 NaN
2000-05-31 a 5 NaN
2000-06-30 a 6 NaN
2000-07-31 a 7 NaN
2000-08-31 a 8 NaN
2000-09-30 a 9 NaN
2000-10-31 a 10 NaN
2000-11-30 a 11 NaN
2000-12-31 a 12 NaN
2001-01-31 a 13 NaN
2001-02-28 a 14 NaN
2001-03-31 a 15 NaN
2001-04-30 a 16 NaN
2001-05-31 a 17 NaN
2001-06-30 a 18 NaN
2001-07-31 a 19 12.000
2001-08-31 a 20 12.000
2001-09-30 a 21 12.000
2001-10-31 a 22 12.000
2001-11-30 a 23 12.000
2001-12-31 a 24 12.000
2002-01-31 a 25 12.000
2002-02-28 a 26 12.000
2002-03-31 a 27 12.000
2002-04-30 a 28 12.000
2002-05-31 a 29 12.000
2002-06-30 a 30 12.000
2002-07-31 a 31 24.000
2002-08-31 a 32 24.000
2002-09-30 a 33 24.000
2002-10-31 a 34 24.000
2002-11-30 a 35 24.000
2002-12-31 a 36 24.000
2000-01-31 b 101 NaN
2000-02-29 b 102 NaN
2000-03-31 b 103 NaN
2000-04-30 b 104 NaN
2000-05-31 b 105 NaN
2000-06-30 b 106 NaN
2000-07-31 b 107 NaN
2000-08-31 b 108 NaN
2000-09-30 b 109 NaN
2000-10-31 b 110 NaN
2000-11-30 b 111 NaN
2001-01-31 b 113 NaN
2001-02-28 b 114 NaN
2001-03-31 b 115 NaN
2001-04-30 b 116 NaN
2001-05-31 b 117 NaN
2001-06-30 b 118 NaN
2001-07-31 b 119 NaN
2001-08-31 b 120 NaN
2001-09-30 b 121 NaN
2001-10-31 b 122 NaN
2001-11-30 b 123 NaN
2001-12-31 b 124 NaN
2002-01-31 b 125 NaN
2002-02-28 b 126 NaN
2002-03-31 b 127 NaN
2002-04-30 b 128 NaN
2002-05-31 b 129 NaN
2002-06-30 b 130 NaN
2002-07-31 b 131 124.000
2002-08-31 b 132 124.000
2002-10-31 b 134 124.000
2002-11-30 b 135 124.000
2002-12-31 b 136 124.000
(!!) Note, for comp == b:
- Jul-2001 to Jun-2002 is NaN because the Dec-2000 value is missing.
- The Sep-2002 row is missing, but that is fine.
import pandas as pd
df = pd.read_csv('tem_20220506.csv', parse_dates=['date'])
df.drop(columns=['new_col', 'Note'], inplace=True)
df.set_index('date', inplace=True)
Use a helper function to derive the new column from the old column:
def helper_func(x):
    # collect the old_col values of the December rows, keyed by date
    req_values = x[x.index.month == 12].to_dict()['old_col']
    # for each December, fill July of the next year through June of the year after
    for date_value, old_col_value in req_values.items():
        x.loc[f'{date_value.year+1}-07-31':f'{date_value.year+2}-06-30', 'new_col'] = old_col_value
    return x
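# group_keys=False keeps the original date index on the result instead of adding a 'comp' level
df['new_col'] = df.groupby('comp', group_keys=False)[['old_col']].apply(helper_func)['new_col']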
This gives a DataFrame that matches your expected output.
Alternative helper function:
def helper_fun2(x):
    """
    1. iterate over the years present in the index
    2. if the year has a Dec-31 row, copy its old_col into new_col
       from July of the next year through June of the year after
    """
    for year in x.index.year.unique():
        if f'{year}-12-31' in x.index:
            x.loc[f'{year+1}-07-31':f'{year+2}-06-30', 'new_col'] = x.loc[f'{year}-12-31', 'old_col']
    return x
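For comparison, the same rule can probably also be written without apply, as a lookup of December values keyed by (comp, year). The sketch below is not the method from the answer above. It builds a small synthetic month-end panel instead of reading the linked CSV, and the names dec_lookup and ref_year are made up for illustration. It assumes one row per company per month-end.

```python
import numpy as np
import pandas as pd

# Synthetic month-end panel (hypothetical data, standing in for the linked CSV).
dates = pd.date_range("2000-01-01", periods=36, freq="MS") + pd.offsets.MonthEnd(0)
df = pd.concat([
    pd.DataFrame({"comp": "a", "old_col": range(1, 37)}, index=dates),
    pd.DataFrame({"comp": "b", "old_col": range(101, 137)}, index=dates),
])

# Drop b's Dec-2000 and Sep-2002 rows to mimic the gaps noted in the question.
gap = (df["comp"].to_numpy() == "b") & df.index.isin(
    pd.to_datetime(["2000-12-31", "2002-09-30"])
)
df = df[~gap]

# December values keyed by (comp, year); assumes at most one December row per key.
dec = df[df.index.month == 12]
dec_lookup = pd.Series(
    dec["old_col"].to_numpy(),
    index=pd.MultiIndex.from_arrays(
        [dec["comp"].to_numpy(), dec.index.year.to_numpy().astype("int64")]
    ),
)

# Jul..Dec of year Y reads December of Y-1; Jan..Jun of year Y reads December of Y-2.
year = df.index.year.to_numpy().astype("int64")
month = df.index.month.to_numpy()
ref_year = np.where(month >= 7, year - 1, year - 2)

# Look up each row's (comp, ref_year); keys with no December row become NaN.
keys = pd.MultiIndex.from_arrays([df["comp"].to_numpy(), ref_year])
df["new_col"] = dec_lookup.reindex(keys).to_numpy()
```

Because the lookup is keyed on the company and the reference year, a missing December row simply yields NaN for the twelve months that depend on it, and a missing month in between (such as Sep-2002 for b) does not affect the other rows.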