Recency weighted moving average on previous dates in pandas
I have the following df:
import pandas as pd

index = pd.to_datetime(['2017-03-01', '2017-03-01', '2017-02-15', '2017-02-01',
                        '2017-01-20', '2017-01-20', '2017-01-20', '2017-01-02',
                        '2016-12-04', '2016-12-04', '2016-12-04', '2016-11-16'])
df = pd.DataFrame(data={'val': [8, 1, 5, 2, 3, 5, 9, 14, 13, 2, 1, 12],
                        'group': ['one', 'two', 'one', 'one', 'two', 'two', 'one', 'two',
                                  'two', 'one', 'one', 'two']},
                  index=index)
df = df.sort_index()
group val
2016-11-16 two 12
2016-12-04 two 13
2016-12-04 one 2
2016-12-04 one 1
2017-01-02 two 14
2017-01-20 two 3
2017-01-20 two 5
2017-01-20 one 9
2017-02-01 one 2
2017-02-15 one 5
2017-03-01 one 8
2017-03-01 two 1
Within each group (one, two), I want a recency-weighted average of the previous values. For example, looking at group one:
group val
2016-12-04 one 2
2016-12-04 one 1
2017-01-20 one 9
2017-02-01 one 2
2017-02-15 one 5
2017-03-01 one 8
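For example, for the date 2017-02-15 I want a new column whose value for that date is a recency-weighted average of the previous values [2,9,1,2], where dates closer in the past get higher weights. Note that the same date can occur several times within a group, and rows sharing a date should get the same weight.
I think the pandas exponentially weighted functions would be a good fit for this. Where dates are the same within a group, I would first take the mean of their values, so that I can apply a simple shift() afterwards. I tried the following: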
df = df.reset_index().set_index(['index', 'group']).groupby(
level=[0,1]).mean().reset_index().set_index('index')
Now, if I were not interested in recency weighting, I could use
df = df.groupby('group')['val'].expanding().mean().groupby(level=0).shift()
and then merge the result with the original on date and group.
But when I try something similar with pandas.ewma, I am missing something:
df.groupby('group')['val'].ewm(span=27).groupby(level=0).shift()
I can loop over the groups:
grouped = df.groupby('group')['val']
for key, group in grouped:
    # written as pd.ewma(group, span=27) in older pandas versions
    print(group.ewm(span=27).mean().shift())
index
2016-12-04 NaN
2017-01-20 1.500000
2017-02-01 5.388889
2017-02-15 4.174589
2017-03-01 4.404414
Name: val, dtype: float64
index
2016-11-16 NaN
2016-12-04 12.000000
2017-01-02 12.518519
2017-01-20 13.049360
2017-03-01 10.529680
and then somehow merge these back into the original df on group and date, but that seems overly complicated. Is there a better way?
To compute a recency-weighted moving average without looping over the groups and merging back, you can use apply.
def rwma(group):
    # perform the ewma
    kwargs = dict(ignore_na=False, span=27, min_periods=0, adjust=True)
    result = group.ewm(**kwargs).mean().shift().reset_index()
    # rename the result column so that the merge goes smoothly
    result.rename(columns={result.columns[-1]: 'rwma'}, inplace=True)
    return result

recency = df.groupby('group')['val'].apply(rwma)
Test code:
import pandas as pd

df = pd.DataFrame(data={
    'val': [8, 1, 5, 2, 3, 5, 9, 14, 13, 2, 1, 12],
    'group': ['one', 'two', 'one', 'one', 'two', 'two',
              'one', 'two', 'two', 'one', 'one', 'two']},
    index=pd.to_datetime([
        '2017-03-01', '2017-03-01', '2017-02-15', '2017-02-01',
        '2017-01-20', '2017-01-20', '2017-01-20', '2017-01-02',
        '2016-12-04', '2016-12-04', '2016-12-04', '2016-11-16'])
).sort_index()

recency = df.groupby('group')['val'].apply(rwma)
print(recency)
Results:
index rwma
group
one 0 2016-12-04 NaN
1 2016-12-04 2.000000
2 2017-01-20 1.481481
3 2017-02-01 4.175503
4 2017-02-15 3.569762
5 2017-03-01 3.899694
two 0 2016-11-16 NaN
1 2016-12-04 12.000000
2 2017-01-02 12.518519
3 2017-01-20 13.049360
4 2017-01-20 10.251243
5 2017-03-01 9.039866
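For readers wondering what span=27 does to the weights: with adjust=True, the pandas exponentially weighted mean at each position is a weighted average in which the value i rows back gets weight (1 - alpha)**i, with alpha = 2 / (span + 1). The sketch below recomputes this by hand for the group one values and compares it with pandas. The helper name manual_ewm_mean is made up for illustration and is not part of the answer above.

```python
import numpy as np
import pandas as pd


def manual_ewm_mean(values, span):
    """Adjusted exponentially weighted mean, computed directly from the weights.

    Hypothetical helper for illustration only.
    """
    alpha = 2.0 / (span + 1)
    values = np.asarray(values, dtype=float)
    out = []
    for t in range(len(values)):
        # weight (1 - alpha)**i for the value i rows before position t
        weights = (1 - alpha) ** np.arange(t, -1, -1)
        out.append(np.dot(weights, values[: t + 1]) / weights.sum())
    return np.array(out)


vals = [2, 1, 9, 2, 5, 8]  # group 'one' from the question, in date order
by_hand = manual_ewm_mean(vals, span=27)
by_pandas = pd.Series(vals, dtype=float).ewm(span=27, adjust=True).mean().to_numpy()

# Both computations should agree up to floating point error
print(np.allclose(by_hand, by_pandas))
```

Note that these weights depend on the row position only, not on how many days lie between two rows.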
Building on Stephen's answer, here is a working version:
def rwma(group):
    # perform the ewma
    kwargs = dict(ignore_na=False, span=27, min_periods=0, adjust=True)
    result = group.resample('1D').mean().ewm(**kwargs).mean().shift()
    result = result[group.index].reset_index()
    # rename the result column so that the merge goes smoothly
    result.rename(columns={result.columns[-1]: 'rwma'}, inplace=True)
    return result

recency = df.groupby('group')['val'].apply(rwma)
print(recency)
Output:
index rwma
group
one 0 2016-12-04 NaN
1 2016-12-04 NaN
2 2017-01-20 1.500000
3 2017-02-01 8.776518
4 2017-02-15 4.016278
5 2017-03-01 4.670166
two 0 2016-11-16 NaN
1 2016-12-04 12.000000
2 2017-01-02 12.791492
3 2017-01-20 13.844843
4 2017-01-20 13.844843
5 2017-03-01 6.284914
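The question also asks how to get the values back onto the original rows. One possible sketch, not taken from either answer, averages the duplicates per (group, date) first, applies ewm plus shift within each group via transform, and then merges on group and date. It uses the row-based span weighting from the question rather than the daily resampling above. The index name 'date' is an assumption made here so that the merge has a column to join on.

```python
import pandas as pd

df = pd.DataFrame(
    {'val': [8, 1, 5, 2, 3, 5, 9, 14, 13, 2, 1, 12],
     'group': ['one', 'two', 'one', 'one', 'two', 'two',
               'one', 'two', 'two', 'one', 'one', 'two']},
    index=pd.to_datetime([
        '2017-03-01', '2017-03-01', '2017-02-15', '2017-02-01',
        '2017-01-20', '2017-01-20', '2017-01-20', '2017-01-02',
        '2016-12-04', '2016-12-04', '2016-12-04', '2016-11-16']),
).sort_index()
df.index.name = 'date'  # assumed name, not in the original post

# 1. One row per (group, date): rows sharing a date are averaged
daily = (df.reset_index()
           .groupby(['group', 'date'], as_index=False)['val']
           .mean())

# 2. Exponentially weighted mean within each group, shifted by one row
#    so that each date only sees strictly earlier dates
daily['rwma'] = daily.groupby('group')['val'].transform(
    lambda s: s.ewm(span=27).mean().shift())

# 3. Attach the per-date value to every original row
out = df.reset_index().merge(
    daily[['group', 'date', 'rwma']], on=['group', 'date'], how='left')
print(out)
```

Rows that share a date within a group receive the same rwma value, and the earliest date of each group stays NaN because it has no previous values.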