几个月前对数据框应用权重

Apply a weight to a dataframe by months back

使用 pandas 版本:

1.3.2

我有一个 DataFrameDate indexed 并且有几个 columns 像这样:

                 XLY        UA
Date
2017-04-01  0.023991  0.060656
2017-05-01  0.010993 -0.081401
2017-06-01 -0.015596  0.130679
2017-07-01  0.019302 -0.101686
2017-08-01 -0.018608 -0.166207
2017-09-01  0.004684 -0.005298
2017-10-01  0.021203 -0.232357
2017-11-01  0.050658  0.034692
2017-12-01  0.021107  0.116513
2018-01-01  0.092411 -0.035285
2018-02-01 -0.034691  0.171206
...
2022-03-01  0.079468  0.039667

我有一个 python dictionary 的权重,其中 tuple (1,12) 表示 1 到 12 个月前,应用该权重,依此类推。

weights = {(0,12): 6, (13,24) : 5, (25,36): 4, (37,48): 3, (49,60): 2, (61, 72): 1, (72,83): 1}

有没有办法将这些权重 apply 分配给 Dataframe 的每个 row,例如,row 2022-03-01 将是 0.079468 * 6.039667 * 6 等等,对于从今天起 12 个月后的所有行,当它到达 2021-03-01 时,它将应用 5 *,等等

indexDatetime,不是 string

我知道我可以 loop 做到这一点。我正在寻找 functional 简洁版。

**Edit 1**

 bins = pd.to_datetime([today] + [today - pd.DateOffset(months = k[1]) for k in weights])

报错:

Traceback (most recent call last):
  File "correlations.py", line 131, in <module>
    bins = pd.to_datetime([today] + [today - pd.DateOffset(k[1]) for k in weights])
 
 File "correlations.py", line 131, in <module>
bins = pd.to_datetime([today] + [today - pd.DateOffset(months = k[1]) for k in weights])


File "correlations.py", line 131, in <listcomp>
    bins = pd.to_datetime([today] + [today - pd.DateOffset(months = k[1]) for k in weights])
TypeError: 'int' object is not subscriptable
 idf:~/environments/python/yahoo_data$

尝试:

# Your dict
weights = {(0, 12): 6, (13, 24) : 5, (25, 36): 4, (37, 48): 3,
           (49,60): 2, (61, 72): 1, (72,83): 1}

# Today
today = pd.to_datetime('today').date()

# Create bins from today
bins= pd.to_datetime([today] + [today - pd.DateOffset(months=k[1]) for k in weights])

# bins must increase monotonically -> use [::-1] to reverse the list and labels
w = pd.cut(df.index, bins=bins[::-1], labels=list(weights.keys())[::-1]).map(weights)

# Answer from 
cols = ['XLY', 'UA']
df[cols] = df[cols].mul(w, axis=0)

输出:

>>> df
                 XLY        UA
Date                          
2017-04-01  0.047982  0.121312
2017-05-01  0.021986 -0.162802
2017-06-01 -0.031192  0.261358
2017-07-01  0.038604 -0.203372
2017-08-01 -0.037216 -0.332414
2017-09-01  0.009368 -0.010596
2017-10-01  0.042406 -0.464714
2017-11-01  0.101316  0.069384
2017-12-01  0.042214  0.233026
2018-01-01  0.184822 -0.070570
2018-02-01 -0.069382  0.342412
2022-03-01  0.476808  0.238002

>>> bins
DatetimeIndex(['2022-03-30', '2021-03-30', '2020-03-30', '2019-03-30',
               '2018-03-30', '2017-03-30', '2016-03-30', '2015-04-30'],
              dtype='datetime64[ns]', freq=None)

>>> w
Int64Index([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 6], dtype='int64')

这仍然使用检查所有键以确定正确权重的功能。

仅考虑使用 (now - row.name).days//30 的完整月份,其中 row.name 是索引日期。如果月份定义不同,请更改。

def get_weight(month):
    weights = {(0,12): 6, (13,24) : 5, (25,36): 4, (37,48): 3, (49,60): 2, (61, 72): 1, (72,83): 1}
    for m, w in weights.items():
        if month in range(m[0], m[1]+1):
            return w

now = datetime.datetime.now()
df.apply(lambda row: row * get_weight((now - row.name).days//30), axis=1)

或者可以使用初始权重为独特的月份差异创建地图(以避免重复计算):


all_monthes = (now - df.index).days//30

unique_months = all_months.unique()
weight_map = {}
for month in unique_months:
    weight_map[month] = get_weight(month)

df[['XLY', 'UA']] = df[['XLY', 'UA']].mul(all_months.map(weight_map), axis=0)

>>> df

              XLY          UA
Date        
2017-04-01  0.047982    0.121312
2017-05-01  0.021986    -0.162802
2017-06-01  -0.031192   0.261358
2017-07-01  0.038604    -0.203372
2017-08-01  -0.037216   -0.332414
2017-09-01  0.009368    -0.010596
2017-10-01  0.042406    -0.464714
2017-11-01  0.101316    0.069384
2017-12-01  0.042214    0.233026
2018-01-01  0.184822    -0.070570
2018-02-01  -0.069382   0.342412
2022-03-01  0.476808    0.238002

>>> weight_map
{60: 2, 59: 2, 58: 2, 57: 2, 56: 2, 55: 2, 54: 2, 53: 2, 52: 2, 51: 2, 50: 2, 1: 6}

>>> all_months
[60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 1]

>>> output

              XLY          UA
Date        
2017-04-01  0.095964    0.242624
2017-05-01  0.043972    -0.325604
2017-06-01  -0.062384   0.522716
2017-07-01  0.077208    -0.406744
2017-08-01  -0.074432   -0.664828
2017-09-01  0.018736    -0.021192
2017-10-01  0.084812    -0.929428
2017-11-01  0.202632    0.138768
2017-12-01  0.084428    0.466052
2018-01-01  0.369644    -0.141140
2018-02-01  -0.138764   0.684824
2022-03-01  2.860848    1.428012