Pandas

Question

Suppose I have the following DataFrame:

+------------+-------+
|    Date    | Price |
+------------+-------+
| 2021-08-25 |    30 |
| 2021-08-24 |    20 |
| 2021-08-23 |    50 |
| 2021-08-20 |    10 |
| 2021-08-19 |    24 |
| 2021-08-18 |    23 |
| 2021-08-17 |    22 |
| 2021-08-16 |    10 |
+------------+-------+

The DataFrame above can be generated with the following code:

import pandas as pd

data = {'Date':['2021-08-25', '2021-08-24', '2021-08-23', '2021-08-20',
                '2021-08-19', '2021-08-18', '2021-08-17', '2021-08-16'],
        'Price':[30, 20, 50, 10, 24, 23, 22, 10]}
df = pd.DataFrame(data)

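I want to create a column weight on the fly based on a scalar phi. Suppose phi = 0.95. The weight at the most recent date t is 1 - phi, so the weight for 2021-08-25 is 0.05. For each remaining date, the value is W_t+1 * phi, that is, the weight of the next more recent row multiplied by phi. So for 2021-08-24 the weight is 0.05 * 0.95 = 0.0475.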

Expected output

+------------+-------+-------------+
|    Date    | Price |   Weight    |
+------------+-------+-------------+
| 2021-08-25 |    30 |        0.05 |
| 2021-08-24 |    20 |      0.0475 |
| 2021-08-23 |    50 |    0.045125 |
| 2021-08-20 |    10 |  0.04286875 |
| 2021-08-19 |    24 | 0.040725313 |
| 2021-08-18 |    23 | 0.038689047 |
| 2021-08-17 |    22 | 0.036754595 |
| 2021-08-16 |    10 | 0.034916865 |
+------------+-------+-------------+

What is a vectorized way to create the weight column on the fly?

Answer 1

Going by the example output values given:

import numpy as np
phi = 0.95
df['Weight'] = (1 - phi) * phi ** np.arange(len(df))  # row i gets (1 - phi) * phi**i

         Date  Price    Weight
0  2021-08-25     30  0.050000
1  2021-08-24     20  0.047500
2  2021-08-23     50  0.045125
3  2021-08-20     10  0.042869
4  2021-08-19     24  0.040725
5  2021-08-18     23  0.038689
6  2021-08-17     22  0.036755
7  2021-08-16     10  0.034917

(The displayed values are rounded, which is the pandas default; the stored values keep full precision.)
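
To see why the closed form matches the recurrence in the question, here is a minimal sketch that builds the weights both ways and compares them. It assumes the rows are already ordered from most recent to oldest, as in the example; the names `loop_weights` and `closed_form` are only illustrative.

```python
import numpy as np
import pandas as pd

phi = 0.95
df = pd.DataFrame({
    'Date': ['2021-08-25', '2021-08-24', '2021-08-23', '2021-08-20',
             '2021-08-19', '2021-08-18', '2021-08-17', '2021-08-16'],
    'Price': [30, 20, 50, 10, 24, 23, 22, 10],
})

# Recurrence from the question: the first (most recent) row gets 1 - phi,
# and every following row gets the previous row's weight times phi.
loop_weights = [1 - phi]
for _ in range(len(df) - 1):
    loop_weights.append(loop_weights[-1] * phi)

# Closed form: unrolling the recurrence i times gives (1 - phi) * phi**i.
closed_form = (1 - phi) * phi ** np.arange(len(df))

print(np.allclose(loop_weights, closed_form))
```

The exponent is the row position, not the number of calendar days, which is what the expected output in the question shows (the gap between 2021-08-23 and 2021-08-20 still counts as one step).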

Answer 2

Use numpy.logspace or numpy.geomspace to build the geometric progression:

import numpy as np
import pandas as pd

data = {'Date':['2021-08-25', '2021-08-24', '2021-08-23', '2021-08-20',
                '2021-08-19', '2021-08-18', '2021-08-17', '2021-08-16'],
        'Price':[30, 20, 50, 10, 24, 23, 22, 10]}
df = pd.DataFrame(data)

phi = 0.95
df['Weight'] = np.geomspace(1-phi, (1-phi)*phi**(len(df)-1), num=len(df))

print(df)
#          Date  Price    Weight
# 0  2021-08-25     30  0.050000
# 1  2021-08-24     20  0.047500
# 2  2021-08-23     50  0.045125
# 3  2021-08-20     10  0.042869
# 4  2021-08-19     24  0.040725
# 5  2021-08-18     23  0.038689
# 6  2021-08-17     22  0.036755
# 7  2021-08-16     10  0.034917

Using numpy.logspace instead of numpy.geomspace:

from math import log
start = log(1-phi, phi)
df['Weight'] = np.logspace(start, start+len(df)-1, num=len(df), base=phi)
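
As a quick sanity check, the power-based expression, numpy.geomspace, and numpy.logspace should all give the same weights up to floating-point error. The sketch below assumes 8 rows as in the example; the variable names `w_power`, `w_geom`, and `w_log` are only illustrative.

```python
import math
import numpy as np

phi = 0.95
n = 8  # number of rows in the example DataFrame

# Direct powers of phi, scaled by the first weight 1 - phi.
w_power = (1 - phi) * phi ** np.arange(n)

# Geometric spacing between the first and the last weight.
w_geom = np.geomspace(1 - phi, (1 - phi) * phi ** (n - 1), num=n)

# Log spacing in base phi: phi**start equals 1 - phi,
# so the exponents start, start + 1, ..., start + n - 1 give the same series.
start = math.log(1 - phi, phi)
w_log = np.logspace(start, start + n - 1, num=n, base=phi)

print(np.allclose(w_power, w_geom), np.allclose(w_power, w_log))
```

Since all three agree, the choice is mostly a matter of readability; the power-based form states the formula from the question most directly.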

Pandas - Create Column on the fly based on condition

python

calculated-columns

dataframe