Pandas - Create Column on the fly based on condition
Suppose I have the following data frame:
+------------+-------+
| Date | Price |
+------------+-------+
| 25/08/2021 | 30 |
| 24/08/2021 | 20 |
| 23/08/2021 | 50 |
| 20/08/2021 | 10 |
| 19/08/2021 | 24 |
| 18/08/2021 | 23 |
| 17/08/2021 | 22 |
| 16/08/2021 | 10 |
+------------+-------+
The data frame above can be generated with the code below:
import pandas as pd
data = {'Date': ['2021-08-25', '2021-08-24', '2021-08-23', '2021-08-20',
                 '2021-08-19', '2021-08-18', '2021-08-17', '2021-08-16'],
        'Price': [30, 20, 50, 10, 24, 23, 22, 10]}
df = pd.DataFrame(data)
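I want to create a column Weight on the fly based on a scalar phi.
Suppose phi = 0.95. The weight at the most recent date t is 1 - phi, i.e. the weight for 2021-08-25 is 0.05. For each remaining (earlier) date, the value is W_{t+1} * phi, that is, the weight of the row above it multiplied by phi. So for the date 2021-08-24 the weight is 0.05 * 0.95 = 0.0475.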
Expected output:
+------------+-------+-------------+
| Date | Price | Weight |
+------------+-------+-------------+
| 2021-08-25 | 30 | 0.05 |
| 2021-08-24 | 20 | 0.0475 |
| 2021-08-23 | 50 | 0.045125 |
| 2021-08-20 | 10 | 0.04286875 |
| 2021-08-19 | 24 | 0.040725313 |
| 2021-08-18 | 23 | 0.038689047 |
| 2021-08-17 | 22 | 0.036754595 |
| 2021-08-16 | 10 | 0.034916865 |
+------------+-------+-------------+
What is a vectorized way to create the Weight column on the fly?
Going by the sample output values given:
df['Weight'] = (1 - phi) * phi ** np.arange(len(df))
Date Price Weight
0 2021-08-25 30 0.050000
1 2021-08-24 20 0.047500
2 2021-08-23 50 0.045125
3 2021-08-20 10 0.042869
4 2021-08-19 24 0.040725
5 2021-08-18 23 0.038689
6 2021-08-17 22 0.036755
7 2021-08-16 10 0.034917
(The output values are rounded for display, which is the pandas default.)
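The one-liner works because unrolling the recurrence W_0 = 1 - phi, W_k = W_{k-1} * phi gives W_k = (1 - phi) * phi**k, where k is the row position. A minimal sketch that checks this against an explicit loop follows; the helper name weights_by_loop and the variable n are hypothetical, introduced only for illustration, and the sketch assumes at least one row.

```python
import numpy as np

phi = 0.95
n = 8  # number of rows in the example data frame (assumed >= 1)

def weights_by_loop(phi, n):
    """Apply the recurrence row by row (hypothetical helper, not vectorized)."""
    weights = [1 - phi]                    # first row gets 1 - phi
    for _ in range(n - 1):
        weights.append(weights[-1] * phi)  # each later row: previous weight * phi
    return np.array(weights)

loop_weights = weights_by_loop(phi, n)
closed_form = (1 - phi) * phi ** np.arange(n)

# Both routes agree up to floating-point rounding.
assert np.allclose(loop_weights, closed_form)
```

Note that the weights depend only on row position, not on the Date values, so this assumes the frame is already sorted from newest to oldest, as in the example.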
Use numpy.logspace or numpy.geomspace to build the geometric progression:
import numpy as np
import pandas as pd
data = {'Date':['2021-08-25', '2021-08-24', '2021-08-23', '2021-08-20',
'2021-08-19', '2021-08-18', '2021-08-17', '2021-08-16'],
'Price':[30, 20, 50, 10, 24, 23, 22, 10]}
df = pd.DataFrame(data)
phi = 0.95
df['Weight'] = np.geomspace(1-phi, (1-phi)*phi**(len(df)-1), num=len(df))
print(df)
# Date Price Weight
# 0 2021-08-25 30 0.050000
# 1 2021-08-24 20 0.047500
# 2 2021-08-23 50 0.045125
# 3 2021-08-20 10 0.042869
# 4 2021-08-19 24 0.040725
# 5 2021-08-18 23 0.038689
# 6 2021-08-17 22 0.036755
# 7 2021-08-16 10 0.034917
Using numpy.logspace instead of numpy.geomspace:
from math import log
start = log(1-phi, phi)
df['Weight'] = np.logspace(start, start+len(df)-1, num=len(df), base=phi)
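As a quick sanity check, the three approaches shown in the answers (the direct power expression, numpy.geomspace, and numpy.logspace) should produce the same values up to floating-point rounding. The sketch below compares them on plain arrays; the variable names direct, geom, and logsp are illustrative only.

```python
from math import log

import numpy as np

phi = 0.95
n = 8  # number of rows in the example data frame

# Direct closed form: (1 - phi) * phi**k for k = 0..n-1
direct = (1 - phi) * phi ** np.arange(n)

# Geometric progression from the first weight to the last weight
geom = np.geomspace(1 - phi, (1 - phi) * phi ** (n - 1), num=n)

# Same progression expressed as powers of phi: phi**(start + k)
start = log(1 - phi, phi)  # exponent such that phi**start == 1 - phi
logsp = np.logspace(start, start + n - 1, num=n, base=phi)

assert np.allclose(direct, geom)
assert np.allclose(direct, logsp)
```

Any of the three arrays can be assigned to df['Weight']; the direct expression is the shortest and avoids computing a logarithm.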