pykalman的多元回归?
Multiple regression with pykalman?
我正在寻找一种方法,使用 pykalman 从 1 到 N
回归量来概括回归。我们最初不会为在线回归而烦恼——我只想要一个玩具示例来为 2 个回归器而不是 1 个回归器设置 Kalman 滤波器,即 Y = c1 * x1 + c2 * x2 + const
.
对于单个回归变量的情况,以下代码有效。我的问题是如何更改过滤器设置以使其适用于两个回归器:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pykalman import KalmanFilter
if __name__ == "__main__":
file_name = '<path>\KalmanExample.txt'
df = pd.read_csv(file_name, index_col = 0)
prices = df[['ETF', 'ASSET_1']] #, 'ASSET_2']]
delta = 1e-5
trans_cov = delta / (1 - delta) * np.eye(2)
obs_mat = np.vstack( [prices['ETF'],
np.ones(prices['ETF'].shape)]).T[:, np.newaxis]
kf = KalmanFilter(
n_dim_obs=1,
n_dim_state=2,
initial_state_mean=np.zeros(2),
initial_state_covariance=np.ones((2, 2)),
transition_matrices=np.eye(2),
observation_matrices=obs_mat,
observation_covariance=1.0,
transition_covariance=trans_cov
)
state_means, state_covs = kf.filter(prices['ASSET_1'].values)
# Draw slope and intercept...
pd.DataFrame(
dict(
slope=state_means[:, 0],
intercept=state_means[:, 1]
), index=prices.index
).plot(subplots=True)
plt.show()
示例文件 KalmanExample.txt 包含以下数据:
Date,ETF,ASSET_1,ASSET_2
2007-01-02,176.5,136.5,141.0
2007-01-03,169.5,115.5,143.25
2007-01-04,160.5,111.75,143.5
2007-01-05,160.5,112.25,143.25
2007-01-08,161.0,112.0,142.5
2007-01-09,155.5,110.5,141.25
2007-01-10,156.5,112.75,141.25
2007-01-11,162.0,118.5,142.75
2007-01-12,161.5,117.0,142.5
2007-01-15,160.0,118.75,146.75
2007-01-16,156.5,119.5,146.75
2007-01-17,155.0,120.5,145.75
2007-01-18,154.5,124.5,144.0
2007-01-19,155.5,126.0,142.75
2007-01-22,157.5,124.5,142.5
2007-01-23,161.5,124.25,141.75
2007-01-24,164.5,125.25,142.75
2007-01-25,164.0,126.5,143.0
2007-01-26,161.5,128.5,143.0
2007-01-29,161.5,128.5,140.0
2007-01-30,161.5,129.75,139.25
2007-01-31,161.5,131.5,137.5
2007-02-01,164.0,130.0,137.0
2007-02-02,156.5,132.0,128.75
2007-02-05,156.0,131.5,132.0
2007-02-06,159.0,131.25,130.25
2007-02-07,159.5,136.25,131.5
2007-02-08,153.5,136.0,129.5
2007-02-09,154.5,138.75,128.5
2007-02-12,151.0,136.75,126.0
2007-02-13,151.5,139.5,126.75
2007-02-14,155.0,169.0,129.75
2007-02-15,153.0,169.5,129.75
2007-02-16,149.75,166.5,128.0
2007-02-19,150.0,168.5,130.0
单回归器案例提供以下输出,对于双回归器案例,我想要第二个“斜率”-plot 表示 C2
.
编辑答案以反映我对问题的修改理解。
如果我理解正确,您希望将可观察输出变量 Y = ETF
建模为两个可观察值的线性组合; ASSET_1, ASSET_2
。
这个回归的系数被当作系统状态,即ETF = x1*ASSET_1 + x2*ASSET_2 + x3
,其中x1
和x2
分别是资产1和2的系数,x3
是截距。假定这些系数 缓慢发展 .
下面给出了实现它的代码,请注意,这只是扩展现有示例以增加一个回归器。
另请注意,使用 delta
参数可能会得到完全不同的结果。如果将其设置得很大(远离零),则系数将变化得更快,并且 regressand 的重建将近乎完美。如果它设置得很小(非常接近于零),那么系数的演化会更慢,回归数据的重建也不会那么完美。您可能想查看 期望最大化 算法 - supported by pykalman.
代码:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pykalman import KalmanFilter
if __name__ == "__main__":
file_name = 'KalmanExample.txt'
df = pd.read_csv(file_name, index_col = 0)
prices = df[['ETF', 'ASSET_1', 'ASSET_2']]
delta = 1e-3
trans_cov = delta / (1 - delta) * np.eye(3)
obs_mat = np.vstack( [prices['ASSET_1'], prices['ASSET_2'],
np.ones(prices['ASSET_1'].shape)]).T[:, np.newaxis]
kf = KalmanFilter(
n_dim_obs=1,
n_dim_state=3,
initial_state_mean=np.zeros(3),
initial_state_covariance=np.ones((3, 3)),
transition_matrices=np.eye(3),
observation_matrices=obs_mat,
observation_covariance=1.0,
transition_covariance=trans_cov
)
# state_means, state_covs = kf.em(prices['ETF'].values).smooth(prices['ETF'].values)
state_means, state_covs = kf.filter(prices['ETF'].values)
# Re-construct ETF from coefficients and 'ASSET_1' and ASSET_2 values:
ETF_est = np.array([a.dot(b) for a, b in zip(np.squeeze(obs_mat), state_means)])
# Draw slope and intercept...
pd.DataFrame(
dict(
slope1=state_means[:, 0],
slope2=state_means[:, 1],
intercept=state_means[:, 2],
), index=prices.index
).plot(subplots=True)
plt.show()
# Draw actual y, and estimated y:
pd.DataFrame(
dict(
ETF_est=ETF_est,
ETF_act=prices['ETF'].values
), index=prices.index
).plot()
plt.show()
情节:
我正在寻找一种方法,使用 pykalman 从 1 到 N
回归量来概括回归。我们最初不会为在线回归而烦恼——我只想要一个玩具示例来为 2 个回归器而不是 1 个回归器设置 Kalman 滤波器,即 Y = c1 * x1 + c2 * x2 + const
.
对于单个回归变量的情况,以下代码有效。我的问题是如何更改过滤器设置以使其适用于两个回归器:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pykalman import KalmanFilter
if __name__ == "__main__":
file_name = '<path>\KalmanExample.txt'
df = pd.read_csv(file_name, index_col = 0)
prices = df[['ETF', 'ASSET_1']] #, 'ASSET_2']]
delta = 1e-5
trans_cov = delta / (1 - delta) * np.eye(2)
obs_mat = np.vstack( [prices['ETF'],
np.ones(prices['ETF'].shape)]).T[:, np.newaxis]
kf = KalmanFilter(
n_dim_obs=1,
n_dim_state=2,
initial_state_mean=np.zeros(2),
initial_state_covariance=np.ones((2, 2)),
transition_matrices=np.eye(2),
observation_matrices=obs_mat,
observation_covariance=1.0,
transition_covariance=trans_cov
)
state_means, state_covs = kf.filter(prices['ASSET_1'].values)
# Draw slope and intercept...
pd.DataFrame(
dict(
slope=state_means[:, 0],
intercept=state_means[:, 1]
), index=prices.index
).plot(subplots=True)
plt.show()
示例文件 KalmanExample.txt 包含以下数据:
Date,ETF,ASSET_1,ASSET_2
2007-01-02,176.5,136.5,141.0
2007-01-03,169.5,115.5,143.25
2007-01-04,160.5,111.75,143.5
2007-01-05,160.5,112.25,143.25
2007-01-08,161.0,112.0,142.5
2007-01-09,155.5,110.5,141.25
2007-01-10,156.5,112.75,141.25
2007-01-11,162.0,118.5,142.75
2007-01-12,161.5,117.0,142.5
2007-01-15,160.0,118.75,146.75
2007-01-16,156.5,119.5,146.75
2007-01-17,155.0,120.5,145.75
2007-01-18,154.5,124.5,144.0
2007-01-19,155.5,126.0,142.75
2007-01-22,157.5,124.5,142.5
2007-01-23,161.5,124.25,141.75
2007-01-24,164.5,125.25,142.75
2007-01-25,164.0,126.5,143.0
2007-01-26,161.5,128.5,143.0
2007-01-29,161.5,128.5,140.0
2007-01-30,161.5,129.75,139.25
2007-01-31,161.5,131.5,137.5
2007-02-01,164.0,130.0,137.0
2007-02-02,156.5,132.0,128.75
2007-02-05,156.0,131.5,132.0
2007-02-06,159.0,131.25,130.25
2007-02-07,159.5,136.25,131.5
2007-02-08,153.5,136.0,129.5
2007-02-09,154.5,138.75,128.5
2007-02-12,151.0,136.75,126.0
2007-02-13,151.5,139.5,126.75
2007-02-14,155.0,169.0,129.75
2007-02-15,153.0,169.5,129.75
2007-02-16,149.75,166.5,128.0
2007-02-19,150.0,168.5,130.0
单回归器案例提供以下输出,对于双回归器案例,我想要第二个“斜率”-plot 表示 C2
.
编辑答案以反映我对问题的修改理解。
如果我理解正确,您希望将可观察输出变量 Y = ETF
建模为两个可观察值的线性组合; ASSET_1, ASSET_2
。
这个回归的系数被当作系统状态,即ETF = x1*ASSET_1 + x2*ASSET_2 + x3
,其中x1
和x2
分别是资产1和2的系数,x3
是截距。假定这些系数 缓慢发展 .
下面给出了实现它的代码,请注意,这只是扩展现有示例以增加一个回归器。
另请注意,使用 delta
参数可能会得到完全不同的结果。如果将其设置得很大(远离零),则系数将变化得更快,并且 regressand 的重建将近乎完美。如果它设置得很小(非常接近于零),那么系数的演化会更慢,回归数据的重建也不会那么完美。您可能想查看 期望最大化 算法 - supported by pykalman.
代码:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pykalman import KalmanFilter
if __name__ == "__main__":
file_name = 'KalmanExample.txt'
df = pd.read_csv(file_name, index_col = 0)
prices = df[['ETF', 'ASSET_1', 'ASSET_2']]
delta = 1e-3
trans_cov = delta / (1 - delta) * np.eye(3)
obs_mat = np.vstack( [prices['ASSET_1'], prices['ASSET_2'],
np.ones(prices['ASSET_1'].shape)]).T[:, np.newaxis]
kf = KalmanFilter(
n_dim_obs=1,
n_dim_state=3,
initial_state_mean=np.zeros(3),
initial_state_covariance=np.ones((3, 3)),
transition_matrices=np.eye(3),
observation_matrices=obs_mat,
observation_covariance=1.0,
transition_covariance=trans_cov
)
# state_means, state_covs = kf.em(prices['ETF'].values).smooth(prices['ETF'].values)
state_means, state_covs = kf.filter(prices['ETF'].values)
# Re-construct ETF from coefficients and 'ASSET_1' and ASSET_2 values:
ETF_est = np.array([a.dot(b) for a, b in zip(np.squeeze(obs_mat), state_means)])
# Draw slope and intercept...
pd.DataFrame(
dict(
slope1=state_means[:, 0],
slope2=state_means[:, 1],
intercept=state_means[:, 2],
), index=prices.index
).plot(subplots=True)
plt.show()
# Draw actual y, and estimated y:
pd.DataFrame(
dict(
ETF_est=ETF_est,
ETF_act=prices['ETF'].values
), index=prices.index
).plot()
plt.show()
情节: