How do I use GP.fit in scikit-learn for a multi-dimensional input?
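Could you give an example? I am trying to use it with a 5D input. Also, how do I plot each input against the output? I have one output dimension. My idea is to pass in some training data and then validate the output against a test data set.
I want to pass a 5D input (X1, X2, X3, X4, X5), and I have 1600 data points. Right now I only have X1 as the input.
The code is as follows: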
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.base import BaseEstimator
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel, ConstantKernel, RationalQuadratic, ExpSineSquared, DotProduct
# Define the kernels
kernels = [1.0 * RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0)),
           1.0 * Matern(length_scale=1.0, length_scale_bounds=(1e-1, 10.0),
                        nu=1.5),
           1.0 * RationalQuadratic(length_scale=1.0, alpha=0.1),
           1.0 * ExpSineSquared(length_scale=1.0, periodicity=3.0,
                                length_scale_bounds=(0.1, 10.0),
                                periodicity_bounds=(1.0, 10.0)),
           ConstantKernel(0.1, (0.01, 10.0))
           * (DotProduct(sigma_0=1.0, sigma_0_bounds=(0.0, 10.0)) ** 2),
           ]
# Define inputs and outputs
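x = np.array([-5.2, -3, -2, -1, 1, 5], ndmin=2).T
X = x.reshape(-1, 1)  # shape (n_samples, n_features) = (6, 1)
y = np.array([-2, 0, 1, 2, -1, 1])
# Use the array methods so that the limits are scalars, not 1-element arrays
max_x = x.max()
min_x = x.min()
max_y = y.max()
min_y = y.min()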
for fig_index, kernel in enumerate(kernels):
    # Fit a GP regressor with the current kernel
    gp = GaussianProcessRegressor(kernel=kernel)
    gp.fit(X, y)
    # Kernel with the optimized hyperparameters
    print(gp.kernel_)
    plt.figure(fig_index, figsize=(10, 6))
    plt.subplot(2, 1, 1)
    # Mark the observations
    plt.plot(X, y, 'ro', label='observations')
    # reshape(-1, 1) gives the (n_samples, 1) shape that predict() expects
    X_test = np.linspace(min_x - 1, max_x + 1, 1000).reshape(-1, 1)
    y_mean, y_std = gp.predict(X_test, return_std=True)
    # Draw the mean function and the 95% confidence interval (mean +/- 1.96 std)
    plt.plot(X_test, y_mean, 'b-', label='mean function')
    upper_bound = y_mean + 1.96 * y_std
    lower_bound = y_mean - 1.96 * y_std
    plt.fill_between(X_test.ravel(), lower_bound, upper_bound, color='k', alpha=0.2,
                     label='95% confidence interval')
    # Plot samples drawn from the posterior
    y_sample = gp.sample_y(X_test, 4)
    plt.plot(X_test, y_sample, lw=1)
    plt.scatter(X[:, 0], y, c='r', s=50, zorder=10, edgecolor=(0, 0, 0))
    plt.title("Posterior (kernel: %s)\n Log-Likelihood: %.3f"
              % (gp.kernel_, gp.log_marginal_likelihood(gp.kernel_.theta)),
              fontsize=14)
    plt.tight_layout()
plt.show()
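There is nothing special about GP regression with multiple inputs, except that in the anisotropic case you have to provide the relevant parameters explicitly in the kernel definition.
Here is a simple example with dummy 5D data, like yours, and an isotropic RBF kernel: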
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.datasets import make_regression
import numpy as np
# dummy data:
X, y = make_regression(n_samples=20, n_features=5, n_targets=1)
X.shape
# (20, 5)
kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0))
gp = GaussianProcessRegressor(kernel=kernel)
gp.fit(X, y)
# GaussianProcessRegressor(alpha=1e-10, copy_X_train=True,
#                          kernel=RBF(length_scale=1), n_restarts_optimizer=0,
#                          normalize_y=False, optimizer='fmin_l_bfgs_b',
#                          random_state=None)
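To validate on held-out data, as the question asks, one possible sketch is to split the dummy data and call predict with return_std=True. The sample count, the split ratio, the alpha value and normalize_y=True are illustrative assumptions here, not part of the original answer:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

# Dummy 5D data standing in for the real X1..X5 inputs (assumption)
X, y = make_regression(n_samples=200, n_features=5, n_targets=1,
                       noise=1.0, random_state=0)

# Hold out 25% of the points for validation (assumed ratio)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Isotropic RBF kernel: one length scale shared by all 5 inputs
kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0))

# alpha is added to the diagonal of the kernel matrix; 1e-2 is an assumed value
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2,
                              normalize_y=True, random_state=0)
gp.fit(X_train, y_train)

# Predictive mean and standard deviation for every held-out point
y_mean, y_std = gp.predict(X_test, return_std=True)

# Coefficient of determination on the held-out set
r2 = gp.score(X_test, y_test)
print(y_mean.shape, y_std.shape)
print("R^2 on held-out data:", r2)
```

With real data, X would simply be the (1600, 5) array of inputs and y the (1600,) array of outputs; the score depends on how well the chosen kernel suits the data.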
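UPDATE: In the anisotropic case, you should explicitly define the different parameters in the kernel; here is an example definition for an RBF kernel and 2D variables: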
kernel = RBF(length_scale=[1.0, 2.0], length_scale_bounds=[(1e-1, 10.0), (1e-2, 1.0)])
Extend similarly for the 5D case.
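As a rough sketch of that 5D extension, the kernel below gets one length scale per input dimension. The initial values, the bounds, the alpha value and the dummy data are assumptions chosen only for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Dummy 5D data (assumption); replace with the real inputs and outputs
X, y = make_regression(n_samples=50, n_features=5, n_targets=1, random_state=0)
n_features = X.shape[1]

# Anisotropic RBF: one length scale and one pair of bounds per input dimension
kernel = RBF(length_scale=np.ones(n_features),
             length_scale_bounds=[(1e-1, 10.0)] * n_features)

# alpha=1e-2 is an assumed jitter value for numerical stability
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2,
                              normalize_y=True, random_state=0)
gp.fit(X, y)

# After fitting, kernel_ holds one optimized length scale per dimension
fitted_scales = gp.kernel_.length_scale
print(fitted_scales)
```

A comparatively large fitted length scale for a dimension suggests the output varies slowly along that input, which can serve as a rough indication of its relevance.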