How do I use GP.fit in sci-kit learn for a multi-dimensional input?

Can you give an example? I am trying to use it for a 5-D input. Also, how do I plot a graph for each input against the output? I have one output dimension. My idea is to pass some training-set data and then validate the output against a test dataset. I want to pass a 5-D input (X1, X2, X3, X4, X5); I have 1600 data points. Right now I only have X1 as an input.

The code is below:

from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.base import BaseEstimator
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel, ConstantKernel, RationalQuadratic, ExpSineSquared, DotProduct
# define Kernel

import numpy as np
kernels = [1.0 * RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0)),
           1.0 * Matern(length_scale=1.0, length_scale_bounds=(1e-1, 10.0),
                        nu=1.5),
           1.0 * RationalQuadratic(length_scale=1.0, alpha=0.1),
           1.0 * ExpSineSquared(length_scale=1.0, periodicity=3.0,
                                length_scale_bounds=(0.1, 10.0),
                                periodicity_bounds=(1.0, 10.0)),
           ConstantKernel(0.1, (0.01, 10.0))
               * (DotProduct(sigma_0=1.0, sigma_0_bounds=(0.0, 10.0)) ** 2),
           ]

# Define inputs and outputs
x = np.array([-5.2, -3, -2, -1, 1, 5], ndmin=2).T
X = x.reshape(-1, 1)
y = np.array([-2, 0, 1, 2, -1, 1])
# use the array methods so these are scalars, not 1-element arrays
max_x = x.max()
min_x = x.min()
max_y = y.max()
min_y = y.min()

for fig_index, kernel in enumerate(kernels):
    # call GP regression library and fit inputs to output
    gp = GaussianProcessRegressor(kernel=kernel)
    gp.fit(X, y)
#     parameter = get_params(deep=True)
#     print(parameter)           

    print(gp.kernel_)
    plt.figure(fig_index, figsize=(10,6))
    plt.subplot(2,1,1)

    # Mark the observations
    plt.plot(X, y, 'ro', label='observations')

    X_test = np.linspace(min_x - 1, max_x + 1, 1000).reshape(-1, 1)
    y_mean, y_std = gp.predict(X_test, return_std=True)
    # Draw a mean function and 95% confidence interval
    plt.plot(X_test, y_mean, 'b-', label='mean function')
    upper_bound = y_mean + 1.96 * y_std
    lower_bound = y_mean - 1.96 * y_std
    plt.fill_between(X_test.ravel(), lower_bound, upper_bound, color='k', alpha=0.2,
                     label='95% confidence interval')

    # plot posterior
    y_sample = gp.sample_y(X_test,4)
    plt.plot(X_test,y_sample,lw=1)
    plt.scatter(X[:,0],y,c='r',s=50,zorder=10,edgecolor=(0,0,0))
    plt.title("Posterior (kernel: %s)\nLog-Likelihood: %.3f"
              % (gp.kernel_, gp.log_marginal_likelihood(gp.kernel_.theta)),
              fontsize=14)
    plt.tight_layout()
    plt.show()

There is nothing special about GP regression with multiple inputs, except that for the anisotropic case you must explicitly provide the relevant parameters in the kernel definition.

Here is a simple example with dummy 5-D data like yours and an isotropic RBF kernel:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.datasets import make_regression
import numpy as np

# dummy data:
X, y = make_regression(n_samples=20, n_features=5, n_targets=1)
X.shape
# (20, 5)

kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0))
gp = GaussianProcessRegressor(kernel=kernel)
gp.fit(X, y)
# GaussianProcessRegressor(alpha=1e-10, copy_X_train=True,
#             kernel=RBF(length_scale=1), n_restarts_optimizer=0,
#             normalize_y=False, optimizer='fmin_l_bfgs_b',
#             random_state=None)
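
Since the question also asks how to plot each input against the output, here is a minimal sketch of one common approach (not the only one): sweep one feature across its observed range while holding the other four at their mean, and plot the predicted mean with a 95% band. The data here is the same `make_regression` dummy data as above; adapt the feature names to your own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs in a script
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# dummy 5-D data, as in the answer above
X, y = make_regression(n_samples=20, n_features=5, n_targets=1, random_state=0)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)

fig, axes = plt.subplots(5, 1, figsize=(6, 12))
for i, ax in enumerate(axes):
    # vary feature i over its range, hold the others at their mean
    grid = np.tile(X.mean(axis=0), (100, 1))
    grid[:, i] = np.linspace(X[:, i].min(), X[:, i].max(), 100)
    mean, std = gp.predict(grid, return_std=True)
    ax.plot(grid[:, i], mean, 'b-')
    ax.fill_between(grid[:, i], mean - 1.96 * std, mean + 1.96 * std,
                    color='k', alpha=0.2)
    ax.scatter(X[:, i], y, c='r', s=20, zorder=10)
    ax.set_xlabel('X%d' % (i + 1))
    ax.set_ylabel('y')
plt.tight_layout()
```

Keep in mind these are slices through the 5-D prediction surface, not marginal effects; the scattered training points will not lie on the curve because the other four inputs differ from their means.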

Update: in the anisotropic case, you should explicitly define separate parameters in the kernel; here is an example definition for an RBF kernel and 2-D variables:

kernel = RBF(length_scale=[1.0, 2.0], length_scale_bounds=[(1e-1, 10.0), (1e-2, 1.0)])

Extend similarly for the 5-D case.
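
Spelled out, the 5-D anisotropic version just passes a 5-element `length_scale` (and matching bounds); after fitting, `gp.kernel_` reports one optimized length scale per input dimension:

```python
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# dummy 5-D data, as above
X, y = make_regression(n_samples=20, n_features=5, n_targets=1, random_state=0)

# one length scale (and bound) per input dimension
kernel = RBF(length_scale=[1.0] * 5, length_scale_bounds=[(1e-2, 1e2)] * 5)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)
print(gp.kernel_)  # shows five fitted length scales, one per dimension
```

The relative sizes of the fitted length scales indicate how relevant each input is (large length scale = the output varies slowly along that dimension), which is one practical reason to prefer the anisotropic kernel for multi-dimensional inputs.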