Gaussian process regression in multiple dimensions with GPflow

Question

I want to perform some multivariate regression using the Gaussian process regression (GPR) implemented in GPflow version 2. Install it with: pip install gpflow==2.0.0rc1

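Below is some example code that generates some two-dimensional data, tries to fit it with GPR, and finally computes the difference between the true data and the GPR prediction.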

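Eventually I want to extend this to higher dimensions, test against a validation set to check for overfitting, and experiment with other kernels and "Automatic Relevance Determination" (ARD). But understanding how to get this basic case working is the first step.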
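As far as I know, ARD in GPflow 2 amounts to giving a stationary kernel one lengthscale per input dimension, for example gpflow.kernels.Matern52(lengthscales=[1.0, 1.0]); check this against the GPflow documentation for your version. The plain NumPy sketch below illustrates what that means for the Matérn 5/2 kernel. The helper matern52_ard is hypothetical, not a GPflow function: each input column is divided by its own lengthscale before distances are computed, so a very long lengthscale makes that input nearly irrelevant.

```python
import numpy as np

def matern52_ard(X1, X2, lengthscales, variance=1.0):
    """Matérn 5/2 kernel with one lengthscale per input dimension (ARD).

    X1: (N, D), X2: (M, D), lengthscales: (D,). Returns an (N, M) matrix.
    Hypothetical helper for illustration; not part of GPflow.
    """
    ls = np.asarray(lengthscales, dtype=float)
    # Scale each input dimension by its own lengthscale.
    A = X1 / ls
    B = X2 / ls
    # Squared Euclidean distances between all pairs of scaled points.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    r = np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from round-off
    s5r = np.sqrt(5.0) * r
    return variance * (1.0 + s5r + 5.0 / 3.0 * r**2) * np.exp(-s5r)

# A very long lengthscale on the second input makes points that differ only
# in that input look almost identical to the kernel.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
K = matern52_ard(X, X, lengthscales=[0.5, 100.0])
```

Here K[0, 1] (points differing only in the second input) stays close to 1, while K[0, 2] (points differing in the first input) is much smaller.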

Thanks!

The following code snippet runs in a Jupyter notebook.

import gpflow
import numpy as np
import matplotlib
from gpflow.utilities import print_summary

%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12, 6)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def gen_data(X, Y):
    """
    make some fake data.
    X, Y are np.ndarrays with shape (N,) where
    N is the number of samples.
    """

    ys = []
    for x0, x1 in zip(X,Y):
        y = x0 * np.sin(x0*10)
        y = x1 * np.sin(x0*10)
        y += 1
        ys.append(y)
    return np.array(ys)


# generate some fake data
x = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, x)

X = X.ravel()
Y = Y.ravel()

z = gen_data(X, Y)

#note X.shape, Y.shape and z.shape
#are all (400,) for this case.

# if you would like to plot the data you can do the following
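fig = plt.figure()
# Axes3D(fig) no longer attaches itself to the figure in recent matplotlib,
# so create the 3D axes through the figure instead.
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, z, s=100, c='k')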


# had to set this 
# to avoid the following error
# tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
gpflow.config.set_default_positive_minimum(1e-7)

# setup the kernel

k = gpflow.kernels.Matern52()


# set up GPR model

# I think the shape of the independent data
# should be (400, 2) for this case
XY = np.column_stack([X, Y])  # stack the two inputs as columns
print(XY.shape) # this will be (400, 2)

m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)

# optimise hyper-parameters
opt = gpflow.optimizers.Scipy()

def objective_closure():
    return - m.log_marginal_likelihood()

opt_logs = opt.minimize(objective_closure,
                        m.trainable_variables,
                        options=dict(maxiter=100)
                       )


# predict training set
mean, var = m.predict_f(XY)

print(mean.numpy().shape)
# (400, 400)
# I would expect this to be (400,)

# If it was then I could compute the difference
# between the true data and the GPR prediction
# `diff = mean - z`
# but because the shape is not as expected this of course
# won't work.

Answer 1

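z must have shape (N, 1), but in your case it is (N,). That said, this is a missing check in GPflow, not your fault. Reshape the targets before building the model, e.g. z = z.reshape(-1, 1); predict_f then returns a mean of shape (N, 1), and diff = mean - z works elementwise.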
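To make the shape requirement concrete, here is a minimal NumPy sketch of the GP posterior mean on the question's grid data. It is not GPflow's implementation: the helper names matern52 and gp_posterior_mean are hypothetical, and the kernel hyperparameters (lengthscale 0.2, unit variance), noise variance 1e-2 and diagonal jitter 1e-6 are fixed illustrative assumptions rather than optimized values. The jitter plays the same role as the positive-minimum setting used in the question to avoid Cholesky failures. With targets of shape (N, 1), the prediction and the residuals both come out as (N, 1); subtracting a flat (N,) array instead broadcasts to (N, N), which is likely the same kind of shape blow-up seen in the question.

```python
import numpy as np

def matern52(X1, X2, lengthscale=0.2, variance=1.0):
    # Isotropic Matérn 5/2 kernel on (N, D) and (M, D) inputs -> (N, M).
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    s = np.sqrt(5.0) * np.sqrt(np.maximum(sq, 0.0)) / lengthscale
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior_mean(X, y, Xnew, noise=1e-2, jitter=1e-6):
    # Zero-mean GP regression: mean = K(Xnew, X) (K(X, X) + noise*I)^{-1} y.
    # y must be a column of shape (N, 1); a flat (N,) array is rejected.
    assert y.ndim == 2 and y.shape[1] == 1, "targets must have shape (N, 1)"
    K = matern52(X, X) + (noise + jitter) * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y via Cholesky
    return matern52(Xnew, X) @ alpha                     # shape (M, 1)

# Same grid and target function as the question (gen_data keeps only the
# second assignment, i.e. x1 * sin(10 * x0) + 1).
x = np.linspace(0, 1, 20)
G0, G1 = np.meshgrid(x, x)
XY = np.column_stack([G0.ravel(), G1.ravel()])  # (400, 2)
z = XY[:, 1] * np.sin(10 * XY[:, 0]) + 1.0      # (400,), as in the question
z = z.reshape(-1, 1)                            # (400, 1), as GPflow expects

mean = gp_posterior_mean(XY, z, XY)
diff = mean - z                                 # elementwise residuals
print(mean.shape, diff.shape)                   # (400, 1) (400, 1)

# For comparison: a (400, 1) column minus a flat (400,) array broadcasts.
bad = mean - z.ravel()
print(bad.shape)                                # (400, 400)
```

The same broadcasting rule is why fixing only one side of diff = mean - z is not enough: both operands need the (N, 1) column shape.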
