Different results when optimizing hyperparameters for a Gaussian process regression

I am studying Gaussian process regression. I am trying the built-in function from scikit-learn, and I am also trying to implement a custom function for it.

Here is the code when using scikit-learn:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s

X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) # Function

v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ # Hyperparameters optimized by the GPR function in scikit-learn
# Out[]: 14.1**2 * RBF(length_scale=3.7) + WhiteKernel(noise_level=1e-05) # result

Here is the code I wrote manually:

def marglike(par,X,Y): # defining the negative log-marginal-likelihood
    # print(par)
    l,var,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = var*np.exp(-(1/(2*(l**2)))*dist_X)
    inverse = np.linalg.inv(k + (sigma_n**2)*np.eye(len(k)))
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k + (sigma_n**2)*np.eye(len(k)))) + (n/2)*np.log(2*np.pi)
    return ml

b= [0.0005,100]
bnd = [b,b,b] #bounds used for "minimize" function
start = np.array([1.1,1.6,0.05]) #initial hyperparameters values
re = minimize(marglike,start,args=(X,Y),method="L-BFGS-B",options={'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x #Hyperparameter results
Out[]: array([3.55266484e+00, 9.99986210e+01, 5.00000000e-04])

As you can see, the hyperparameters I get from these two methods are different, even though I used the same data (X, Y) and the same minimization method.

Can someone help me understand why, and how to get the same results?

As San Mason suggested, adding noise does work! Otherwise, while doing it manually (in the custom code), set the initial noise reasonably low and do multiple restarts with different initializations, and you will get close values. Incidentally, noiseless data seems to create a stationary ridge in the space of hyperparameters (as in Fig. 1.6 of the Surrogates GP book). Note that scikit-learn's noise_level corresponds to sigma_n^2 in your custom function. Below are snippets for the noiseless and noisy cases.
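To make the noise_level ↔ sigma_n^2 correspondence concrete, here is a small sketch (assuming a GaussianProcessRegressor gp fitted with kernel = v*RBF() + WhiteKernel() as in the snippets here) that reads the fitted hyperparameters off gp.kernel_ and maps them to the (l, std, sigma_n) parameterization of the custom function; the k1/k2 attribute path follows from the kernel being ConstantKernel * RBF + WhiteKernel.

import numpy as np

fitted = gp.kernel_                        # Sum(Product(ConstantKernel, RBF), WhiteKernel)
amplitude = fitted.k1.k1.constant_value    # ConstantKernel value = std**2
length_scale = fitted.k1.k2.length_scale   # RBF length scale = l
noise_level = fitted.k2.noise_level        # WhiteKernel value = sigma_n**2

l, std, sigma_n = length_scale, np.sqrt(amplitude), np.sqrt(noise_level)
print(l, std, sigma_n)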

Noiseless case

scikit-learn

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s

X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) #+ np.random.normal(10)# Function

v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn 
# Out[]: 14.1**2 * RBF(length_scale=3.7) + WhiteKernel(noise_level=1e-05) #result

Custom function

def marglike(par,X,Y): # defining the negative log-marginal-likelihood
    # print(par)
    l,std,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = std**2*np.exp(-(dist_X/(2*(l**2)))) + (sigma_n**2)*np.eye(n)
    inverse = np.linalg.inv(k) 
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k)) + (n/2)*np.log(2*np.pi)
    return ml[0,0]

b= [10**-5,10**5]
bnd = [b,b,b] #bounds used for "minimize" function
start = [1,1,10**-5] #initial hyperparameters values
re = minimize(fun=marglike,x0=start,args=(X,Y),method="L-BFGS-B",options={'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x[1], re.x[0], re.x[2]**2
# Output - (9.920690495739379, 3.5657912350017575, 1.0000000000000002e-10)
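As a quick sanity check (just a sketch, assuming gp, X, Y, marglike and re from the noiseless snippets above are in scope), you can compare scikit-learn's log marginal likelihood at its fitted hyperparameters with the negative of the custom objective at the minimizer found by minimize; the two values should be close whenever both optimizers land on the same mode.

# scikit-learn evaluates the log marginal likelihood at the (log-transformed) kernel theta.
sk_lml = gp.log_marginal_likelihood(gp.kernel_.theta)

# marglike returns the NEGATIVE log marginal likelihood, so flip the sign for comparison.
custom_lml = -marglike(re.x, X, Y)

print(sk_lml, custom_lml)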

Noisy case

scikit-learn

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s

X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) + np.random.normal(size=10).reshape(10,1)*0.1 # Function

v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn 
# Out[]: 10.3**2 * RBF(length_scale=3.45) + WhiteKernel(noise_level=0.00792) #result

Custom function

def marglike(par,X,Y): # defining the negative log-marginal-likelihood
    # print(par)
    l,std,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = std**2*np.exp(-(dist_X/(2*(l**2)))) + (sigma_n**2)*np.eye(n)
    inverse = np.linalg.inv(k) 
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k)) + (n/2)*np.log(2*np.pi)
    return ml[0,0]

b= [10**-5,10**5]
bnd = [b,b,b] #bounds used for "minimize" function
start = [1,1,10**-5] #initial hyperparameters values
re = minimize(fun=marglike,x0=start,args=(X,Y),method="L-BFGS-B",options={'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x[1], re.x[0], re.x[2]**2
# Output - (10.268943740577331, 3.4462604625225106, 0.007922681239535326)
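Finally, a minimal sketch of the multi-restart idea mentioned above (assuming marglike, X and Y from the noisy snippet are in scope); the number of restarts and the log-uniform sampling range for the starting points are arbitrary choices here, loosely mirroring what n_restarts_optimizer does inside scikit-learn.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)  # seed chosen arbitrarily
b = [10**-5, 10**5]
bnd = [b, b, b]

best = None
for _ in range(20):  # number of restarts (arbitrary choice)
    start = 10**rng.uniform(-5, 5, size=3)  # log-uniform draw within the bounds
    res = minimize(fun=marglike, x0=start, args=(X, Y),
                   method="L-BFGS-B", bounds=bnd)
    if best is None or res.fun < best.fun:
        best = res

print(best.x)  # (l, std, sigma_n) with the smallest negative log marginal likelihood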