How to get dimensions right using fmin_cg in scipy.optimize

I have been trying to use fmin_cg to minimize the cost function for logistic regression. This is how I call fmin_cg:

xopt = fmin_cg(costFn, fprime=grad, x0=initial_theta,
               args=(X, y, m), maxiter=400, disp=True, full_output=True)

Here is my costFn:

def costFn(theta, X, y, m):
    h = sigmoid(X.dot(theta))
    J = 0
    J = 1 / m * np.sum((-(y * np.log(h))) - ((1-y) * np.log(1-h)))
    return J.flatten()

And here is my grad:

def grad(theta, X, y, m):
    h = sigmoid(X.dot(theta))
    J = 1 / m * np.sum((-(y * np.log(h))) - ((1-y) * np.log(1-h)))
    gg = 1 / m * (X.T.dot(h-y))
    return gg.flatten()
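
Both costFn and grad call a sigmoid helper that the question doesn't show; presumably it is something like this minimal sketch (equivalent to scipy.special.expit):

import numpy as np

def sigmoid(z):
    # logistic function: maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))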

Calling fmin_cg like this seems to throw the following error:

/Users/sugethakch/miniconda2/lib/python2.7/site-packages/scipy/optimize/linesearch.pyc in phi(s)
     85     def phi(s):
     86         fc[0] += 1
---> 87         return f(xk + s*pk, *args)
     88 
     89     def derphi(s):

ValueError: operands could not be broadcast together with shapes (3,) (300,) 

I know this has something to do with my dimensions, but I can't seem to figure it out. I'm a noob, so I might be making an obvious mistake.

I have read this link:

However, it doesn't seem to work for me.

Any help?


Updated sizes of X, y, m, theta:

(100, 3) ----> X

(100, 1) ----> y

100 ----> m

(3, 1) ----> theta
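
Given these sizes, the (3,) vs (300,) mismatch in the traceback can be reproduced directly: fmin_cg flattens x0 to a (3,) vector, so h = sigmoid(X.dot(theta)) comes out as (100,); subtracting the (100, 1) y then broadcasts to a (100, 100) matrix, and the flattened gradient becomes (300,) instead of (3,). A shapes-only sketch (the arrays here are dummies, just to show the broadcasting):

import numpy as np

theta = np.ones(3)         # fmin_cg flattens x0, so theta arrives as (3,)
X = np.ones((100, 3))
y_col = np.ones((100, 1))  # y as loaded from the DataFrame: a column vector

h = 1.0 / (1.0 + np.exp(-X.dot(theta)))    # shape (100,)

print((h - y_col).shape)                   # (100, 100): broadcasting, not subtraction
print(X.T.dot(h - y_col).flatten().shape)  # (300,): the bogus gradient in the traceback
print(X.T.dot(h - y_col.flatten()).shape)  # (3,): the correct gradient once y is flat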


This is how I initialize X, y, m:

import numpy as np
import pandas as pd

data = pd.read_csv('ex2data1.txt', sep=",", header=None)
data.columns = ['x1', 'x2', 'y']
x1 = data.iloc[:, 0].values[:, None]
x2 = data.iloc[:, 1].values[:, None]
y = data.iloc[:, 2].values[:, None]
# join x1 and x2 to make one array of X
X = np.concatenate((x1, x2), axis=1)
m, n = X.shape
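
Note that np.concatenate((x1, x2), axis=1) only yields a (100, 2) array, while the updated sizes above list X as (100, 3), so an intercept column of ones has presumably been added somewhere, as the original exercise does. A minimal sketch of that assumed step, with a matching initial_theta:

# Assumed step (not shown in the question): prepend a bias column of ones,
# turning the (100, 2) feature matrix into the (100, 3) X listed above.
X = np.hstack((np.ones((m, 1)), X))
initial_theta = np.zeros(X.shape[1])  # one parameter per column: shape (3,)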

ex2data1.txt:

34.62365962451697,78.0246928153624,0
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
.....

If it helps, I am trying to re-code one of the homework assignments from Andrew Ng's ML course on Coursera in Python.

Well, since I don't know exactly how you initialize m, X, y and theta, I had to make some assumptions. Hopefully my answer is relevant:

import numpy as np
from scipy.optimize import fmin_cg
from scipy.special import expit

def costFn(theta, X, y, m):
    # expit is the same as sigmoid, but faster
    h = expit(X.dot(theta))

    # instead of 1/m, I take the mean
    J =  np.mean((-(y * np.log(h))) - ((1-y) * np.log(1-h)))
    return J #should be a scalar


def grad(theta, X, y, m):
    h = expit(X.dot(theta))
    # divide by the number of samples so the gradient matches
    # the np.mean used in costFn
    gg = X.T.dot(h - y) / X.shape[0]
    return gg.flatten()

# initialize matrices
X = np.random.randn(100,3)
y = np.random.randn(100,) #this apparently needs to be a 1-d vector
m = np.ones((3,)) # not using m, used np.mean for a weighted sum (see ali_m's comment)
theta = np.ones((3,1))

xopt = fmin_cg(costFn, fprime=grad, x0=theta, args=(X, y, m), maxiter=400, disp=True, full_output=True)

While the code runs, I don't know enough about your problem to know whether this is exactly what you're looking for. But hopefully this helps you understand the problem better. One way to check your answer is to call fmin_cg with fprime=None and see how the answers compare.
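
For instance, a quick comparison along those lines, reusing the costFn, grad, and arrays defined above (scipy.optimize.check_grad is another way to test the analytic gradient against a finite-difference estimate):

from scipy.optimize import check_grad

# Optimize once with the analytic gradient and once letting scipy
# approximate it numerically (fprime=None); the two results should
# agree if grad is implemented correctly.
xopt_analytic = fmin_cg(costFn, x0=theta, fprime=grad, args=(X, y, m), disp=False)
xopt_numeric = fmin_cg(costFn, x0=theta, fprime=None, args=(X, y, m), disp=False)
print(xopt_analytic, xopt_numeric)

# check_grad returns the norm of the difference between the two gradients
# at a point; a tiny value means grad is consistent with costFn.
print(check_grad(costFn, grad, np.ones(3), X, y, m))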

Finally, I figured out what the problem was in my initial program.

My 'y' was (100, 1) while fmin_cg expects (100,). Once I flattened 'y', it no longer threw the initial error. However, the optimization still wasn't working:

 Warning: Desired error not necessarily achieved due to precision loss.
     Current function value: 0.693147
     Iterations: 0
     Function evaluations: 43
     Gradient evaluations: 41

This is the same value I got without any optimization (0.693147 is just np.log(2), the cost at theta = 0).

The way I got it to optimize was to use the 'Nelder-Mead' method. I followed this answer: scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"

import scipy.optimize as op

Result = op.minimize(fun=costFn,
                     x0=initial_theta,
                     args=(X, y, m),
                     method='Nelder-Mead',
                     options={'disp': True})  # no jac=grad: Nelder-Mead is derivative-free
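
For reference, the fields of the returned OptimizeResult can be inspected directly:

print(Result.x)    # the optimized theta
print(Result.fun)  # the final cost, ~0.203498 here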

This method doesn't need the 'jacobian'. I got the result I was looking for:

Optimization terminated successfully.
     Current function value: 0.203498
     Iterations: 157
     Function evaluations: 287