Implementing Logistic Regression with Scipy: Why does this Scipy optimization return all zeros?
I am trying to implement one-vs-all logistic regression, as in Andrew Ng's machine learning class. He uses an Octave function called fmincg in his implementation; I have tried several of the functions in scipy.optimize.minimize instead, but no matter what I pass in, I always get all zeros out of the classifier.
Over the last few hours I have consulted a lot of resources; the most helpful were this Stack Overflow post and this blog post.
Is there anything, obvious or not so obvious, that my implementation gets wrong?
import numpy as np
import scipy.optimize as op

def sigmoid(z):
    """takes matrix and returns result of passing through sigmoid function"""
    return 1.0 / (1.0 + np.exp(-z))

def lrCostFunction(theta, X, y, lam=0):
    """
    evaluates logistic regression cost function:
    theta: coefficients. (n x 1 array)
    X: data matrix (m x n array)
    y: ground truth matrix (m x 1 array)
    lam: regularization constant
    """
    m = len(y)
    theta = theta.reshape((-1, 1))
    theta = np.nan_to_num(theta)
    hypothesis = sigmoid(np.dot(X, theta))
    term1 = np.dot(-y.T, np.log(hypothesis))
    term2 = np.dot(1 - y.T, np.log(1 - hypothesis))
    J = (1/m) * term1 - term2
    J = J + (lam/(2*m)) * np.sum(theta[1:]**2)
    return J

def Gradient(theta, X, y, lam=0):
    m = len(y)
    theta = theta.reshape((-1, 1))
    hypothesis = sigmoid(np.dot(X, theta))
    residuals = hypothesis - y
    reg = theta
    reg[0, :] = 0
    reg[1:, :] = (lam/m) * theta[1:]
    grad = (1.0/m) * np.dot(X.T, residuals)
    grad = grad + reg
    return grad.flatten()

def trainOneVersusAll(X, y, labels, lam=0):
    """
    trains one vs all logistic regression.
    inputs:
    - X and y should be ndarrays with a row for each item in the training set
    - labels is a list of labels to generate probabilities for.
    - lam is a regularization constant
    outputs:
    - "all_theta", shape = (len(labels), n + 1)
    """
    y = y.reshape((len(y), 1))
    m, n = np.shape(X)
    X = np.hstack((np.ones((m, 1)), X))
    all_theta = np.zeros((len(labels), n + 1))
    for i, c in enumerate(labels):
        initial_theta = np.zeros(n + 1)
        result, _, _ = op.fmin_tnc(func=lrCostFunction,
                                   fprime=Gradient,
                                   x0=initial_theta,
                                   args=(X, y == c, lam))
        print(result)
        all_theta[i, :] = result
    return all_theta

def predictOneVsAll(all_theta, X):
    pass

a = np.array([[5., 5., 6.], [6., 0., 8.], [1., 1., 1.], [6., 1., 9.]])
k = np.array([1, 1, 0, 0])
# a = np.array([[1,0,1],[0,1,0]])
# k = np.array([0,1])
solution = np.linalg.lstsq(a, k)
print('x', solution[0])
print('resid', solution[1])
thetas = trainOneVersusAll(a, k, np.unique(k))
The problem is in your Gradient function. Assignment in numpy does not copy objects, so your line
reg = theta
makes reg a reference to theta, so every time you compute the gradient you actually modify the current solution. It should be
reg = theta.copy()
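A minimal sketch of the aliasing behavior described above (the array values are illustrative, not from the original code):

```python
import numpy as np

theta = np.array([[1.0], [2.0], [3.0]])
reg = theta            # plain assignment: reg and theta are the SAME array
reg[0, :] = 0          # this write also zeroes theta's first row
print(theta[0, 0])     # 0.0 -- the optimizer's current solution was clobbered

theta = np.array([[1.0], [2.0], [3.0]])
reg = theta.copy()     # independent copy: writes to reg leave theta alone
reg[0, :] = 0
print(theta[0, 0])     # 1.0 -- theta is untouched
```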
I would also suggest starting from random weights:
initial_theta = np.random.randn(n+1)
Now the solution is no longer all zeros (although I have not checked every formula, so there may still be math errors). It is also worth noting that logistic regression without regularization is ill-posed for linearly separable problems (its objective is unbounded), so I would suggest testing with lam > 0.
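To illustrate the unbounded objective, here is a small sketch (the toy dataset and the `cost` helper are made up for this example, not part of the code above): on separable data, scaling a separating theta up keeps pushing the unregularized log-loss toward zero, so no finite minimizer exists, while any lam > 0 eventually makes the penalty dominate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny linearly separable set: x = +1 is class 1, x = -1 is class 0
X = np.array([[1.0], [-1.0]])
y = np.array([[1.0], [0.0]])

def cost(theta, lam=0.0):
    """Log-loss plus an optional L2 penalty on theta."""
    m = len(y)
    h = sigmoid(np.dot(X, theta))
    J = -(np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h))) / m
    return J.item() + (lam / (2 * m)) * np.sum(theta ** 2)

# Without regularization, scaling a separating theta up keeps lowering
# the cost toward 0, so the objective has no finite minimizer:
for s in (1.0, 5.0, 30.0):
    print(s, cost(np.array([[s]])))

# With lam > 0 the penalty eventually dominates, so a finite optimum exists:
print(cost(np.array([[30.0]]), lam=1.0), '>', cost(np.array([[1.0]]), lam=1.0))
```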