Python gradient-descent multi-regression - cost increases to infinity

Question

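I wrote this algorithm for my final-year project. It uses gradient descent to find the minimum, but the cost increases to infinity.

I have checked the gradientDescent function, and I believe it is correct.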
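For reference, the loop in gradientDescent is meant to be plain batch gradient descent on the half mean squared error. Here $X$ is the $m \times n$ feature matrix, $y$ the target vector and $\alpha$ the step size (alpha in the code):

```latex
J(\theta) = \frac{1}{2m}\lVert X\theta - y\rVert^2, \qquad
\nabla J(\theta) = \frac{1}{m} X^\top (X\theta - y), \qquad
\theta \leftarrow \theta - \alpha \, \nabla J(\theta)
```

Because $J$ is quadratic, this iteration converges only when $0 < \alpha < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of $X^\top X / m$. With a larger step, the error along the top eigenvector is amplified at every iteration, so the cost typically blows up.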

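The CSV I am importing, and its format, are causing some errors. The data in the CSV is formatted as follows.

Each group of four values before a '|' is one row.

The first 3 columns are the independent variables x. The 4th column is the dependent variable y.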

600 20 0.5 0.63 | 600 20 1 1.5 | 800 20 0.5 0.9
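Written out one row per line, the three sample rows can be loaded with NumPy as a quick sanity check of the layout. This is a sketch: the SAMPLE string holds only the three rows shown above, not the real NewTrain.csv, and it relies on np.loadtxt splitting on whitespace by default.

```python
import io

import numpy as np

# The three sample rows from the question, one row per line
# (the '|' separators in the post only mark row boundaries).
SAMPLE = """600 20 0.5 0.63
600 20 1 1.5
800 20 0.5 0.9"""

# np.loadtxt splits each line on whitespace by default.
data = np.loadtxt(io.StringIO(SAMPLE))

x = data[:, :3]  # first three columns: independent variables
y = data[:, 3]   # fourth column: dependent variable

print(x.shape, y.shape)  # (3, 3) (3,)
```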

import numpy as np
import random
import pandas as pd

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here,
        # but it is kept to be consistent with the gradient)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', delimiter=",", header=None)

x = df.loc[:, 0:2].to_numpy()
y = df[3].to_numpy()

print(x)
print(y)

m, n = np.shape(x)
numIterations= 100
alpha = 0.001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
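The sketch below runs the same update on the three sample rows and illustrates why the cost can still run off to infinity once the file is parsed correctly. With raw values around 600 to 800, the largest eigenvalue of XᵀX/m is in the hundreds of thousands, so alpha = 0.001 is far above the 2/λmax limit. Standardizing the columns and prepending an intercept column are assumptions added here (a common preprocessing choice), not part of the question's code. gradient_descent is a hypothetical variant of the question's function that also returns the cost history.

```python
import numpy as np


def gradient_descent(x, y, theta, alpha, num_iterations):
    """Batch gradient descent on J = ||x @ theta - y||^2 / (2m); returns theta and costs."""
    m = len(y)
    costs = []
    for _ in range(num_iterations):
        loss = x @ theta - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        theta = theta - alpha * (x.T @ loss) / m
    return theta, costs


# The three sample rows from the question.
x = np.array([[600, 20, 0.5], [600, 20, 1.0], [800, 20, 0.5]])
y = np.array([0.63, 1.5, 0.9])

# For this quadratic cost, gradient descent only converges when alpha < 2 / lam_max.
lam_max = np.linalg.eigvalsh(x.T @ x / len(y)).max()
print("alpha must be below", 2 / lam_max)

# Same update on the raw features with the question's alpha = 0.001:
# the cost grows rapidly from one iteration to the next.
_, raw_costs = gradient_descent(x, y, np.ones(3), alpha=0.001, num_iterations=5)
print(raw_costs)

# Hypothetical preprocessing: standardize each column, then prepend a
# column of ones so that theta[0] acts as an intercept.
mu, sigma = x.mean(axis=0), x.std(axis=0)
sigma[sigma == 0] = 1.0  # column 2 is constant (all 20); avoid dividing by zero
x_scaled = np.column_stack([np.ones(len(y)), (x - mu) / sigma])

theta, costs = gradient_descent(
    x_scaled, y, np.ones(x_scaled.shape[1]), alpha=0.1, num_iterations=500
)
print(costs[0], costs[-1])
```

The step size of 0.1 and the 500 iterations are illustrative values for this toy data, not tuned settings.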

Answer 1

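As forayer mentioned in the comments, the problem is in the line where you read the CSV. You set delimiter=",", which tells pandas that the columns in each row are separated by commas. In your data, however, the columns are clearly separated by spaces.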

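Just replace that line with the following. The 'rU' argument is dropped as well: it is a file-open mode, not a read_csv option, and recent pandas versions raise an error for it.

df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', delimiter=" ", header=None)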
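One way to check the fix without the original file is to feed the sample rows through io.StringIO. The csv_text string below is a stand-in built from the three rows in the question, not the real NewTrain.csv. The column selection uses iloc and to_numpy() because .as_matrix() was removed in pandas 1.0.

```python
import io

import pandas as pd

# Stand-in for NewTrain.csv: the three sample rows, space-separated, no header.
csv_text = "600 20 0.5 0.63\n600 20 1 1.5\n800 20 0.5 0.9\n"

df = pd.read_csv(io.StringIO(csv_text), delimiter=" ", header=None)

# With header=None the columns are labelled with the integers 0..3,
# so select them by position or integer label, not by string label.
x = df.iloc[:, 0:3].to_numpy()
y = df[3].to_numpy()

print(df.shape)  # (3, 4)
```

If the real file can contain repeated or trailing spaces, sep=r"\s+" (a regular expression matching any run of whitespace) is more forgiving than a single space.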


Tags: python, numpy, machine-learning, pandas, gradient-descent