Multivariate Linear Regression using gradient descent
I am learning multivariate linear regression with gradient descent, and I wrote the Python code below:
import pandas as pd
import numpy as np

x1 = np.array([1,2,3,4,5,6,7,8,9,10],dtype='float64')
x2 = np.array([5,10,20,40,80,160,320,640,1280,2560],dtype='float64')
y = np.array([350,700,1300,2400,4500,8600,16700,32800,64900,129000],dtype='float64')

def multivar_gradient_descent(x1,x2,y):
    w1=w2=w0=0
    iteration=500
    n=len(x1)
    learning_rate=0.02
    for i in range(iteration):
        y_predicted = w1 * x1 + w2 * x2 + w0
        cost = (1*(2/n))*float(sum((y_predicted-y)**2)) # cost function
        x1d = sum(x1*(y_predicted-y))/n # derivative for feature x1
        x2d = sum(x2*(y_predicted-y))/n # derivative for feature x2
        cd = sum(1*(y-y_predicted))/n # derivative for bias
        w1 = w1 - learning_rate * x1d
        w2 = w2 - learning_rate * x2d
        w0 = w0 - learning_rate * cd
        print(f"Iteration {i}: a= {w1}, b = {w2}, c = {w0}, cost = {cost} ")
    return w1,w2, w0

w1,w2,w0 = multivar_gradient_descent(x1,x2,y)
w1,w2,w0
However, the cost keeps increasing until it becomes inf (shown below). I have spent hours checking the formulas for the derivatives and the cost function, but I cannot pin down where the mistake is.
I am quite frustrated and hope someone can help me with this. Thank you.
Iteration 0: a= 4685.5, b = 883029.5, c = -522.5, cost = 4462002500.0
Iteration 1: a= -81383008.375, b = -15430704757.735, c = 9032851.74, cost = 1.3626144151911089e+18
Iteration 2: a= 1422228350500.3176, b = 269662832866446.66, c = -157855848816.2755, cost = 4.161440004246925e+26
Iteration 3: a= -2.4854478828631716e+16, b = -4.712554891970221e+18, c = 2758646212375989.0, cost = 1.2709085355243152e+35
Iteration 4: a= 4.343501644116814e+20, b = 8.235533749226551e+22, c = -4.820935671838988e+19, cost = 3.881369199171854e+43
Iteration 5: a= -7.590586253095058e+24, b = -1.4392196523846473e+27, c = 8.424937075201089e+23, cost = 1.1853745914189544e+52
Iteration 6: a= 1.326510368511469e+29, b = 2.5151414235959125e+31, c = -1.472319266480111e+28, cost = 3.620147555871397e+60
Iteration 7: a= -2.3181737208386835e+33, b = -4.3953932745475034e+35, c = 2.5729854159139745e+32, cost = 1.105597202871857e+69
Iteration 8: a= 4.051177832870898e+37, b = 7.681270666011396e+39, c = -4.496479874458965e+36, cost = 3.37650649906685e+77
Iteration 9: a= -7.079729049644685e+41, b = -1.3423581317783506e+44, c = 7.857926879944079e+40, cost = 1.0311889455424087e+86
Iteration 10: a= 1.2372343423113349e+46, b = 2.3458688442326932e+48, c = -1.3732300949746233e+45, cost = 3.1492628303921182e+94
Iteration 11: a= -2.1621573467862958e+50, b = -4.099577083092681e+52, c = 2.3998198539580117e+49, cost = 9.617884692967256e+102
Iteration 12: a= 3.7785278280657085e+54, b = 7.164310273158479e+56, c = -4.193860411686855e+53, cost = 2.937312982406619e+111
Iteration 13: a= -6.603253259383672e+58, b = -1.2520155286691985e+61, c = 7.32907727374022e+57, cost = 8.970587433766233e+119
Iteration 14: a= 1.1539667190934036e+63, b = 2.187988549158328e+65, c = -1.280809765026251e+62, cost = 2.739627659321216e+128
Iteration 15: a= -2.0166410956339498e+67, b = -3.823669740212017e+69, c = 2.238308579532037e+66, cost = 8.366854196711946e+136
Iteration 16: a= 3.524227554668779e+71, b = 6.682142046784112e+73, c = -3.9116076672823015e+70, cost = 2.5552468384109146e+145
Iteration 17: a= -6.158844964518726e+75, b = -1.1677531106785476e+78, c = 6.835819994909099e+74, cost = 7.80375306142527e+153
Iteration 18: a= 1.0763031248287995e+80, b = 2.0407338215081817e+82, c = -1.194609454154816e+79, cost = 2.3832751078395456e+162
Iteration 19: a= -1.8809182942418207e+84, b = -3.5663313522046286e+86, c = 2.0876672425822773e+83, cost = 7.278549429920333e+170
Iteration 20: a= 3.287042049772272e+88, b = 6.232424424816986e+90, c = -3.648350932258958e+87, cost = 2.2228773182554595e+179
Iteration 21: a= -5.744345977200645e+92, b = -1.0891616727381027e+95, c = 6.375759629418162e+91, cost = 6.788692746528022e+187
Iteration 22: a= 1.0038664004334024e+97, b = 1.9033895455483145e+99, c = -1.1142105462686083e+96, cost = 2.0732745270409844e+196
Iteration 23: a= -1.7543298295730705e+101, b = -3.326312202113057e+103, c = 1.9471642809242535e+100, cost = 6.331804111587467e+204
Iteration 24: a= 3.065819465220816e+105, b = 5.812973435628952e+107, c = -3.402811748286256e+104, cost = 1.9337402155196325e+213
Iteration 25: a= -5.357743358678581e+109, b = -1.0158595498601174e+112, c = 5.946661977991267e+108, cost = 5.905664728753603e+221
Iteration 26: a= 9.363047701635277e+113, b = 1.7752887338463183e+116, c = -1.0392225987316703e+113, cost = 1.8035967607506306e+230
Iteration 27: a= -1.6362609478315793e+118, b = -3.102446680700735e+120, c = 1.816117367544431e+117, cost = 5.508205129817299e+238
Iteration 28: a= 2.8594854738709632e+122, b = 5.421752091975047e+124, c = -3.1737976990896245e+121, cost = 1.6822121447766637e+247
Iteration 29: a= -4.997159643830032e+126, b = -9.474907636509772e+128, c = 5.546443206127292e+125, cost = 5.13749512471037e+255
Iteration 30: a= 8.732901332811723e+130, b = 1.655809288168471e+133, c = -9.692814462503292e+129, cost = 1.5689968853439082e+264
Iteration 31: a= -1.5261382690222234e+135, b = -2.8936476258832726e+137, c = 1.6938900970034892e+134, cost = 4.791734427889445e+272
Iteration 32: a= 2.667038052317318e+139, b = 5.056860498736353e+141, c = -2.960196619698286e+138, cost = 1.46340117318896e+281
Iteration 33: a= -4.660843723593812e+143, b = -8.837232935670386e+145, c = 5.173159724337836e+142, cost = 4.4692439155775235e+289
Iteration 34: a= 8.145164706926056e+147, b = 1.5443709783730996e+150, c = -9.040474323708519e+146, cost = 1.364912201990395e+298
Iteration 35: a= -1.4234270024354842e+152, b = -2.698901043124031e+154, c = 1.5798888948493553e+151, cost = 4.168457471405497e+306
Iteration 36: a= 2.487542614748579e+156, b = 4.716526626425798e+158, c = -2.760971195418877e+155, cost = inf
Iteration 37: a= -4.347162341028204e+160, b = -8.24247464517401e+162, c = 4.824998749459281e+159, cost = inf
Iteration 38: a= 7.596983588224419e+164, b = 1.4404326246286964e+167, c = -8.432037599998082e+163, cost = inf
Iteration 39: a= -1.3276283495338805e+169, b = -2.517261181154549e+171, c = 1.473560135031107e+168, cost = inf
Iteration 40: a= 2.32012747430196e+173, b = 4.399097705650062e+175, c = -2.5751539243057795e+172, cost = inf
The problem here is that you initialize the weights to 0, as in w1=w2=w0=0.
If all the weights are initialized to 0, the derivative of the loss function with respect to every w in W[l] is the same, so all the weights in W[l] take on identical values in the subsequent iterations.
This is why we have to initialize the weights to random values.
Weight initialization with large random values:
When the weights are initialized to very large values, the term np.dot(W,X)+b becomes significantly larger, and if an activation function such as sigmoid() is applied, it maps that value close to 1, where the slope of the gradient changes very slowly and learning takes a long time.
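As a quick illustration (my own sketch, not taken from your code), you can watch the sigmoid gradient flatten out as its input grows:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)); it shrinks toward 0 as z grows.
for z in [0.0, 2.0, 10.0]:
    s = sigmoid(z)
    print(f"z={z}: sigmoid={s:.5f}, gradient={s * (1 - s):.5f}")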
There are many ways to initialize weights. For example, Keras's Dense, LSTM and CNN layers all default to glorot_uniform, also known as Xavier initialization.
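For reference, a minimal sketch of what glorot_uniform computes (fan_in and fan_out, my naming here, are the number of input and output units of the layer):

import numpy as np

def glorot_uniform(fan_in, fan_out):
    # Xavier/Glorot uniform: sample from U(-limit, limit)
    # with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_out, fan_in))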
For your purpose, you can initialize the weights randomly with numpy's random.randn following the formula below, where l is a particular layer. This draws the initial weights from a standard normal distribution (mean 0, standard deviation 1):

# Specify the random seed value for reproducibility.
np.random.seed(3)
# Shape: (units in layer l, units in layer l-1).
W[l] = np.random.randn(l, l-1)
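Adapted to your two-feature case, that could look like the following (a hypothetical starting point, not code from your question):

np.random.seed(3)
w1, w2, w0 = np.random.randn(3)  # small random starting values instead of 0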
Another thing you should do is apply feature normalization as a preprocessing step, in which you return a normalized version of the data where each feature has a mean of 0 and a standard deviation of 1. This is generally a good preprocessing step to perform when working with learning algorithms.
def featureNormalize(X):
    """
    X : The dataset of shape (m x n)
    """
    # Per-feature mean and standard deviation, computed over the rows.
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    # Scale each feature to mean 0 and standard deviation 1.
    X_norm = (X - mu) / sigma
    return X_norm
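To put this together with your data, here is a minimal sketch of how featureNormalize could be wired in (the column stacking and the re-run are my assumptions, not part of your original code):

# Stack the two features into an (m x n) design matrix and normalize it.
X = np.column_stack((x1, x2))
X_norm = featureNormalize(X)

# Re-run gradient descent on the normalized features, whose values now
# have mean 0 and standard deviation 1 instead of ranging up to 2560.
w1, w2, w0 = multivar_gradient_descent(X_norm[:, 0], X_norm[:, 1], y)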