为什么 pytorch 中的正则化和 scratch 代码不匹配,pytorch 中用于正则化的公式是什么?

Why does regularization in pytorch and scratch code does not match and what is the formula used for regularization in pytorch?

我一直在尝试对 PyTorch 中的二元分类模型进行 L2 正则化,但是当我匹配 PyTorch 的结果和草稿代码时它不匹配, 火炬代码:

class LogisticRegression(nn.Module):
  def __init__(self,n_input_features):
    super(LogisticRegression,self).__init__()
    self.linear=nn.Linear(4,1)
    self.linear.weight.data.fill_(0.0)
    self.linear.bias.data.fill_(0.0)

  def forward(self,x):
    y_predicted=torch.sigmoid(self.linear(x))
    return y_predicted

model=LogisticRegression(4)

criterion=nn.BCELoss()
optimizer=torch.optim.SGD(model.parameters(),lr=0.05,weight_decay=0.1)
dataset=Data()
train_data=DataLoader(dataset=dataset,batch_size=1096,shuffle=False)

num_epochs=1000
for epoch in range(num_epochs):
  for x,y in train_data:
    y_pred=model(x)
    loss=criterion(y_pred,y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

刮刮码:

def sigmoid(z):
    s = 1/(1+ np.exp(-z))
    return s  

def yinfer(X, beta):
  return sigmoid(beta[0] + np.dot(X,beta[1:]))

def cost(X, Y, beta, lam):
    sum = 0
    sum1 = 0
    n = len(beta)
    m = len(Y)
    for i in range(m): 
        sum = sum + Y[i]*(np.log( yinfer(X[i],beta)))+ (1 -Y[i])*np.log(1-yinfer(X[i],beta))
    for i in range(0, n): 
        sum1 = sum1 + beta[i]**2
        
    return  (-sum + (lam/2) * sum1)/(1.0*m)

def pred(X,beta):
  if ( yinfer(X, beta) > 0.5):
    ypred = 1
  else :
    ypred = 0
  return ypred
beta = np.zeros(5)
iterations = 1000
arr_cost = np.zeros((iterations,4))
print(beta)
n = len(Y_train)
for i in range(iterations):
    Y_prediction_train=np.zeros(len(Y_train))
    Y_prediction_test=np.zeros(len(Y_test)) 

    for l in range(len(Y_train)):
        Y_prediction_train[l]=pred(X[l,:],beta)
    
    for l in range(len(Y_test)):
        Y_prediction_test[l]=pred(X_test[l,:],beta)
    
    train_acc = format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100)
    test_acc = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100   
    arr_cost[i,:] = [i,cost(X,Y_train,beta,lam),train_acc,test_acc]
    temp_beta = np.zeros(len(beta))

    ''' main code from below '''

    for j in range(n): 
        temp_beta[0] = temp_beta[0] + yinfer(X[j,:], beta) - Y_train[j]
        temp_beta[1:] = temp_beta[1:] + (yinfer(X[j,:], beta) - Y_train[j])*X[j,:]
    
    for k in range(0, len(beta)):
        temp_beta[k] = temp_beta[k] +  lam * beta[k]  #regularization here
    
    temp_beta= temp_beta / (1.0*n)
    
    beta = beta - alpha*temp_beta

graph of the losses

graph of training accuracy

graph of testing accuracy

有人能告诉我为什么会这样吗? L2值=0.1

好问题。我通过 PyTorch 文档进行了大量挖掘并找到了答案。答案非常棘手。基本上有两种方法来计算正则化。 (夏季跳转到最后一节)。

PyTorch使用第一种(正则化因子不除以batch size)

这是一个示例代码,它演示了:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import torch.optim as optim
 
class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.linear.weight.data.fill_(1.0)
        self.linear.bias.data.fill_(1.0)

    def forward(self, x):
        return self.linear(x)


model     = model()
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1.0)

input     = torch.tensor([[2], [4]], dtype=torch.float32)
target    = torch.tensor([[7], [11]], dtype=torch.float32)

optimizer.zero_grad()
pred      = model(input)
loss      = F.mse_loss(pred, target)

print(f'input: {input[0].data, input[1].data}')
print(f'prediction: {pred[0].data, pred[1].data}')
print(f'target: {target[0].data, target[1].data}')

print(f'\nMSEloss: {loss.item()}\n')

loss.backward()

print('Before updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data, gradient]: {model.linear.weight.data, model.linear.weight.grad}')
print(f'bias [data, gradient]: {model.linear.bias.data, model.linear.bias.grad}')
print('--------------------------------------------------------------------------')
 
optimizer.step()

print('After updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data]: {model.linear.weight.data}')
print(f'bias [data]: {model.linear.bias.data}')
print('--------------------------------------------------------------------------')

输出:

input: (tensor([2.]), tensor([4.]))
prediction: (tensor([3.]), tensor([5.]))
target: (tensor([7.]), tensor([11.]))

MSEloss: 26.0

Before updation:
--------------------------------------------------------------------------
weight [data, gradient]: (tensor([[1.]]), tensor([[-32.]]))
bias [data, gradient]: (tensor([1.]), tensor([-10.]))
--------------------------------------------------------------------------
After updation:
--------------------------------------------------------------------------
weight [data]: tensor([[4.1000]])
bias [data]: tensor([1.9000])
--------------------------------------------------------------------------

这里m = batch size = 2, lr = alpha = 0.1, lambda = weight_decay = 1.

现在考虑张量 weight,它有 value = 1 和 grad = -32

case1(type1正则化):

 weight = weight - lr(grad + weight_decay.weight)
 weight = 1 - 0.1(-32 + 1(1))
 weight = 4.1

case2(type2正则化):

 weight = weight - lr(grad + (weight_decay/batch size).weight)
 weight = 1 - 0.1(-32 + (1/2)(1))
 weight = 4.15

输出我们可以看到更新后的weight = 4.1000PyTorch 使用 type1 正则化。

所以最后在您的代码中,您遵循 type2 正则化。所以只需将最后几行更改为:

# for k in range(0, len(beta)):
#    temp_beta[k] = temp_beta[k] +  lam * beta[k]  #regularization here

temp_beta= temp_beta / (1.0*n)

beta = beta - alpha*(temp_beta + lam * beta)

还有 PyTorch loss 函数 包括正则化项(在 optimizers 中实现) 因此还要删除自定义 cost 函数中的 regularization 项。

总结:

  1. Pytorch 使用这个 正则化 函数:

  2. 正则化Optimizers(weight_decay参数)中实现。

  3. PyTorch Loss 函数 包括正则化项。

  4. 如果使用正则化,
  5. 偏差 也会正则化

  6. 要使用正则化,请尝试:

    torch.nn.optim.optimiser_name(model.parameters(), lr, weight_decay=lambda).