Correct implementation of hinge loss minimization for gradient descent

Question

I copied the hinge loss function from here (along with LossC and LossFunc, on which it is based). I then included it in my gradient descent algorithm, as follows:

  do 
  {
    iteration++;
    error = 0.0;
    cost = 0.0;

    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++) 
    {

      // 1. Calculate the hypothesis h = X * theta
      hypothesis = calculateHypothesis( theta, feature_matrix__train, p, globo_dict_size );

      // 2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
      //cost = hypothesis - outputs__train[p];
      cost = HingeLoss.loss(hypothesis, outputs__train[p]);
      System.out.println( "cost " + cost );

      // 3. Calculate the gradient = X' * loss / m
      gradient = calculateGradent( theta, feature_matrix__train, p, globo_dict_size, cost, number_of_files__train);

      // 4. Update the parameters theta = theta - alpha * gradient
      for (int i = 0; i < globo_dict_size; i++) 
      {
          theta[i] = theta[i] - LEARNING_RATE * gradient[i];
      }

    }

    //summation of squared error (error value for all instances)
    error += (cost*cost);       

  /* Root Mean Squared Error */
  //System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( error/number_of_files__train ) );
  System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( error/number_of_files__train ) );

  } 
  while( error != 0 );

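But this does not work at all. Is that because of the loss function? Or perhaps because of how I added the loss function to my code?

I suppose it is also possible that my gradient descent implementation is at fault.

Below are my methods for calculating the gradient and the hypothesis. Are they right?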

static double calculateHypothesis( double[] theta, double[][] feature_matrix, int file_index, int globo_dict_size )
{
    double hypothesis = 0.0;

     for (int i = 0; i < globo_dict_size; i++) 
     {
         hypothesis += ( theta[i] * feature_matrix[file_index][i] );
     }
     //bias
     hypothesis += theta[ globo_dict_size ];

     return hypothesis;
}

static double[] calculateGradent( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size, double cost, int number_of_files__train)
{
    double m = number_of_files__train;

    double[] gradient = new double[ globo_dict_size];//one for bias?

    for (int i = 0; i < gradient.length; i++) 
    {
        gradient[i] = (1.0/m) * cost * feature_matrix[ file_index ][ i ] ;
    }

    return gradient;
}

The rest of the code is here, for anyone interested.

Below this sentence is what those loss functions look like. Should I use loss or deriv, and are they correct?

/**
 * Computes the HingeLoss loss
 *
 * @param pred the predicted value
 * @param y the target value
 * @return the HingeLoss loss
 */
public static double loss(double pred, double y)
{
    return Math.max(0, 1 - y * pred);
}

/**
 * Computes the first derivative of the HingeLoss loss
 *
 * @param pred the predicted value
 * @param y the target value
 * @return the first derivative of the HingeLoss loss
 */
public static double deriv(double pred, double y)
{
    if (pred * y > 1)
        return 0;
    else
        return -y;
}

Answer 1

The gradient code you provided does not look like the gradient of the hinge loss. Take a look at a valid equation, for example: https://stats.stackexchange.com/questions/4608/gradient-of-hinge-loss
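To make that concrete, here is a minimal sketch of one way the update could look, not a drop-in fix for the original program. It assumes the labels are encoded as -1 and +1, that the bias is stored in the last slot of theta (as calculateHypothesis in the question does), and that plain per-sample subgradient descent without regularization is what is wanted. The class name HingeSgdSketch and the methods score, epoch, and train are hypothetical names introduced for illustration. The key point is the chain rule: the subgradient with respect to theta[i] is deriv(pred, y) * x[i], so the weight update uses deriv rather than the loss value.

```java
public class HingeSgdSketch {

    // Hinge loss for a raw score pred and a label y assumed to be -1 or +1.
    static double loss(double pred, double y) {
        return Math.max(0.0, 1.0 - y * pred);
    }

    // Subgradient of the hinge loss with respect to pred (same rule as deriv in the question).
    static double deriv(double pred, double y) {
        return (pred * y > 1.0) ? 0.0 : -y;
    }

    // Linear score theta . x + bias; the bias is assumed to live in theta[x.length].
    static double score(double[] theta, double[] x) {
        double s = theta[x.length];
        for (int i = 0; i < x.length; i++) {
            s += theta[i] * x[i];
        }
        return s;
    }

    // One pass over the data. Returns the mean hinge loss, measured before each update.
    static double epoch(double[] theta, double[][] xs, double[] ys, double learningRate) {
        double total = 0.0;
        for (int p = 0; p < xs.length; p++) {
            double pred = score(theta, xs[p]);
            total += loss(pred, ys[p]);          // accumulate inside the loop, for every sample
            double d = deriv(pred, ys[p]);       // dLoss/dPred
            if (d == 0.0) {
                continue;                        // sample is beyond the margin: zero subgradient
            }
            for (int i = 0; i < xs[p].length; i++) {
                theta[i] -= learningRate * d * xs[p][i];   // chain rule: dLoss/dTheta_i = d * x_i
            }
            theta[xs[p].length] -= learningRate * d;       // bias term (its input is 1)
        }
        return total / xs.length;
    }

    // Trains until the mean hinge loss is exactly zero or maxEpochs is reached.
    static double[] train(double[][] xs, double[] ys, double learningRate, int maxEpochs) {
        double[] theta = new double[xs[0].length + 1];     // one extra slot for the bias
        for (int e = 0; e < maxEpochs; e++) {
            if (epoch(theta, xs, ys, learningRate) == 0.0) {
                break;
            }
        }
        return theta;
    }
}
```

Compared with the loop in the question, this sketch differs in four ways: it multiplies the features by deriv instead of by the loss value, it updates the bias slot as well, it accumulates the loss for every sample inside the inner loop rather than once after it, and it caps the number of epochs because the loss may never reach exactly zero when the data are not linearly separable.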

java

machine-learning