Threshold does not work on numpy array for accuracy metric

Question

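I am trying to implement logistic regression from scratch using numpy. I wrote a class with the following methods to implement logistic regression for a binary classification problem and to score it based on either BCE loss or accuracy.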

  def accuracy(self, true_labels, predictions):
    """
    This method implements the accuracy score. Where the accuracy is the number 
    of correct predictions our model has.

    args:
      true_labels: vector of shape (1, m) that contains the class labels where,
      m is the number of samples in the batch.
      predictions: vector of shape (1, m) that contains the model predictions. 
    """
    
    counter = 0
    for y_true, y_pred in zip(true_labels, predictions):
      if y_true == y_pred:
        counter+=1
    
    return counter/len(true_labels)

  def train(self, score='loss'):
    """
    This function trains the logistic regression model and updates the 
    parameters based on the Batch-Gradient Descent algorithm.
    The function prints the training loss and validation loss on every epoch.

    args:
    X: input features with shape (num_features, m) or (num_features) for a 
       singular sample where m is the size of the dataset.
    Y: gold class labels of shape (1, m) or (1) for a singular sample.

    """
    train_scores = []
    dev_scores = []
    for i in range(self.epochs):
      # perform forward and backward propagation & get the training predictions.
      training_predictions = self.propagation(self.X_train, self.Y_train)
      # get the predictions of the validation data

      dev_predictions = self.predict(self.X_dev, self.Y_dev)

      # calculate the scores of the predictions.
      if score == 'loss':  
        train_score = self.loss_function(training_predictions, self.Y_train)
        dev_score = self.loss_function(dev_predictions, self.Y_dev)
      elif score == 'accuracy':
        train_score = self.accuracy((training_predictions==+1).squeeze(), self.Y_train)
        dev_score = self.accuracy((dev_predictions==+1).squeeze(), self.Y_dev)
      
      train_scores.append(train_score)
      dev_scores.append(dev_score)
    plot_training_and_validation(train_scores, dev_scores, self.epochs, score=score)

After testing the code with the following input:

model = LogisticRegression(num_features=X_train.shape[0],
                           Learning_rate = 0.01,
                           Lambda = 0.001,
                           epochs=500,
                           X_train=X_train,
                           Y_train=Y_train,
                           X_dev=X_dev,
                           Y_dev=Y_dev,
                           normalize=False,
                           regularize = False,)
model.train(score = 'loss')

I get the following results:

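However, when I switch the scoring metric from loss to accuracy with model.train(score='accuracy'), so that accuracy is tracked over the epochs instead, I get the following results:

I removed normalization and regularization to make sure I am using a simple implementation of logistic regression.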

Note that I use an external method, called inside LogisticRegression.train(), to visualize the training/validation scores over time.

Answer 1

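The trick you use to create the predictions before passing them to the accuracy method is wrong. You are using (dev_predictions==+1). Your model is a logistic regression, which produces values between 0 and 1. Most of the time, those values will not be exactly equal to +1.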
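To make this concrete, here is a minimal sketch of what the equality test does to sigmoid outputs. The `sigmoid` helper and the `logits` values are assumptions made up for illustration; they are not taken from the question's class.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps real-valued scores into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores for five samples (illustrative values only).
logits = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
probabilities = sigmoid(logits)

# Comparing probabilities with exactly 1 marks every sample as False here,
# even the ones the model is fairly confident about.
labels_from_equality = (probabilities == 1)

print(probabilities)
print(labels_from_equality)
```

None of these probabilities is exactly 1, so the comparison yields an all-False array regardless of how well the model has learned.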

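So basically, every time you are passing a bunch of False (or 0) values to the accuracy function. I bet that if you check the class counts in your datasets, the share of samples labeled False or 0 will be: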

exactly 51.7% in the validation dataset
exactly 56.2% in the training dataset.
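The reasoning behind that bet can be checked with a short sketch: when every prediction is False, the accuracy collapses to the fraction of samples in class 0. The label vector below is invented for illustration and is not the asker's data.

```python
import numpy as np

# Hypothetical labels: 6 of the 10 samples belong to class 0.
y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 0])

# What (predictions == +1) effectively yields when no probability is exactly 1.
all_false = np.zeros_like(y_true, dtype=bool)

# Accuracy of the constant all-False prediction.
constant_accuracy = np.mean(all_false == y_true)

# Share of samples whose true label is 0.
negative_fraction = np.mean(y_true == 0)

print(constant_accuracy, negative_fraction)
```

Both numbers are the same, which is why a flat accuracy curve that matches the class-0 share is a strong hint that the predictions never change.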

To fix this, you can use an in-between threshold (e.g. 0.5) to generate the labels. So use something like dev_predictions>0.5
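A minimal sketch of the thresholded version follows. The standalone `accuracy` function, its `threshold` parameter, and the sample arrays are assumptions for illustration; the question's method lives on a class and takes already-binarized predictions. The arrays are flattened first so that shapes (1, m) and (m,) are treated the same way.

```python
import numpy as np

def accuracy(true_labels, probabilities, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    # Flatten both inputs so (1, m) and (m,) arrays compare element by element.
    true_labels = np.asarray(true_labels).ravel()
    probabilities = np.asarray(probabilities).ravel()

    # Probabilities above the threshold become class 1, the rest class 0.
    predicted_labels = (probabilities > threshold).astype(int)

    return float(np.mean(predicted_labels == true_labels))

# Hypothetical batch of five samples with shape (1, m), as in the docstrings.
y_true = np.array([[0, 1, 1, 0, 1]])
probs = np.array([[0.2, 0.9, 0.4, 0.1, 0.7]])

score = accuracy(y_true, probs)
print(score)  # 4 of 5 predictions match
```

The vectorized comparison also avoids the Python loop and the dependence on `len(true_labels)`, which would return 1 rather than m for an array of shape (1, m).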


numpy

machine-learning

python-3.x

logistic-regression