Build logistic regression models for each pair of 10 classes

Question

I'm working on the MNIST digit recognizer dataset.

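There are 10 class labels, and I want to build a model for every pair of classes and compare them, i.e. fit all 10C2 = 45 logistic regression models. I know I can call combn(unique(mnist$label), 2, function(x) , simplify = TRUE) in a loop and write the model inside the function. However, I'm stuck at this point: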

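loglist <- list()
for (i in unique(mnist$label)) {
  # this subset contains only rows of class i, so `label` is constant here
  tmp <- try(glm(label ~ ., family = binomial(link = "logit"),
                 data = mnist[mnist$label == i, ]))
  # keep the fitted model only if glm() did not throw an error
  if (!inherits(tmp, "try-error")) loglist[[as.character(i)]] <- tmp
}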

Any help or suggestions would be much appreciated. Thanks.

Answer 1

There are 3 ways to use logistic regression with more than two classes (10 in your case):

One-vs-rest (one-vs-all)
One-vs-one

Both methods are explained well in Andrew Ng's wiki, and there is a good video lecture on them.
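A minimal sketch of both strategies in R, assuming (as in the question) a data frame with a `label` column plus numeric feature columns. Since the MNIST data is not included here, a small synthetic data frame `toy` with hypothetical features `x1`, `x2` and four classes stands in for it; the helper names `fit_pairwise`, `fit_one_vs_rest` and `predict_ovr` are illustrative, not from any package. The one-vs-one part uses `combn()` as the question suggests, so with the 10 MNIST labels it would fit choose(10, 2) = 45 models.

```r
# One-vs-one: one binary glm() per pair of classes.
fit_pairwise <- function(df) {
  classes <- sort(unique(df$label))
  pairs <- combn(classes, 2)              # 2 x choose(K, 2) matrix of class pairs
  models <- list()
  for (j in seq_len(ncol(pairs))) {
    a <- pairs[1, j]
    b <- pairs[2, j]
    sub <- df[df$label %in% c(a, b), ]    # keep only rows of the two classes
    sub$y <- as.integer(sub$label == b)   # binary response: 1 = class b, 0 = class a
    sub$label <- NULL                     # so `y ~ .` does not use the old label
    fit <- tryCatch(glm(y ~ ., family = binomial(link = "logit"), data = sub),
                    error = function(e) NULL)
    if (!is.null(fit)) models[[paste(a, b, sep = "_vs_")]] <- fit
  }
  models
}

# One-vs-rest: one binary glm() per class against all other classes.
fit_one_vs_rest <- function(df) {
  models <- list()
  for (k in sort(unique(df$label))) {
    d <- df
    d$y <- as.integer(d$label == k)       # 1 = class k, 0 = any other class
    d$label <- NULL
    models[[as.character(k)]] <- glm(y ~ ., family = binomial(link = "logit"),
                                     data = d)
  }
  models
}

# One-vs-rest prediction: the class whose model gives the highest probability.
predict_ovr <- function(models, newdata) {
  probs <- matrix(sapply(models, predict, newdata = newdata, type = "response"),
                  nrow = nrow(newdata))
  as.integer(names(models))[max.col(probs, ties.method = "first")]
}

# Hypothetical stand-in for the MNIST data: 4 classes, 2 noisy features.
set.seed(1)
n <- 50
toy <- data.frame(label = rep(0:3, each = n),
                  x1 = rnorm(4 * n, mean = rep(0:3, each = n), sd = 1.5),
                  x2 = rnorm(4 * n))

ovo <- fit_pairwise(toy)
ovr <- fit_one_vs_rest(toy)
sapply(ovo, AIC)                          # compare the pairwise models, e.g. by AIC
pred <- predict_ovr(ovr, toy)
```

With the real data this would be `fit_pairwise(mnist)`. On raw pixel columns, `glm()` may warn that fitted probabilities of 0 or 1 occurred for easily separated digit pairs, and constant pixel columns will get `NA` coefficients.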

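A third approach is Softmax regression; a good tutorial can be found at the given link. This model generalizes logistic regression to classification problems where the class label y can take more than two possible values.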
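To make the "generalizes logistic regression" claim concrete, here is a small base-R sketch. Softmax turns K class scores into K probabilities, P(y = k) = exp(z_k) / sum_j exp(z_j). With K = 2 and the first score fixed at 0, it reduces to the logistic sigmoid used by ordinary logistic regression. The function names are illustrative.

```r
# Softmax over a vector of class scores z_1..z_K.
softmax <- function(z) {
  e <- exp(z - max(z))   # subtracting max(z) avoids overflow; result is unchanged
  e / sum(e)
}

# Logistic (sigmoid) function used by binary logistic regression.
sigmoid <- function(s) 1 / (1 + exp(-s))

# Two classes with scores (0, score): softmax gives P(class 2) = sigmoid(score).
score <- 1.7
p_softmax <- softmax(c(0, score))[2]
p_logistic <- sigmoid(score)
all.equal(p_softmax, p_logistic)   # TRUE
```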

So, which model should you use, and when?

This will depend on whether the four classes are mutually exclusive. For example, if your four classes are classical, country, rock, and jazz, then assuming each of your training examples is labeled with exactly one of these four class labels, you should build a softmax classifier.
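For mutually exclusive classes, such as the 10 MNIST digits, the following is a minimal base-R sketch of fitting a softmax classifier with plain batch gradient descent on the mean cross-entropy loss. It assumes labels coded 0..K-1 and uses synthetic data. The learning rate and iteration count are arbitrary illustrative choices, and the function names are hypothetical. In practice, a packaged fitter such as `nnet::multinom()` is the usual choice.

```r
# Row-wise softmax of an n x K score matrix.
softmax_rows <- function(S) {
  S <- S - apply(S, 1, max)              # subtract each row's max for stability
  E <- exp(S)
  E / rowSums(E)                         # every row now sums to 1
}

# Fit weights W (one column per class) by batch gradient descent.
fit_softmax <- function(X, y, K, lr = 0.1, iters = 500) {
  X <- cbind(1, X)                       # prepend an intercept column
  Y <- diag(K)[y + 1, , drop = FALSE]    # one-hot targets, labels assumed 0..K-1
  W <- matrix(0, ncol(X), K)
  for (it in seq_len(iters)) {
    P <- softmax_rows(X %*% W)
    grad <- crossprod(X, P - Y) / nrow(X) # gradient of the mean cross-entropy
    W <- W - lr * grad
  }
  W
}

predict_softmax <- function(W, X) {
  P <- softmax_rows(cbind(1, X) %*% W)
  max.col(P, ties.method = "first") - 1  # back to 0-based labels
}

# Synthetic, mutually exclusive classes 0, 1, 2.
set.seed(3)
n <- 60
y <- rep(0:2, each = n)
X <- cbind(x1 = rnorm(3 * n, mean = 3 * y), x2 = rnorm(3 * n))
W <- fit_softmax(X, y, K = 3)
pred <- predict_softmax(W, X)
mean(pred == y)                          # training accuracy
```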

If however your categories are has_vocals, dance, soundtrack, pop, then the classes are not mutually exclusive; for example, there can be a piece of pop music that comes from a soundtrack and in addition has vocals. In this case, it would be more appropriate to build 4 binary logistic regression classifiers. This way, for each new musical piece, your algorithm can separately decide whether it falls into each of the four categories.
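The following is a minimal sketch of the non-exclusive case, with one independent binary classifier per category. The category names come from the quoted example. The data frame `songs`, its features `x1` and `x2`, and the 0/1 labels are synthetic and hypothetical. The 0.5 threshold is an arbitrary illustrative cut-off.

```r
set.seed(2)
n <- 200
songs <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Synthetic 0/1 labels; a single row can have several categories set to 1.
songs$has_vocals <- rbinom(n, 1, plogis(songs$x1))
songs$dance      <- rbinom(n, 1, plogis(songs$x2))
songs$soundtrack <- rbinom(n, 1, plogis(-songs$x1))
songs$pop        <- rbinom(n, 1, plogis(songs$x1 + songs$x2))

categories <- c("has_vocals", "dance", "soundtrack", "pop")
features <- c("x1", "x2")

# One independent binary logistic regression per category.
models <- lapply(categories, function(category) {
  glm(reformulate(features, response = category),
      family = binomial(link = "logit"), data = songs)
})
names(models) <- categories

# Each model decides on its own, so more than one category can come out TRUE.
new_piece <- data.frame(x1 = 0.5, x2 = -1)
decisions <- vapply(models, function(m) {
  unname(predict(m, newdata = new_piece, type = "response")) > 0.5
}, logical(1))
decisions
```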

regression

r

machine-learning

subset

logistic-regression
