Probability results from multinomial regression with the nnet package

Question

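Good afternoon. I am having trouble with the output I get when running a logistic regression with the nnet package. I want to predict Category from HS_TR (Return Period) and SLR (Sea Level Rise). The multinomial model, called fit, is estimated from the data in the subset x.sub. There are 4 possible categories: 1, 2, 3 or 4.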

x.sub:

   POINTID  HS_TR  SLR  Category
       4     10    0.0     3
       4     10    0.6     4
       4     50    0.0     3
       4     50    0.6     4
       4    100    0.0     4
       4    100    0.6     4

When I run the model > fit <- multinom(Category ~ HS_TR + SLR, x.sub, maxit=3000) I get this result:

Coefficients:
    (Intercept)       HS_TR         SLR 
    -30.5791517   0.4130478  62.0976951 

    Residual Deviance: 0.0001820405 
    AIC: 6.000182
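For anyone who wants to reproduce this, here is a minimal sketch that rebuilds x.sub from the six rows shown above and refits the same model. Storing Category as a factor and passing trace = FALSE (to silence the iteration log) are choices made for this sketch, not part of the original call. The estimates may not match the printed ones exactly, since the optimiser can stop at slightly different points on these data.

```r
library(nnet)  # recommended package, shipped with standard R installations

# The six rows of x.sub as listed in the question
x.sub <- data.frame(
  POINTID  = 4,
  HS_TR    = c(10, 10, 50, 50, 100, 100),
  SLR      = c(0, 0.6, 0, 0.6, 0, 0.6),
  Category = factor(c(3, 4, 3, 4, 4, 4))
)

# Same model as in the question; trace = FALSE only hides the iteration output
fit <- multinom(Category ~ HS_TR + SLR, data = x.sub, maxit = 3000, trace = FALSE)

# Only two of the four possible categories actually occur in this subset
print(levels(x.sub$Category))
print(coef(fit))
```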

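Now that I have the multinomial model, I want to know the predicted category for a specific scenario (d3) of SLR and HS_TR. I define d3 and apply predict, and I get a reasonable result: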

d3<-data.frame("HS_TR"=c(10),"SLR"=c(0))
prediction <-(predict(fit,d3))

I get

> prediction
[[1]]
[1] 3 
Level: 3

However, when I ask for the probabilities of the prediction with prediction <-(predict(fit,d3, type="probs")), I get the following result:

> prediction
[[1]]
1 
0

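This does not make sense, because it says the probability is 0. Since the model I ran does give a prediction for Category, I do not understand why the probability would be 0. Does anyone know why I am getting this?

If anyone knows how I can solve this, I would appreciate it. Thank you in advance.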

Answer 1

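You are having a problem with separation/complete separation (those are the terms to Google for more information). This page gives a nice introduction, and it contains the following quote: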

A complete separation happens when the outcome variable separates a predictor variable or a combination of predictor variables completely.

If you look at your data, for example using

> xtabs(~ Category + HS_TR + SLR, data=x.sub)
, , SLR = 0

        HS_TR
Category 10 50 100
       3  1  1   0
       4  0  0   1

, , SLR = 0.6

        HS_TR
Category 10 50 100
       3  0  0   0
       4  1  1   1

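then you will see that for SLR=0.6 every observation falls in category 4, so the combination of SLR and HS_TR completely determines the outcome. You need to specify a simpler model or get more data to obtain a stable fit.

In your case, the outcome in this subset takes only two of the possible categories (3 and 4), so you should be able to fit a log-linear model or a logistic regression model and get the same result. If you create a new variable Cat, which is Category converted to a factor, you will see a warning that points you in the right direction.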

> glm(Cat ~HS_TR + SLR, data=x.sub, family="binomial") 
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
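A rough sketch of that step, in case it helps: it rebuilds the six rows from the question, creates Cat, and collects the warnings from glm so they can be inspected instead of scrolling past. The msgs vector and the handler are illustrative additions, not something glm requires.

```r
# The six rows from the question, with Category kept numeric as in the original table
x.sub <- data.frame(
  POINTID  = 4,
  HS_TR    = c(10, 10, 50, 50, 100, 100),
  SLR      = c(0, 0.6, 0, 0.6, 0, 0.6),
  Category = c(3, 4, 3, 4, 4, 4)
)

# Cat is Category as a factor, so glm treats it as a binary outcome (3 vs 4)
x.sub$Cat <- factor(x.sub$Category)

# Collect warning messages raised during fitting (msgs is a hypothetical helper name)
msgs <- character(0)
fit_glm <- withCallingHandlers(
  glm(Cat ~ HS_TR + SLR, data = x.sub, family = "binomial"),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)
```

If the separation warning is among the collected messages, that is the signal to simplify the model or gather more data before trusting the coefficients.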

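I do not think multinom detects the problem in the data. However, if you look at the summary of the fit, you will see that the standard errors are extremely large for two of the parameter estimates. That is also an indication that the estimates are unstable and that separation may be a problem.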

> summary(fit)
Call:
multinom(formula = Category ~ HS_TR + SLR, data = x.sub, maxit = 3000)

Coefficients:
                 Values  Std. Err.
(Intercept) -30.5791517 356.932851
HS_TR         0.4130478   5.137396
SLR          62.0976951 634.584184

Residual Deviance: 0.0001820405 
AIC: 6.000182
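One way to make that check explicit is to put the estimates next to their standard errors and look at the ratio. The sketch below refits the model on the six rows from the question; the exact numbers may differ a little from the summary above, and the "ratio" column is just an informal yardstick added here, not a formal test.

```r
library(nnet)

x.sub <- data.frame(
  POINTID  = 4,
  HS_TR    = c(10, 10, 50, 50, 100, 100),
  SLR      = c(0, 0.6, 0, 0.6, 0, 0.6),
  Category = factor(c(3, 4, 3, 4, 4, 4))
)
fit <- multinom(Category ~ HS_TR + SLR, data = x.sub, maxit = 3000, trace = FALSE)

# summary() computes the Hessian-based standard errors
s   <- summary(fit)
est <- s$coefficients
se  <- s$standard.errors

# Estimates that are small relative to their standard errors suggest an unstable fit
print(cbind(estimate = est, std.error = se, ratio = est / se))
```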

I think the convergence check in multinom is missing some sort of check for this.
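As for the reported 0 itself, a possible reading is as follows. It assumes that, with a two-level response, predict(..., type="probs") returns a single number per row, namely the probability of the second level (here category 4), rather than one column per category. Under that assumption the 0 is the probability of category 4, and the predicted category 3 has probability close to 1. The sketch below refits the model on the six rows from the question and also recomputes the value by hand from the coefficients printed in the question.

```r
library(nnet)

x.sub <- data.frame(
  POINTID  = 4,
  HS_TR    = c(10, 10, 50, 50, 100, 100),
  SLR      = c(0, 0.6, 0, 0.6, 0, 0.6),
  Category = factor(c(3, 4, 3, 4, 4, 4))
)
fit <- multinom(Category ~ HS_TR + SLR, data = x.sub, maxit = 3000, trace = FALSE)

d3 <- data.frame(HS_TR = 10, SLR = 0)

# Assumed to be P(Category = 4) for this row, since the response has two levels
p4 <- predict(fit, d3, type = "probs")

# Probabilities for both observed categories
probs <- c("3" = 1 - unname(p4), "4" = unname(p4))
print(probs)

# Same quantity by hand, using the coefficients printed in the question
p4_manual <- plogis(-30.5791517 + 0.4130478 * 10 + 62.0976951 * 0)
print(p4_manual)
```

The hand-computed value is tiny but strictly positive, which fits with the fitted probabilities being pushed towards 0 and 1 by the separation described above.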
