predict() based on a given x - many people have this problem, yet none of the answers work

Question

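I am trying to use a multiple regression model to predict a value based on a given x. I have seen that many people have had the same problem, but none of the answers given so far have worked for me.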

My model is

M_PS_av <- glm.nb(PS_av ~ poly(Age_a,2) + Income_a + Education_a + GroupA_a + GroupB_a + GroupC_a + GroupD_a + GroupE_a, data = BCC_a)

I am interested in the effect of age, in particular the age at which the outcome peaks, so I only want to predict based on age.

So far I have tried

predict(M_PS_av, data.frame(Age_a = 15))
predict(M_PS_av, data.frame(Age_a=Age_a[15]))
predict(M_PS_av, newdata = new.ages)

where new.ages is another data frame I created, but none of these return what I want.

I have also tried assigning values to the different variables and using that as my data.frame:

df <- data.frame(Age_c=19,Income_a=1, Education_a=1, GroupA_a=1, GroupB_a=1, GroupC_a=1, GroupD_a=1, GroupEa=1)

I have also tried with and without poly(..., raw=TRUE)

But I still get errors. This is the error I get most of the time:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  variable lengths differ (found for 'Income_a')
In addition: Warning message:
'newdata' had 1 row but variables found have 1019 rows

Can anyone help?

Thanks!

Answer 1

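The hardest part here was trying to recreate your data structure so that we have a working example for your code. Of course, the values and factor levels will be quite different from your own data, but this is enough to demonstrate: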

set.seed(69)

df <- data.frame(Education_a = factor(c("Private", "Public")),
                 GroupA_a = factor(c("A1", "A2")),
                 GroupB_a = factor(c("B1", "B2")),
                 GroupC_a = factor(c("C1", "C2")),
                 GroupD_a = factor(c("D1", "D2")),
                 GroupE_a = factor(c("E1", "E2")))

BCC_a          <- expand.grid(df)[rep(1:64, 20), ]
BCC_a$Age_a    <- round(rgamma(64 * 20, 15, 1))
BCC_a$Income_a <- rgamma(64 * 20, 15, 1/2000)
lambdas        <- apply(do.call(cbind, lapply(BCC_a[1:6], 
                                       function(x) runif(2, 0.5, 1.5)[as.numeric(x)]
                                )), 1, prod)
BCC_a$PS_av    <- rpois(nrow(BCC_a), 1 + lambdas/2 * BCC_a$Age_a^2 + 0.001 * BCC_a$Income_a)

Here I am assuming that age and income are numeric variables and the groups are factor variables:

 head(BCC_a)
#>   Education_a GroupA_a GroupB_a GroupC_a GroupD_a GroupE_a Age_a Income_a PS_av
#> 1     Private       A1       B1       C1       D1       E1    15 30500.19   162
#> 2      Public       A1       B1       C1       D1       E1    16 41160.54   170
#> 3     Private       A2       B1       C1       D1       E1    13 43146.83   107
#> 4      Public       A2       B1       C1       D1       E1    18 33023.85   124
#> 5     Private       A1       B2       C1       D1       E1     8 31122.07    65
#> 6      Public       A1       B2       C1       D1       E1    21 26487.43   215

Now let's create your model:

library(MASS)
M_PS_av <- glm.nb(PS_av ~ poly(Age_a,2) + Income_a + Education_a + GroupA_a +
                          GroupB_a + GroupC_a + GroupD_a + GroupE_a, data = BCC_a)

We can review it with summary(M_PS_av):

#> glm.nb(formula = PS_av ~ poly(Age_a, 2) + Income_a + Education_a + 
#>     GroupA_a + GroupB_a + GroupC_a + GroupD_a + GroupE_a, data = BCC_a, 
#>     init.theta = 814.4965099, link = log)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -3.4821  -0.6993  -0.0217   0.6828   4.1628  
#> 
#> Coefficients:
#>                     Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)        4.750e+00  1.273e-02 372.981  < 2e-16 ***
#> poly(Age_a, 2)1    1.309e+01  1.012e-01 129.326  < 2e-16 ***
#> poly(Age_a, 2)2   -1.077e+00  8.885e-02 -12.118  < 2e-16 ***
#> Income_a           8.215e-06  3.486e-07  23.565  < 2e-16 ***
#> Education_aPublic -1.487e-01  5.464e-03 -27.218  < 2e-16 ***
#> GroupA_aA2        -3.534e-01  5.523e-03 -63.989  < 2e-16 ***
#> GroupB_aB2        -2.518e-02  5.481e-03  -4.593 4.37e-06 ***
#> GroupC_aC2         7.447e-02  5.445e-03  13.676  < 2e-16 ***
#> GroupD_aD2        -3.102e-02  5.442e-03  -5.701 1.19e-08 ***
#> GroupE_aE2        -4.514e-02  5.446e-03  -8.289  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for Negative Binomial(814.4965) family taken to be 1)
#> 
#>     Null deviance: 26983  on 1279  degrees of freedom
#> Residual deviance:  1345  on 1270  degrees of freedom
#> AIC: 9952.3
#> 
#> Number of Fisher Scoring iterations: 1
#> 
#>               Theta:  814 
#>           Std. Err.:  234 
#>  2 x log-likelihood:  -9930.252

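Now, to use predict, we need to set up a data frame of the predictor variables at the levels we want to examine. Note that we need all of the predictor variables, with exactly the names used in the model formula, and if there are factor variables, we need to give the named factor levels: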

new_data <- data.frame(Age_a = 15, Income_a = mean(BCC_a$Income_a), 
                       Education_a = "Private", GroupA_a = "A1", GroupB_a = "B1", 
                       GroupC_a = "C1", GroupD_a = "D1", GroupE_a = "E1")
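
A quick way to see which columns predict expects is to compare the model's predictor names against the columns of the new data frame. The sketch below uses a small made-up data set and an lm fit (toy, fit, nd and missing_vars are hypothetical names, not from the question), but the same terms / delete.response / all.vars idiom should work on a glm.nb fit too. In the question's df the columns are named Age_c and GroupEa rather than Age_a and GroupE_a. If that is not just a typo in the post, R would look for the missing variables outside newdata and could pick up the full-length originals, which would be consistent with the "variable lengths differ" error.

```r
# Toy data and model; all names here are made up for illustration.
set.seed(1)
toy <- data.frame(age    = runif(50, 18, 80),
                  income = rnorm(50, 30000, 5000),
                  group  = factor(rep(c("G1", "G2"), 25)))
toy$y <- 2 + 0.05 * toy$age + 1e-4 * toy$income + rnorm(50)
fit <- lm(y ~ poly(age, 2) + income + group, data = toy)

# Names of the variables the model needs on the right-hand side.
needed <- all.vars(delete.response(terms(fit)))

# New data with a misspelled column (agee) and a missing one (group).
nd <- data.frame(agee = 40, income = 30000)

# Predictors that predict() would not find in nd.
missing_vars <- setdiff(needed, names(nd))
print(missing_vars)
```

If missing_vars is not empty, fix the column names in the new data frame before calling predict.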

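Now we just plug this into predict. Note that we need to use type = "response" to get the actual expected value of the outcome variable (otherwise we get the natural log of the expected value, because the model uses a log link):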

 predict(M_PS_av, newdata = new_data, type = "response")
#>        1 
#> 153.0262
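
To illustrate the difference between the two scales, here is a minimal sketch on made-up data. It uses a Poisson glm with a log link rather than glm.nb to keep it short (toy, fit and nd are hypothetical names); the relationship between type = "link" and type = "response" is the same for any log-link model.

```r
# Made-up count data with a log-linear mean.
set.seed(1)
toy <- data.frame(x = runif(100, 0, 2))
toy$y <- rpois(100, exp(1 + 0.5 * toy$x))
fit <- glm(y ~ x, family = poisson(link = "log"), data = toy)

nd <- data.frame(x = 1)
on_link     <- predict(fit, newdata = nd, type = "link")      # log of the expected count (the default)
on_response <- predict(fit, newdata = nd, type = "response")  # expected count

# Back-transforming the link-scale prediction recovers the response scale.
print(all.equal(exp(on_link), on_response))
```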

The predicted value of about 153 looks right for the data I put in.
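
Since the question is really about the age at which the outcome peaks, one option is to predict over a grid of ages while holding the other predictors fixed, and then take the age with the largest prediction. The following is only a sketch on simulated data with a known peak at age 45; toy, fit, grid and peak_age are made-up names, and you would substitute your own model, variable names and reference levels.

```r
library(MASS)  # provides glm.nb

# Simulated data: the true mean is highest at age 45.
set.seed(1)
n <- 500
toy <- data.frame(
  age   = runif(n, 18, 80),
  group = factor(sample(c("G1", "G2"), n, replace = TRUE))
)
mu <- exp(3 - 0.002 * (toy$age - 45)^2 + 0.3 * (toy$group == "G2"))
toy$y <- rnbinom(n, size = 5, mu = mu)

fit <- glm.nb(y ~ poly(age, 2) + group, data = toy)

# Grid of ages with the other predictor held at one fixed level.
grid <- data.frame(age   = seq(18, 80, by = 0.5),
                   group = factor("G1", levels = levels(toy$group)))
grid$pred <- predict(fit, newdata = grid, type = "response")

# Age on the grid with the highest expected count.
peak_age <- grid$age[which.max(grid$pred)]
print(peak_age)
```

Because this kind of model is additive on the log scale and has no interaction with age, the values chosen for the other predictors shift the whole curve up or down but should not move the location of the peak.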

regression

r

predict