R: extract glm coefficients including NA rows
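I want to extract the coefficients of a glm, including not only the rows whose p-values could be computed but also the rows whose p-values could not be computed and are shown as NA. How can I extract the coefficients, NA rows included, as a matrix or data.frame?
This is what I need to extract: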
Estimate Std. Error z value Pr(>|z|)
x1 0.10909 0.05552 1.965 0.0494
x2 NA NA NA NA
x3 NA NA NA NA
x4 0.05472 0.12871 0.425 0.6707
x5 -0.07880 0.17616 -0.447 0.6547
I do not want the following:
coef(outSummary)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.38909359 26.07327652 -0.3217506 0.74764161
x1 0.10908801 0.05551894 1.9648793 0.04942821
x4 0.05471872 0.12871334 0.4251208 0.67074860
x5 -0.07879775 0.17616064 -0.4473062 0.65465396
Here is the sample code.
maxRow = 12
maxX = 5
dfA = data.frame(matrix(data = 0, nrow = maxRow, ncol = (maxX+1)) )
colnames(dfA) = c("y", paste0("x", 1:maxX) )
dfA$y = c( rep(0, maxRow*0.5), rep(1, maxRow*0.5))
xWithData = paste0("x", c(1, 4:maxX) )
ctSeed = 384
set.seed(ctSeed)
dfA[, xWithData] = apply(dfA[ , xWithData ], MARGIN = 2, FUN = function(x) ( 1 * seq_len(maxRow) + round(rnorm(n = maxRow, mean = 100, sd = 10) ) ) )
dfA
outGlm = glm( y ~ ., family = binomial(link='logit'), data=dfA )
(outSummary = summary(outGlm) )
(outCoef = outSummary$coefficients )
It seems that coef(outSummary) always drops the predictors whose estimates are NA.
So one way to get a complete table of estimates for all predictors is to use dplyr::full_join to match and merge the entries from attr(outSummary$terms, "term.labels") with the entries from coef(outSummary). Here is a tidyverse approach:
library(tidyverse)
data.frame(coef(outSummary)) %>%
  rownames_to_column("variable") %>%   # move the term names into a column
  full_join(data.frame(variable = attr(outSummary$terms, "term.labels")),
            by = "variable") %>%       # add rows for the terms the summary dropped
  arrange(variable)
# variable Estimate Std..Error z.value Pr...z..
#1 (Intercept) -8.38909359 26.07327652 -0.3217506 0.74764161
#2 x1 0.10908801 0.05551894 1.9648793 0.04942821
#3 x2 NA NA NA NA
#4 x3 NA NA NA NA
#5 x4 0.05471872 0.12871334 0.4251208 0.67074860
#6 x5 -0.07879775 0.17616064 -0.4473062 0.65465396
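If you would rather avoid the tidyverse dependency, a base R sketch along the same lines is possible. It relies on coef() of the fitted glm object (not of its summary) keeping the undefined terms as NA, so its names can serve as the full list of rows. The names coefAll, coefSum and coefFull are made up for this illustration, and the first few lines only rebuild the model from the question so the block runs on its own.

```r
# Rebuild the model from the question so this block is self-contained.
maxRow <- 12
maxX <- 5
dfA <- data.frame(matrix(0, nrow = maxRow, ncol = maxX + 1))
colnames(dfA) <- c("y", paste0("x", 1:maxX))
dfA$y <- rep(c(0, 1), each = maxRow / 2)
xWithData <- paste0("x", c(1, 4:maxX))
set.seed(384)
dfA[, xWithData] <- apply(dfA[, xWithData], 2, function(x) {
  seq_len(maxRow) + round(rnorm(maxRow, mean = 100, sd = 10))
})
outGlm <- glm(y ~ ., family = binomial(link = "logit"), data = dfA)

coefAll <- coef(outGlm)           # named vector; undefined terms are kept as NA
coefSum <- coef(summary(outGlm))  # matrix; undefined terms are dropped

# Start from an all-NA matrix with one row per term, then fill in
# the rows that the summary actually reports.
coefFull <- matrix(NA_real_,
                   nrow = length(coefAll), ncol = ncol(coefSum),
                   dimnames = list(names(coefAll), colnames(coefSum)))
coefFull[rownames(coefSum), ] <- coefSum

# Drop the intercept row to match the layout requested in the question.
coefFull[rownames(coefFull) != "(Intercept)", , drop = FALSE]
```

Because the result stays a matrix, the original column names such as "Std. Error" and "Pr(>|z|)" are preserved instead of being mangled into Std..Error and Pr...z.. by data.frame(). Wrap it in as.data.frame() if a data.frame is needed.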