ANOVA problems with RevoScaleR::rxGlm() in R
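I build a lot of GLMs, usually on large data sets with many model parameters. This means base R's glm() function isn't really useful, because it can't cope with the size/complexity, so I usually use RevoScaleR::rxGlm() instead.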
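However, I'd like to be able to run ANOVA tests on pairs of nested models, and I haven't found a way to do this for model objects created by rxGlm(), because R's anova() function doesn't work on them. RevoScaleR provides an as.glm() function that converts rxGlm() objects to glm() objects (sort of), but it doesn't work here.
For example: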
library(dplyr)
data(mtcars)
# don't like having named rows
mtcars <- mtcars %>%
  mutate(veh_name = rownames(.)) %>%
  select(veh_name, everything())
# fit a GLM: mpg ~ everything else
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
              data = mtcars,
              family = gaussian(link = "identity"),
              trace = TRUE)
summary(glm_a1)
# fit another GLM where gear is removed
glm_a2 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + carb,
              data = mtcars,
              family = gaussian(link = "identity"),
              trace = TRUE)
summary(glm_a2)
# F test on difference
anova(glm_a1, glm_a2, test = "F")
This works fine, but if I do this instead:
library(dplyr)
library(RevoScaleR)  # provides rxGlm() and as.glm()
data(mtcars)
# don't like having named rows
mtcars <- mtcars %>%
  mutate(veh_name = rownames(.)) %>%
  select(veh_name, everything())
# fit a GLM: mpg ~ everything else
glm_b1 <- rxGlm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
                data = mtcars,
                family = gaussian(link = "identity"),
                verbose = 1)
summary(glm_b1)
# fit another GLM where gear is removed
glm_b2 <- rxGlm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + carb,
                data = mtcars,
                family = gaussian(link = "identity"),
                verbose = 1)
summary(glm_b2)
# F test on difference
anova(as.glm(glm_b1), as.glm(glm_b2), test = "F")
I see the error message:
Error in qr.lm(object) : lm object does not have a proper 'qr'
component. Rank zero or should not have used lm(.., qr=FALSE)
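The same problem was raised in an earlier SO post, but it appears to have gone unresolved.
Can anyone help? If as.glm() is no use here, is there another way? Could I write a custom function to do this (I suspect that would stretch my coding ability to its limit!)?
Also, is SO the best forum for this, or would one of the other StackExchange sites be a better place to look for guidance?
Thanks.
Partial solution...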
my_anova <- function (model_1, model_2, test_type)
{
  # only applies for nested GLMs. How do I test for this?
  if (test_type != "F")
  {
    stop("Invalid function call: only test_type = \"F\" is implemented")
  }
  # display model formulae (taken from the arguments, not from the
  # global objects glm_b1 / glm_b2)
  cat("\n")
  cat("Model 1:", format(model_1$formula), "\n")
  cat("Model 2:", format(model_2$formula), "\n")
  # df[2] is taken here to be the residual degrees of freedom,
  # so the bigger model is the one with the smaller value
  if (model_1$df[2] < model_2$df[2]) # model 1 is big, model 2 is small
  {
    dev_s <- model_2$deviance
    df_s <- model_2$df[2]
    dev_b <- model_1$deviance
    df_b <- model_1$df[2]
  }
  else # model 2 is big, model 1 is small
  {
    dev_s <- model_1$deviance
    df_s <- model_1$df[2]
    dev_b <- model_2$deviance
    df_b <- model_2$df[2]
  }
  # named f_stat rather than F, so that F (shorthand for FALSE) is not masked
  f_stat <- (dev_s - dev_b) / ((df_s - df_b) * dev_b / df_b)
  # upper tail probability of the F distribution
  # df of F: numerator: df_s - df_b
  # df of F: denominator: df_b
  p_value <- pf(f_stat, df_s - df_b, df_b, lower.tail = FALSE)
  cat("\n")
  cat("F: ", round(f_stat, 4), "\n")
  cat("Pr(>F):", round(p_value, 4), "\n")
  # return the results invisibly so they can be reused
  invisible(list(F = f_stat, p_value = p_value))
}
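As a sanity check on the formula used in my_anova(), here is a minimal sketch that computes the same F statistic by hand and compares it with anova() on ordinary glm() fits of the two mtcars models. The helper name f_test_nested is made up for illustration and is not part of RevoScaleR. It only takes deviances and residual degrees of freedom, so for rxGlm() objects the inputs would presumably be model$deviance and model$df[2], as in the function above. Nesting is assumed, not checked.

```r
# Hypothetical helper (not part of any package): F test for two nested
# GLMs, given each model's deviance and residual degrees of freedom.
f_test_nested <- function(dev_1, df_1, dev_2, df_2) {
  # The bigger model is the one with fewer residual degrees of freedom
  if (df_1 < df_2) {
    dev_b <- dev_1; df_b <- df_1; dev_s <- dev_2; df_s <- df_2
  } else {
    dev_b <- dev_2; df_b <- df_2; dev_s <- dev_1; df_s <- df_1
  }
  # Drop in deviance per extra parameter, scaled by the bigger model's
  # deviance-based dispersion estimate
  f_stat <- ((dev_s - dev_b) / (df_s - df_b)) / (dev_b / df_b)
  p_value <- pf(f_stat, df_s - df_b, df_b, lower.tail = FALSE)
  c(F = f_stat, p = p_value)
}

# Base glm() fits of the two models from the question
big <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
           data = mtcars, family = gaussian(link = "identity"))
small <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + carb,
             data = mtcars, family = gaussian(link = "identity"))

by_hand <- f_test_nested(deviance(big), df.residual(big),
                         deviance(small), df.residual(small))
ref <- anova(small, big, test = "F")

# The two lines printed below should agree for a Gaussian family
print(by_hand)
print(c(F = ref$F[2], p = ref$`Pr(>F)`[2]))
```

For the Gaussian family the deviance is the residual sum of squares, so the deviance-based dispersion matches the one anova() uses. For other families where the dispersion is estimated (Gamma, for example), anova() uses a Pearson-based estimate, so the hand calculation may differ slightly.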