Adding a column for the category of glm coefficients in broom results
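Is there a way to add a column to the results of the tidy function in the broom package that relates the term column back to the original names used in the formula argument and the columns in the data argument?
For example, if I run the following I get: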
library(broom)
library(dplyr)
mod <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)
tidy(mod)
# term estimate std.error statistic p.value
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02
# 4 as.factor(carb)2 0.004133826 1.5321134 0.00269812 9.978695e-01
# 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
# 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
# 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
# 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01
What I am looking for is something like this:
# term estimate std.error statistic p.value term_base
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07 wt
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02 qsec
# 4 as.factor(carb)2 0.004133826 1.5321134 0.00269812 9.978695e-01 carb
# 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01 carb
# 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01 carb
# 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01 carb
# 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01 carb
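I am not bothered whether the first row in this new column is empty, Intercept, or 1. I just need something that matches the term column to the original variable names passed to the formula.
Edit
It would be good if this did not depend on using as.factor in the formula, so that it also works for: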
mod <- glm(mpg ~ wt + qsec + carb, data = mtcars %>% mutate(carb = factor(carb)))
tidy(mod)
# term estimate std.error statistic p.value
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02
# 4 carb2 0.004133826 1.5321134 0.00269812 9.978695e-01
# 5 carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
# 6 carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
# 7 carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
# 8 carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01
We can create the 'term_base' column with a regular expression:
tidy(mod) %>%
mutate(term_base = sub("Intercept", "", gsub(".*\\(|\\).*", "", term)))
# term estimate std.error statistic p.value term_base
#1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
#2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07 wt
#3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02 qsec
#4 as.factor(carb)2 0.004133826 1.5321134 0.00269812 9.978695e-01 carb
#5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01 carb
#6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01 carb
#7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01 carb
#8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01 carb
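To see what the two substitutions do, here is a minimal sketch that applies them to a plain character vector instead of a fitted model. The vector `terms_example` is a made-up stand-in for the term column, not something from the original post.

```r
# Hypothetical stand-in for the `term` column returned by tidy()
terms_example <- c("(Intercept)", "wt", "qsec", "as.factor(carb)2")

# Step 1: drop everything up to and including "(" and everything from ")" on,
# which leaves only the text that was inside the parentheses (if any).
inside <- gsub(".*\\(|\\).*", "", terms_example)

# Step 2: blank out the intercept label left over from "(Intercept)".
term_base <- sub("Intercept", "", inside)

print(term_base)
```

Note that the backslashes are doubled: inside an R string literal, "\(" is an invalid escape, so the regex escape has to be written as "\\(".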
If we mutate 'carb' to a factor before the glm step, then as.factor no longer appears in 'term' and only the trailing level needs to be removed:
mtcars %>%
mutate(carb = factor(carb)) %>%
glm(formula = mpg ~wt + qsec + carb, data = .) %>%
tidy(.) %>%
mutate(term_base = sub("\\(.*\\)|\\d+", "", term))
# term estimate std.error statistic p.value term_base
#1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
#2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07 wt
#3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02 qsec
#4 carb2 0.004133826 1.5321134 0.00269812 9.978695e-01 carb
#5 carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01 carb
#6 carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01 carb
#7 carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01 carb
#8 carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01 carb
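Both regular expressions depend on how the terms happen to be named. For instance, the digit-stripping pattern would also alter a variable whose own name contains digits. As a possible alternative, and not something from the answer above, the sketch below uses the "assign" attribute of the model matrix, which records which formula term each coefficient column came from. The names `lookup`, `base_vars` and `result` are illustrative, and reducing a term label to its variable names with all.vars() is an assumption about what counts as the "base" name.

```r
library(broom)
library(dplyr)

mod <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)

mm <- model.matrix(mod)
# "assign" holds, for each coefficient column, the index of the formula term
# it was generated from (0 is the intercept).
assign_idx <- attr(mm, "assign")
term_labels <- attr(terms(mod), "term.labels")

# Reduce each term label to the variable names it mentions,
# e.g. "as.factor(carb)" becomes "carb"; interactions are joined with ":".
base_vars <- vapply(
  term_labels,
  function(lbl) paste(all.vars(parse(text = lbl)), collapse = ":"),
  character(1),
  USE.NAMES = FALSE
)

# One row per coefficient column, with an empty base name for the intercept.
lookup <- data.frame(
  term = colnames(mm),
  term_base = c("", base_vars)[assign_idx + 1],
  stringsAsFactors = FALSE
)

result <- tidy(mod) %>% left_join(lookup, by = "term")
print(result)
```

Because the lookup is joined on term, the same code should work unchanged when carb is converted to a factor before fitting rather than inside the formula.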