R Chi-squared stratified by multiple groups
I have the following df with 3 factor variables and 1 percentage variable:
df <- data.frame(
  group = rep(c("Case", "Control"), each = 16),
  timing = rep(c("T0", "T1", "T2", "T3"), each = 4, times = 2),
  food.type = rep(c("Very healthy", "Healthy", "Unhealthy", "Very bad"), times = 8),
  intake.percentage = runif(32, min = 1, max = 25)
)
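How can I perform a test (chi-squared) to assess, for each food type, the statistical difference between the groups (Case; Control) at each time point (T0-T3)?
A plot to make this easier to understand: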
library(ggplot2)
ggplot(df, aes(x = timing, y = intake.percentage, group = group)) +
  geom_line(aes(colour = group)) +
  geom_point(aes(colour = group, shape = group), size = 3) +
  theme_light(16) +
  facet_grid(. ~ food.type, scales = "free")
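For the programming part of this (Stack Overflow is for programming questions, Cross Validated is for statistics questions), assuming each scenario consisted of 100 trials, so that each percentage can be read as a count out of 100: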
set.seed(1)
df <- data.frame(
  group = rep(c("Case", "Control"), each = 16),
  timing = rep(c("T0", "T1", "T2", "T3"), each = 4, times = 2),
  food.type = rep(c("Very healthy", "Healthy", "Unhealthy", "Very bad"), times = 8),
  intake.percentage = runif(32, min = 1, max = 25)
)
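# Add the complementary count (100 - intake) and split by timing and food type
lst <- with(df, split(transform(df, intake.percentage2 = 100 - intake.percentage), list(timing, food.type)))
# Each piece has two rows (Case, Control); test the two count columns as a 2 x 2 table
res <- lapply(lst, function(x) chisq.test(x[, -(1:3)]))
# Extract the p-value from each test
sapply(res, "[", "p.value")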
# $T0.Healthy.p.value
# [1] 0.009604491
#
# $T1.Healthy.p.value
# [1] 0.001794137
#
# $T2.Healthy.p.value
# [1] 0.04958723
#
# $T3.Healthy.p.value
# [1] 0.9904441
#
# $T0.Unhealthy.p.value
# [1] 0.4369428
# ...
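To make the per-stratum test more concrete, here is a minimal sketch that builds the 2 x 2 table for one stratum explicitly and then collects all p-values into a data frame. It relies on the same assumption as above, namely that each percentage stands for a count out of 100 trials. The names `other.percentage`, `one`, `tab`, `strata`, and `pvals` are illustrative and not part of the original code.

```r
set.seed(1)
df <- data.frame(
  group = rep(c("Case", "Control"), each = 16),
  timing = rep(c("T0", "T1", "T2", "T3"), each = 4, times = 2),
  food.type = rep(c("Very healthy", "Healthy", "Unhealthy", "Very bad"), times = 8),
  intake.percentage = runif(32, min = 1, max = 25)
)

# Assumption: each percentage is treated as a count out of 100 trials,
# so the complementary count is 100 minus the intake percentage.
df$other.percentage <- 100 - df$intake.percentage

# One stratum (T0, Healthy): rows are the groups, columns are intake vs. remainder.
one <- subset(df, timing == "T0" & food.type == "Healthy")
tab <- as.matrix(one[, c("intake.percentage", "other.percentage")])
rownames(tab) <- one$group
print(tab)
print(chisq.test(tab)$p.value)

# All strata: one row per timing / food.type combination.
strata <- split(df, list(df$timing, df$food.type))
pvals <- data.frame(
  stratum = names(strata),
  p.value = vapply(strata, function(x) {
    chisq.test(as.matrix(x[, c("intake.percentage", "other.percentage")]))$p.value
  }, numeric(1)),
  row.names = NULL
)
print(pvals)
```

Selecting the two count columns by name rather than by position (`x[, -(1:3)]`) keeps the test from silently breaking if columns are added or reordered. R may warn that the chi-squared approximation is unreliable in strata where the expected counts are small.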