R 中一个因子的连续水平之间的对比
Contrasts between successive levels of a factor in R
我写这篇文章 post 是因为我一直在分析来自实验室实验的数据文件。
在这个实验中,我统计了整个 26 个时间点 (TP) 中特定环境中的雌性(小型节肢动物)的数量。但是,我想了解每个连续时间点之间的女性数量是否不同(例如,如果 TP 1 中计算的女性数量与 TP 2 不同;TP 2 中计算的女性数量与 TP 3 不同;和依此类推...)
数据框有以下列:
Replicate(包含复制的数量,从 1 到 8);时间点(女性被计数的那一天,从 1 到 26);雌性(每个时间点统计的雌性数量);和块(实验有 2 个块)
我试过做一些连续的对比,但我认为这不是最好的方法。这是我的代码:
m1<-lmer(Females~TimePoint+(1|Block))
Suc_contrasts2<-glht(m1,linfct=mcp(TimePoint=
c(
"t1 - t2 == 0",
"t2 - t3 == 0",
"t3 - t4 == 0",
"t4 - t5 == 0",
"t5 - t6 == 0",
"t6 - t7 == 0",
"t7 - t8 == 0",
"t8 - t9 == 0",
"t9 - t10 == 0",
"t10 - t11== 0",
"t11 - t12 == 0",
"t12 - t13 == 0",
"t13 - t14 == 0",
"t14 - t15 == 0",
"t15 - t16 == 0",
"t16 - t17 == 0",
"t17 - t18 == 0",
"t18 - t19 == 0",
"t19 - t20 == 0",
"t20 - t21== 0",
"t21 - t22 == 0",
"t22 - t23 == 0",
"t23 - t24 == 0",
"t24 - t25 == 0",
"t25 - t26 == 0")))
summary(Suc_contrasts2)
summary(Suc_contrasts2, test=adjusted ("bonferroni"))
我一直在寻找 google 进行计划比较的其他方法,但我发现的所有方法都不太适合我的数据集。我在这方面还是新手,对于任何新手错误,我深表歉意。
因此我的问题是,有没有更好的方法来比较我在每对连续时间点之间发现的女性数量?
编辑 1:
我也试过这样对比,结果好像不太对..
levels(TimePoint)
# [1] "t1" "t10" "t11" "t12" "t13" "t14" "t15" "t16" "t17" "t18" "t19" "t2" "t20" "t21" "t22" "t23" "t24" "t25" "t26"
# [20] "t3" "t4" "t5" "t6" "t7" "t8" "t9"
# tell R which TimePoints to compare
c1<- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #1v2
c2<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0) #2v3
c3<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0) #3v4
c4<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0) #4v5
c5<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0) #5v6
c5<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0) #6v7
c6<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0) #7v8
c7<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1) #8v9
c8<- c(0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) #9v10
c9<- c(0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #10v11
c10<- c(0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #11v12
c11<- c(0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #11v12
c12<- c(0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #12v13
c13<- c(0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #13v14
c14<- c(0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #14v15
c15<- c(0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #15v16
c16<- c(0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #16v17
c17<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #17v18
c18<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #18v19
c19<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #19v20
c20<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #20v21
c21<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #21v22
c22<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #22v23
c23<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0) #23v24
c24<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0) #24v25
c25<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0) #25v26
# combined the above lines into a matrix
mat <- cbind(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25)
# tell R that the matrix gives the contrasts you want
contrasts(TimePoint) <- mat
model2 <- aov(Females ~ TimePoint)
summary(model2)
# Df Sum Sq Mean Sq F value Pr(>F)
# line2$TimePoint 25 9694 387.8 6.939 <2e-16 ***
# Residuals 390 21794 55.9
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.aov(model2, split=list(TimePoint=list("1v2"=1, "2v3" = 2, "3v4"=3, "4v5"=4, "5v6"=5, "6v7"=6, "7v8"=7, "8v9"=8, "9v10"=9, "10v11"=10, "11v12"=11, "12v13"=12, "13v14"=13, "14v15"=14, "15v16"=15, "16v17"=16, "17v18"=17, "18v19"=18, "19v20"=19, "20v21"=20, "21v22"=21, "22v23"=22, "23v24"=23, "24v25"=24, "25v26"=25)))
感谢您的宝贵时间,
安德烈
我觉得这个网站对你有帮助:backward difference coding
根据那里的信息,可以像这样设置后续因子水平之间的差异对比(见下文)。请注意,我只使用了一个具有 5 个因子水平的简单示例。
#create dummy data
df <- expand.grid(TimePoint = c("t01", "t02", "t03", "t04", "t05"),
Replicate = 1:8, Block = 1:2)
set.seed(2)
df$Females <- runif(nrow(df), min = 0, max = 100)
#set backward difference contrasts
contrasts(df$TimePoint) <- matrix(c(-4/5, 1/5, 1/5, 1/5, 1/5,
-3/5, -3/5, 2/5, 2/5, 2/5,
-2/5, -2/5, -2/5, 3/5, 3/5,
-1/5, -1/5, -1/5, -1/5, 4/5), ncol = 4)
拟合简单线性模型时,参数估计值对应期望值,即对比“时间点1”对应t2 - t1,对比“时间点2”对应t3 - t2等。
#fit linear model
m1 <- lm(Females ~ TimePoint, data = df)
coef(m1)
(Intercept) TimePoint1 TimePoint2 TimePoint3 TimePoint4
50.295659 -10.116045 7.979465 -10.182389 2.209413
#mean by time point
with(df, tapply(Females, TimePoint, mean))
t01 t02 t03 t04 t05
57.23189 47.11584 55.09531 44.91292 47.12233
我想补充一点,我不确定你正在尝试做的事情是否明智。但我觉得对此进行评估并不方便,这将是 CrossValidated 的一个问题。我担心将 26 个时间点视为分类因子水平并不是最好的方法。此外,在您的初始代码中,您似乎适合将块视为随机因素的模型。如果块只有 2 个级别(如您所写),这没有意义,请参见此处的示例:Link
最后,我注意到在您的示例中,TimePoint 变量的因子水平排序不正确(t1、t10、t11... 而不是 t1、t2、t3...)。例如,您可以使用以下代码行更改它:
df$TimePoint <- factor(df$TimePoint, levels = paste0("t", 1:26),
labels = paste0("t", sprintf("%02d", 1:26)))
另一种拟合方式 successive-differences 对比:
m1 <- lmer(Females~TimePoint+(1|Block), contrasts=list(TimePoint=MASS::contr.sdif))
这没有考虑测试的多样性(你可能会逃避,因为这些是 pre-planned 对比):你可以在 p-values 上使用 p.adjust()
。
@AndreasM 关于因子排序、随机效应与固定效应的选择等的观点,绝对应予以考虑。
我写这篇文章 post 是因为我一直在分析来自实验室实验的数据文件。
在这个实验中,我统计了整个 26 个时间点 (TP) 中特定环境中的雌性(小型节肢动物)的数量。但是,我想了解每个连续时间点之间的女性数量是否不同(例如,如果 TP 1 中计算的女性数量与 TP 2 不同;TP 2 中计算的女性数量与 TP 3 不同;和依此类推...)
数据框有以下列:
Replicate(包含复制的数量,从 1 到 8);时间点(女性被计数的那一天,从 1 到 26);雌性(每个时间点统计的雌性数量);和块(实验有 2 个块)
我试过做一些连续的对比,但我认为这不是最好的方法。这是我的代码:
m1<-lmer(Females~TimePoint+(1|Block))
Suc_contrasts2<-glht(m1,linfct=mcp(TimePoint=
c(
"t1 - t2 == 0",
"t2 - t3 == 0",
"t3 - t4 == 0",
"t4 - t5 == 0",
"t5 - t6 == 0",
"t6 - t7 == 0",
"t7 - t8 == 0",
"t8 - t9 == 0",
"t9 - t10 == 0",
"t10 - t11== 0",
"t11 - t12 == 0",
"t12 - t13 == 0",
"t13 - t14 == 0",
"t14 - t15 == 0",
"t15 - t16 == 0",
"t16 - t17 == 0",
"t17 - t18 == 0",
"t18 - t19 == 0",
"t19 - t20 == 0",
"t20 - t21== 0",
"t21 - t22 == 0",
"t22 - t23 == 0",
"t23 - t24 == 0",
"t24 - t25 == 0",
"t25 - t26 == 0")))
summary(Suc_contrasts2)
summary(Suc_contrasts2, test=adjusted ("bonferroni"))
我一直在寻找 google 进行计划比较的其他方法,但我发现的所有方法都不太适合我的数据集。我在这方面还是新手,对于任何新手错误,我深表歉意。 因此我的问题是,有没有更好的方法来比较我在每对连续时间点之间发现的女性数量?
编辑 1:
我也试过这样对比,结果好像不太对..
levels(TimePoint)
# [1] "t1" "t10" "t11" "t12" "t13" "t14" "t15" "t16" "t17" "t18" "t19" "t2" "t20" "t21" "t22" "t23" "t24" "t25" "t26"
# [20] "t3" "t4" "t5" "t6" "t7" "t8" "t9"
# tell R which TimePoints to compare
c1<- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #1v2
c2<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0) #2v3
c3<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0) #3v4
c4<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0) #4v5
c5<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0) #5v6
c5<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0) #6v7
c6<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0) #7v8
c7<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1) #8v9
c8<- c(0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) #9v10
c9<- c(0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #10v11
c10<- c(0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #11v12
c11<- c(0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #11v12
c12<- c(0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #12v13
c13<- c(0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #13v14
c14<- c(0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #14v15
c15<- c(0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #15v16
c16<- c(0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #16v17
c17<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #17v18
c18<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #18v19
c19<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #19v20
c20<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #20v21
c21<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #21v22
c22<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) #22v23
c23<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0) #23v24
c24<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0) #24v25
c25<- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0) #25v26
# combined the above lines into a matrix
mat <- cbind(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25)
# tell R that the matrix gives the contrasts you want
contrasts(TimePoint) <- mat
model2 <- aov(Females ~ TimePoint)
summary(model2)
# Df Sum Sq Mean Sq F value Pr(>F)
# line2$TimePoint 25 9694 387.8 6.939 <2e-16 ***
# Residuals 390 21794 55.9
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.aov(model2, split=list(TimePoint=list("1v2"=1, "2v3" = 2, "3v4"=3, "4v5"=4, "5v6"=5, "6v7"=6, "7v8"=7, "8v9"=8, "9v10"=9, "10v11"=10, "11v12"=11, "12v13"=12, "13v14"=13, "14v15"=14, "15v16"=15, "16v17"=16, "17v18"=17, "18v19"=18, "19v20"=19, "20v21"=20, "21v22"=21, "22v23"=22, "23v24"=23, "24v25"=24, "25v26"=25)))
感谢您的宝贵时间, 安德烈
我觉得这个网站对你有帮助:backward difference coding
根据那里的信息,可以像这样设置后续因子水平之间的差异对比(见下文)。请注意,我只使用了一个具有 5 个因子水平的简单示例。
#create dummy data
df <- expand.grid(TimePoint = c("t01", "t02", "t03", "t04", "t05"),
Replicate = 1:8, Block = 1:2)
set.seed(2)
df$Females <- runif(nrow(df), min = 0, max = 100)
#set backward difference contrasts
contrasts(df$TimePoint) <- matrix(c(-4/5, 1/5, 1/5, 1/5, 1/5,
-3/5, -3/5, 2/5, 2/5, 2/5,
-2/5, -2/5, -2/5, 3/5, 3/5,
-1/5, -1/5, -1/5, -1/5, 4/5), ncol = 4)
拟合简单线性模型时,参数估计值对应期望值,即对比“时间点1”对应t2 - t1,对比“时间点2”对应t3 - t2等。
#fit linear model
m1 <- lm(Females ~ TimePoint, data = df)
coef(m1)
(Intercept) TimePoint1 TimePoint2 TimePoint3 TimePoint4
50.295659 -10.116045 7.979465 -10.182389 2.209413
#mean by time point
with(df, tapply(Females, TimePoint, mean))
t01 t02 t03 t04 t05
57.23189 47.11584 55.09531 44.91292 47.12233
我想补充一点,我不确定你正在尝试做的事情是否明智。但我觉得对此进行评估并不方便,这将是 CrossValidated 的一个问题。我担心将 26 个时间点视为分类因子水平并不是最好的方法。此外,在您的初始代码中,您似乎适合将块视为随机因素的模型。如果块只有 2 个级别(如您所写),这没有意义,请参见此处的示例:Link
最后,我注意到在您的示例中,TimePoint 变量的因子水平排序不正确(t1、t10、t11... 而不是 t1、t2、t3...)。例如,您可以使用以下代码行更改它:
df$TimePoint <- factor(df$TimePoint, levels = paste0("t", 1:26),
labels = paste0("t", sprintf("%02d", 1:26)))
另一种拟合方式 successive-differences 对比:
m1 <- lmer(Females~TimePoint+(1|Block), contrasts=list(TimePoint=MASS::contr.sdif))
这没有考虑测试的多样性(你可能会逃避,因为这些是 pre-planned 对比):你可以在 p-values 上使用 p.adjust()
。
@AndreasM 关于因子排序、随机效应与固定效应的选择等的观点,绝对应予以考虑。