Limma 使用 makeContrasts 和 eBayes 比较 Bulk RNA Seq
Limma to Compare Bulk RNA Seq using makeContrasts and eBayes
经过一天的谷歌搜索,我决定最好在这里提问。
所以实验是我有来自 3 位患者的大量 RNA 序列数据:A、B、C。
并获得他们的RNA seq数据用于预处理,治疗周期1,治疗周期2,治疗周期3.
所以我总共有 12 个 bulk RNA seq 样本:
A.PreTreat -> A.Cycle1 -> A.Cycle2 -> A.Cycle3
B.PreTreat -> B.Cycle1 -> B.Cycle2 -> B.Cycle3
C.PreTreat -> C.Cycle1 -> C.Cycle2 -> C.Cycle3
我想使用 model.matrix(), lmFit(), makeContrasts(), contrasts.fit(), eBayes()
获得不同周期(即第 3 周期到预处理,第 3 周期到第 2 周期)之间的差异基因列表,所有这些都在 limma 包中。
这是我的最小工作示例。
library(limma)
# Already normalized expression set: rows are genes, columns are the 12 samples
normalized_expression <- matrix(data=sample(1:100), nrow=10, ncol=12)
colnames(normalized_expression) <- c("A.PreTreat", "A.Cycle1", "A.Cycle2", "A.Cycle3", "B.PreTreat", "B.Cycle1", "B.Cycle2", "B.Cycle3", "C.PreTreat", "C.Cycle1", "C.Cycle2", "C.Cycle3")
patient_and_treatment <- factor(colnames(normalized_expression), levels = colnames(normalized_expression))
design.matrix <- model.matrix(~0 + patient_and_treatment)
colnames(design.matrix) <- patient_and_treatment
fit <- lmFit(normalized_expression, design.matrix)
# I want to get a contrast matrix to get differential genes between cycle 3 treatment and pre-treatment in all patients
contrast.matrix <- makeContrasts("A.Cycle3+B.Cycle3+C.Cycle3-A.PreTreat-B.PreTreat-C.PreTreat",
levels = levels(patient_and_treatment))
# Outputs Error of no residual degree of freedom
fit2 <- eBayes( contrasts.fit( fit, contrast.matrix ) )
# Want to run but cannot
summary(decideTests(fit2))
到目前为止,我一直停留在无剩余自由度错误上。
我什至不确定这是否是 limma 中统计上正确的方法来解决我的问题,即在所有患者的第 3 周期治疗与预处理之间获得差异基因列表。
任何帮助将不胜感激。
谢谢!
每组不能有 1 个观察值,这会使回归变得毫无意义,因为您要将每个数据点拟合到自身。
简而言之,您正在寻找的是在所有患者中观察到的共同效果,例如与 PreTreat 等相比的 Cycle3,像这样设置模型:
library(limma)
metadata = data.frame(
Patient=gsub("[.][^ ]*","",colnames(normalized_expression)),
Treatment=gsub("^[A-Z][.]*","",colnames(normalized_expression))
)
Patient Treatment
1 A PreTreat
2 A Cycle1
3 A Cycle2
4 A Cycle3
5 B PreTreat
6 B Cycle1
7 B Cycle2
8 B Cycle3
9 C PreTreat
10 C Cycle1
11 C Cycle2
12 C Cycle3
现在指定模型矩阵,患者项用于说明患者之间起始水平的差异:
design.matrix <- model.matrix(~0 + Treatment+Patient,data=metadata)
fit <- lmFit(normalized_expression, design.matrix)
contrast.matrix <- makeContrasts(TreatmentCycle3-TreatmentPreTreat,
TreatmentCycle1-TreatmentPreTreat,levels=design.matrix)
fit2 = contrasts.fit(fit, contrast.matrix)
fit2 = eBayes(fit2)
您可以检查系数是否满足您的要求:
fit2$coefficients
Contrasts
TreatmentCycle3 - TreatmentPreTreat
[1,] -3.666667
[2,] -13.666667
[3,] 1.666667
[4,] -40.666667
[5,] 12.000000
[6,] -46.000000
[7,] -32.000000
[8,] 4.666667
[9,] 11.333333
[10,] 5.666667
Contrasts
TreatmentCycle1 - TreatmentPreTreat
[1,] -11.33333
[2,] -19.33333
[3,] -27.33333
[4,] -42.33333
[5,] 27.33333
[6,] -32.66667
[7,] -33.00000
[8,] -30.66667
[9,] 46.00000
[10,] 17.33333
经过一天的谷歌搜索,我决定最好在这里提问。
所以实验是我有来自 3 位患者的大量 RNA 序列数据:A、B、C。 并获得他们的RNA seq数据用于预处理,治疗周期1,治疗周期2,治疗周期3.
所以我总共有 12 个 bulk RNA seq 样本:
A.PreTreat -> A.Cycle1 -> A.Cycle2 -> A.Cycle3
B.PreTreat -> B.Cycle1 -> B.Cycle2 -> B.Cycle3
C.PreTreat -> C.Cycle1 -> C.Cycle2 -> C.Cycle3
我想使用 model.matrix(), lmFit(), makeContrasts(), contrasts.fit(), eBayes()
获得不同周期(即第 3 周期到预处理,第 3 周期到第 2 周期)之间的差异基因列表,所有这些都在 limma 包中。
这是我的最小工作示例。
library(limma)
# Already normalized expression set: rows are genes, columns are the 12 samples
normalized_expression <- matrix(data=sample(1:100), nrow=10, ncol=12)
colnames(normalized_expression) <- c("A.PreTreat", "A.Cycle1", "A.Cycle2", "A.Cycle3", "B.PreTreat", "B.Cycle1", "B.Cycle2", "B.Cycle3", "C.PreTreat", "C.Cycle1", "C.Cycle2", "C.Cycle3")
patient_and_treatment <- factor(colnames(normalized_expression), levels = colnames(normalized_expression))
design.matrix <- model.matrix(~0 + patient_and_treatment)
colnames(design.matrix) <- patient_and_treatment
fit <- lmFit(normalized_expression, design.matrix)
# I want to get a contrast matrix to get differential genes between cycle 3 treatment and pre-treatment in all patients
contrast.matrix <- makeContrasts("A.Cycle3+B.Cycle3+C.Cycle3-A.PreTreat-B.PreTreat-C.PreTreat",
levels = levels(patient_and_treatment))
# Outputs Error of no residual degree of freedom
fit2 <- eBayes( contrasts.fit( fit, contrast.matrix ) )
# Want to run but cannot
summary(decideTests(fit2))
到目前为止,我一直停留在无剩余自由度错误上。
我什至不确定这是否是 limma 中统计上正确的方法来解决我的问题,即在所有患者的第 3 周期治疗与预处理之间获得差异基因列表。
任何帮助将不胜感激。
谢谢!
每组不能有 1 个观察值,这会使回归变得毫无意义,因为您要将每个数据点拟合到自身。
简而言之,您正在寻找的是在所有患者中观察到的共同效果,例如与 PreTreat 等相比的 Cycle3,像这样设置模型:
library(limma)
metadata = data.frame(
Patient=gsub("[.][^ ]*","",colnames(normalized_expression)),
Treatment=gsub("^[A-Z][.]*","",colnames(normalized_expression))
)
Patient Treatment
1 A PreTreat
2 A Cycle1
3 A Cycle2
4 A Cycle3
5 B PreTreat
6 B Cycle1
7 B Cycle2
8 B Cycle3
9 C PreTreat
10 C Cycle1
11 C Cycle2
12 C Cycle3
现在指定模型矩阵,患者项用于说明患者之间起始水平的差异:
design.matrix <- model.matrix(~0 + Treatment+Patient,data=metadata)
fit <- lmFit(normalized_expression, design.matrix)
contrast.matrix <- makeContrasts(TreatmentCycle3-TreatmentPreTreat,
TreatmentCycle1-TreatmentPreTreat,levels=design.matrix)
fit2 = contrasts.fit(fit, contrast.matrix)
fit2 = eBayes(fit2)
您可以检查系数是否满足您的要求:
fit2$coefficients
Contrasts
TreatmentCycle3 - TreatmentPreTreat
[1,] -3.666667
[2,] -13.666667
[3,] 1.666667
[4,] -40.666667
[5,] 12.000000
[6,] -46.000000
[7,] -32.000000
[8,] 4.666667
[9,] 11.333333
[10,] 5.666667
Contrasts
TreatmentCycle1 - TreatmentPreTreat
[1,] -11.33333
[2,] -19.33333
[3,] -27.33333
[4,] -42.33333
[5,] 27.33333
[6,] -32.66667
[7,] -33.00000
[8,] -30.66667
[9,] 46.00000
[10,] 17.33333