Plotting regressions from slope and intercept (lattice or ggplot2)

Question

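I have a microarray dataset on which I ran a limma lmFit() analysis. If you haven't heard of it before, limma is a powerful linear-model package that can test more than 20k genes for differential expression. For each of these genes you can extract the slope and intercept from the model.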
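For readers without limma at hand, a table of per-gene intercepts and slopes with the same shape can be sketched in base R by fitting a separate straight line of Gene_exp on Time for each gene with lm(). This is a swapped-in technique for illustration only: it is not what lmFit() does (limma fits its own design matrix), and the column names Intercept and Slope are chosen to match the reproducible example in the question.

```r
# Sketch: build a slopeInt-style table (intercept in column 1, slope in
# column 2, one row per gene, gene names as row names).
# Assumption: a plain lm(Gene_exp ~ Time) per gene stands in for the limma fit.
set.seed(15)
genes <- c("202586_at", "202769_at", "203201_at", "214970_s_at",
           "219155_at", "220165_at", "224483_s_at", "227559_at")
Data <- data.frame(
  ID = rep(1:24, 8),
  Time = rep(rep(c(1, 2, 4, 24), each = 3), 16),
  Treatment = rep(rep(c("control", "smoking"), each = 12), 8),
  Gene_name = rep(genes, each = 24),
  Gene_exp = rnorm(192))

# split() gives one data frame per gene; coef() returns (Intercept, Time)
fits <- lapply(split(Data, Data$Gene_name),
               function(d) coef(lm(Gene_exp ~ Time, data = d)))

# rbind the coefficient vectors; the list names become the row names
slopeInt <- as.data.frame(do.call(rbind, fits))
names(slopeInt) <- c("Intercept", "Slope")
slopeInt
```

Because the row names are the gene names, a single gene's line can then be looked up with, for example, slopeInt["202586_at", ].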

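My question is: given a table of slope and intercept values, how do I plot each gene together with its corresponding slope and intercept? I don't mind whether it's ggplot2's geom_abline, lattice's panel.abline, or an alternative if necessary.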

My table (call it "slopeInt") has the intercepts in column 1 and the slopes in column 2, with row names corresponding to the gene names. The names look like this:

"202586_at"   "202769_at"   "203201_at"   "214970_s_at" "219155_at"

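These names match the gene names in another table ("Data"), which contains some details about my samples (I have 24 samples with different ID and Time/Treatment combinations) along with the gene expression values.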

It is in long format, with each gene name (as above) occupying a block of 24 rows (the expression level of that gene for each of my samples):

ID    Time  Treatment  Gene_name    Gene_exp
...   ...   ...        ...          ...

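In total there are eight genes I'm interested in plotting, and the names in Data$Gene_name match the row names of my slopeInt table. I can also merge the two tables together, that's not a problem. But I have tried the following two approaches to get a plot for each of my genes with the appropriate regression line, to no avail: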

Using ggplot2:

ggplot(Data, aes(x = Time, y = Gene_exp, group = Time, color = Treatment)) +
facet_wrap(~ Gene_name, scales = "free_x") +
geom_point() +
geom_abline(intercept = Intercept, slope = Time, data = slopeInt) +
theme(panel.grid.major.y = element_blank())

And also using lattice:

xyplot(Gene_exp ~ Time| Gene_name, Data, 
   jitter.data = T,
   panel = function(...){
     panel.xyplot(...)
     panel.abline(a = slopeInt[,1], b = slopeInt[,2])},
   layout = c(4, 2))

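I have tried many other variations of the geom_abline() and panel.abline() arguments, including some for loops, but I'm not experienced with R and I can't get it to work. I could also have the data file in wide format (a separate column for each gene).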

Any help and further guidance would be greatly appreciated!

Here is some code for a reproducible example:

Data <- data.frame(
 ID = rep(1:24, 8),
 Time = (rep(rep(c(1, 2, 4, 24), each = 3), 8)),
 Treatment = rep(rep(c("control", "smoking"), each = 12), 8),
 Gene_name = rep(c("202586_at", "202769_at", "203201_at", "214970_s_at",
   "219155_at", "220165_at", "224483_s_at", "227559_at"), each = 24),
 Gene_exp = rnorm(192))

slopeInt <- data.frame(
 Intercept = rnorm(8),
 Slope = rnorm(8))


row.names(slopeInt) <- c("202586_at", "202769_at", "203201_at",
"214970_s_at", "219155_at", "220165_at", "224483_s_at", "227559_at")

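Since the question mentions that the two tables can be merged, here is a minimal base-R sketch of that route: copy the row names of slopeInt into a Gene_name column, merge() on it, and compute the value of each gene's line at every observed Time as Intercept + Slope * Time. The column name Fitted is hypothetical. A merged frame like this could then be drawn with ordinary line layers (for example geom_line() in ggplot2) instead of abline layers, which also keeps each line within the observed time range.

```r
# Sketch: attach each gene's intercept and slope to its expression rows,
# then evaluate that gene's regression line at each observed Time.
set.seed(15)
genes <- c("202586_at", "202769_at", "203201_at", "214970_s_at",
           "219155_at", "220165_at", "224483_s_at", "227559_at")
Data <- data.frame(
  ID = rep(1:24, 8),
  Time = rep(rep(c(1, 2, 4, 24), each = 3), 16),
  Treatment = rep(rep(c("control", "smoking"), each = 12), 8),
  Gene_name = rep(genes, each = 24),
  Gene_exp = rnorm(192))
slopeInt <- data.frame(Intercept = rnorm(8), Slope = rnorm(8),
                       row.names = genes)

# Turn the row names into a proper key column, then join on it
slopeInt$Gene_name <- rownames(slopeInt)
merged <- merge(Data, slopeInt, by = "Gene_name")

# Value of the gene-specific line at each Time (hypothetical column name)
merged$Fitted <- merged$Intercept + merged$Slope * merged$Time

# Every row of Data should have found its gene in slopeInt
stopifnot(nrow(merged) == nrow(Data))
head(merged)
```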
Answer 1

With lattice, this should work:

xyplot(Gene_exp ~ Time| Gene_name, Data, slopeInt=slopeInt,
   jitter.data = T,
   panel = function(..., slopeInt){
     panel.xyplot(...)
     grp <- trellis.last.object()$condlevels[[1]][which.packet()]
     panel.abline(a = slopeInt[grp,1], b = slopeInt[grp,2])
   },
   layout = c(4, 2)
)

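Calling set.seed(15) before generating the sample data gives a plot with one panel per gene, each showing the points and that gene's regression line.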

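The "trick" here is to use trellis.last.object()$condlevels to determine which conditioning block we are currently in. We then use that information to pull the correct slope and intercept out of the additional data, which we now pass in as an argument. I think there is a more elegant way to determine the current value of the conditioning variable, but if there is, I can't remember it right now.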
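One commonly used alternative for identifying the current panel (a sketch, not the answerer's method) is lattice's subscripts argument: with subscripts = TRUE, the panel function receives the row numbers of the data that fall in the current panel, so the gene name can be read from Data directly instead of calling trellis.last.object() while drawing. This assumes the subscripts line up with the rows of Data, as they do here with no subset argument; slopeInt is picked up from the enclosing environment rather than passed as an extra argument.

```r
library(lattice)

set.seed(15)
genes <- c("202586_at", "202769_at", "203201_at", "214970_s_at",
           "219155_at", "220165_at", "224483_s_at", "227559_at")
Data <- data.frame(
  ID = rep(1:24, 8),
  Time = rep(rep(c(1, 2, 4, 24), each = 3), 16),
  Treatment = rep(rep(c("control", "smoking"), each = 12), 8),
  Gene_name = rep(genes, each = 24),
  Gene_exp = rnorm(192))
slopeInt <- data.frame(Intercept = rnorm(8), Slope = rnorm(8),
                       row.names = genes)

p <- xyplot(Gene_exp ~ Time | Gene_name, data = Data,
  jitter.data = TRUE,
  # Ask lattice to pass the row numbers of Data shown in each panel
  subscripts = TRUE,
  panel = function(x, y, subscripts, ...) {
    panel.xyplot(x, y, ...)
    # All rows in one panel share a gene, so the first row identifies it
    g <- as.character(Data$Gene_name[subscripts[1]])
    panel.abline(a = slopeInt[g, "Intercept"], b = slopeInt[g, "Slope"])
  },
  layout = c(4, 2))
print(p)
```

Looking the gene up by name, rather than by packet position, also means the match does not depend on the order in which lattice arranges the panels.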

Answer 2

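If you add Gene_name as a column of slopeInt, it works (as far as I understand what you want), because facet_wrap() can then place each line in its gene's panel. Also note a few other changes to the ggplot call: intercept and slope are now mapped inside aes() to columns of slopeInt, the slope comes from Slope rather than Time, and group = Time has been removed.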

slopeInt$Gene_name <- rownames(slopeInt)
ggplot(Data, aes(x = Time, y = Gene_exp, color = Treatment)) +
   facet_wrap(~ Gene_name, scales = "free_x") +
   geom_point() +
   geom_abline(aes(intercept = Intercept, slope = Slope), data = slopeInt) +
   theme(panel.grid.major.y = element_blank())
