Loop through columns to generate PCA from DESeq2 data
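I would like to generate PCA plots of my bulk RNA-seq data, coloured in turn by each of the variables in my DESeq2 object "vsd". My current code looks like this (it generates a single plot):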
pcaData <- plotPCA(vsd, intgroup=c("Age", "BlastRate"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=Age, shape=BlastRate)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  geom_text(aes(label=name),hjust=-.2, vjust=0) +
  ggtitle("Principal Component Analysis")
Can anyone suggest a way to loop through the other variable columns of vsd and swap each one in for "Age"?
> head(colData(vsd),1)
DataFrame with 1 row and 14 columns
        LibSize LibDiversity PercMapped      Age SpermStatus   SpConc    SpMot Subject.Group PairedSample FertRate
       <factor>     <factor>   <factor> <factor> <character> <factor> <factor>      <factor>     <factor> <factor>
sRNA_1      Low         High       High    42-46         unk      unk      unk     Male-Male            2      Low
       BlastRate RNABatch LibPrepBatch sizeFactor
        <factor> <factor>     <factor>  <numeric>
sRNA_1       Low        1            6   0.929408
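Let's simulate some data, because the example you provided is too short and too poorly formatted to use. I assume your data is structured roughly as follows: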
library(ggplot2)
library(DESeq2)
dds <- makeExampleDESeqDataSet()
colData(dds) <- cbind(colData(dds), age = runif(ncol(dds), max = 50))
dds <- DESeq(dds)
vsd <- vst(dds, nsub = 200) # 200 is for example purposes
pcaData <- plotPCA(vsd, returnData = TRUE)
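Next, we can select the names of the columns you want to illustrate as a character vector and loop over them. When using tidyverse-style non-standard evaluation, you can use the .data pronoun to subset a specific column by name.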
vars <- tail(colnames(pcaData), 2)
plot_list <- lapply(vars, function(myvar) {
  ggplot(pcaData, aes(PC1, PC2, colour = .data[[myvar]])) +
    geom_point()
})
# Just to show that there are multiple plots
patchwork::wrap_plots(plot_list)
Created on 2022-03-09 by the reprex package (v2.0.1)
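To connect this back to the plot in the question, here is a minimal sketch that applies the same .data approach while keeping the axis labels, sample labels, and title from the original code. It makes several assumptions: the data frame below is a simulated stand-in for the output of plotPCA(..., returnData = TRUE) so that the example runs with ggplot2 alone, the "47-51" age level and the percentVar values are made up, and plot_pca_by is a hypothetical helper name. With real data you would replace the stand-in with your own pcaData, built with every column of interest listed in intgroup.

```r
library(ggplot2)

# Simulated stand-in for plotPCA(vsd, intgroup = ..., returnData = TRUE).
# All values here are invented for illustration only.
set.seed(1)
pcaData <- data.frame(
  PC1 = rnorm(12),
  PC2 = rnorm(12),
  Age = factor(rep(c("42-46", "47-51"), each = 6)),
  BlastRate = factor(rep(c("Low", "High"), times = 6)),
  name = paste0("sRNA_", 1:12)
)
attr(pcaData, "percentVar") <- c(0.42, 0.17)  # made-up proportions

percentVar <- round(100 * attr(pcaData, "percentVar"))

# Hypothetical helper: one PCA plot coloured by the column named in `myvar`.
plot_pca_by <- function(data, myvar, percentVar) {
  ggplot(data, aes(PC1, PC2, colour = .data[[myvar]])) +
    geom_point(size = 3) +
    geom_text(aes(label = name), hjust = -0.2, vjust = 0) +
    labs(
      x = paste0("PC1: ", percentVar[1], "% variance"),
      y = paste0("PC2: ", percentVar[2], "% variance"),
      colour = myvar,  # legend title; otherwise it would read ".data[[myvar]]"
      title = paste("Principal Component Analysis:", myvar)
    )
}

# Columns to colour by; naming the vector gives a named list of plots.
vars <- c("Age", "BlastRate")
plot_list <- lapply(setNames(vars, vars), function(v) {
  plot_pca_by(pcaData, v, percentVar)
})
```

Because the list is named, an individual plot can be printed with plot_list[["Age"]], and the whole set can still be passed to patchwork::wrap_plots(plot_list) as above.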