Wilcoxon test in a loop for a number of datasets at the same time
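I would like to know whether it is possible to run a Wilcoxon test in a loop over all the tables I have generated.
Basically, I want to run a paired Wilcoxon test between 2 variables in each dataset, and these 2 variables are in the same position in every dataset (say, column x and column y). (For those familiar with biology, these are RPKM values of some repetitive elements in control versus treated samples.) I hope to end up with a table of the p-values from the Wilcoxon test for each dataset.
I already have the following code, which generates all the tables/datasets/data frames. I think I need one Wilcoxon test per dataset, so I suppose I have to continue the loop, but I don't know how: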
data=sample_vs_norm
filter=unique(data$family)
for(i in 1:length(filter)){
table_name=paste('table_', filter[i], sep="")
print(table_name)
assign(table_name, data[data$Subfamily == filter[i]])
}
Here is the structure of a single dataset. Basically, I want to run a Wilcoxon test between the variables "R009_initial_filter_rpkm" and "normal_filter_rpkm":
Chr Start End Mappability Strand R009_initial_filter_NormalizedCounts
1: chr11 113086868 113087173 1 - 2
2: chr2 24290845 24291132 1 - 11
3: chr4 15854425 15854650 1 - 0
4: chr6 43489623 43489676 1 + 11
normal_filter_NormalizedCounts R009_initial_filter_rpkm
1: 14.569000 0.169752
2: 1.000000 0.992191
3: 14.815900 0.000000
4: 0.864262 5.372810
normal_filter_rpkm FoldChange p.value FDR FoldChangeFPKM
1: 1.236560 0.137278 0.999862671 1.000000000 0.1372776
2: 0.000000 11.000000 0.003173828 0.008149271 Inf
3: 1.704630 0.000000 1.000000000 1.000000000 0.0000000
4: 0.422137 12.727600 0.003173828 0.008149271 12.7276453
structure(list(Chr = structure(1:4, .Label = c("chr11", "chr2",
"chr4", "chr6"), class = "factor"), Start = c(113086868L, 24290845L,
15854425L, 43489623L), End = c(113087173L, 24291132L, 15854650L,
43489676L), Mappability = c(1L, 1L, 1L, 1L), Strand = structure(c(1L,
1L, 1L, 2L), .Label = c("-", "+"), class = "factor"), R009_initial_filter_NormalizedCounts = c(2L,
11L, 0L, 11L), normal_filter_NormalizedCounts = c(14.569,
1, 14.8159, 0.864262), R009_initial_filter_rpkm = c(0.169752,
0.992191, 0, 5.37281), normal_filter_rpkm = c(1.23656,
0, 1.70463, 0.422137), FoldChange = c(0.137278, 11, 0, 12.7276
), p.value = c(0.999862671, 0.003173828, 1, 0.003173828), FDR = c(1,
0.008149271, 1, 0.008149271), FoldChangeFPKM = c(0.1372776, Inf,
0, 12.7276453), class = "data.frame", row.names = c(NA,
-4L))
I apologize if I have used any incorrect terminology, as I am new to R. Thank you very much for your help.
One approach is to group with by = in data.table.
library(data.table)
setDT(data)
data[,wilcox.test(R009_initial_filter_rpkm,
normal_filter_rpkm)[c("statistic","p.value")],
by = TE_Subfamily]
# TE_Subfamily statistic p.value
#1: AluYf4 7.5 1
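Note that the question asks for a paired test, while wilcox.test as called above runs the unpaired rank-sum test by default. Below is a minimal sketch of the paired variant with the same by = grouping. The table toy and the columns treated_rpkm and control_rpkm are invented placeholders for illustration, not the asker's data, and the values were chosen only so that the paired differences have no ties or zeros.

```r
library(data.table)

# Toy data, invented for illustration: two subfamilies, six elements each.
toy <- data.table(
  TE_Subfamily = rep(c("famA", "famB"), each = 6),
  treated_rpkm = c(5, 9, 4, 12, 8, 15, 2, 7, 3, 10, 6, 1),
  control_rpkm = c(4, 6, 6, 7, 1, 9, 3, 4, 8, 6, 14, 12)
)

# paired = TRUE switches wilcox.test to the signed-rank test, which
# compares the two columns row by row within each group.
res <- toy[,
           wilcox.test(treated_rpkm, control_rpkm,
                       paired = TRUE)[c("statistic", "p.value")],
           by = TE_Subfamily]
print(res)
```

A paired test assumes that row i of both columns refers to the same element, which should hold here because both RPKM columns sit in the same row of the table.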
You can group by any number of variables, for example TE_Subfamily and Chr:
data[TE_Subfamily %in% filter,
wilcox.test(R009_initial_filter_rpkm,
normal_filter_rpkm)[c("statistic","p.value")],
by = .(TE_Subfamily,Chr)]
# TE_Subfamily Chr statistic p.value
#1: AluYf4 chr11 0 1
#2: AluYf4 chr2 1 1
#3: AluYf4 chr4 0 1
#4: AluYf4 chr6 1 1
If you only need the comparison for certain TE_Subfamily values, you can try something like this:
filter <- c("AluYf4")
data[TE_Subfamily %in% filter,
wilcox.test(R009_initial_filter_rpkm,
normal_filter_rpkm)[c("statistic","p.value")],
by = TE_Subfamily]
# TE_Subfamily statistic p.value
#1: AluYf4 7.5 1
For bonus points, you can correct for multiple testing:
data[TE_Subfamily %in% filter,
wilcox.test(R009_initial_filter_rpkm,
normal_filter_rpkm)[c("statistic","p.value")],
by = TE_Subfamily][,adjusted.p.value := p.adjust(p.value,method = "bonferroni")][]
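If you would rather stay close to the loop in the question, the same idea can be sketched in base R without creating one variable per table via assign(). This is only a sketch under assumptions: the data frame toy and the columns Subfamily, treated_rpkm and control_rpkm are invented placeholders, and paired = TRUE is used because the question asks for a paired test.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  Subfamily    = rep(c("famA", "famB"), each = 6),
  treated_rpkm = c(5, 9, 4, 12, 8, 15, 2, 7, 3, 10, 6, 1),
  control_rpkm = c(4, 6, 6, 7, 1, 9, 3, 4, 8, 6, 14, 12)
)

# One data frame per subfamily, kept in a named list instead of assign().
by_family <- split(toy, toy$Subfamily)

# Run the paired test on each piece and keep only the p-value.
p_values <- vapply(
  by_family,
  function(d) wilcox.test(d$treated_rpkm, d$control_rpkm,
                          paired = TRUE)$p.value,
  numeric(1)
)

# Collect everything in one table, with a Bonferroni-adjusted column.
result <- data.frame(
  Subfamily        = names(p_values),
  p.value          = unname(p_values),
  adjusted.p.value = unname(p.adjust(p_values, method = "bonferroni"))
)
print(result)
```

Keeping the pieces in a list means the test can be applied to all of them with a single vapply() call, and the p-values arrive already labelled by subfamily.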