R - maximum value of variables when compared between levels of variable1, grouped by variable2
Consider the following data:
set.seed(123)
example.df <- data.frame(
  gene = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  treated = sample(c("Yes", "No"), 100, replace = TRUE),
  resp = rnorm(100, 10, 5),
  effect = rnorm(100, 25, 5)
)
I am trying to get the maximum value of every variable when comparing between levels of gene, grouped by treated. I can create the gene combinations like this:
combn(sort(unique(example.df$gene)), 2, simplify = TRUE)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] A    A    A    B    B    C
# [2,] B    C    D    C    D    D
# Levels: A B C D
Edit: the output I am looking for is a data frame like this:
comparison group max.resp max.effect
A-B no value1 value2
....
C-D no valueX valueY
A-B yes value3 value4
....
C-D yes valueXX valueYY
I am able to get the maximum for each individual gene level grouped by treated...
library(tidyverse) # provides %>%, nest(), map(), summarise_if() and friends

max.df <- example.df %>%
  group_by(treated, gene) %>%
  nest() %>%
  mutate(mod = map(data, ~ summarise_if(.x, is.numeric, max, na.rm = TRUE))) %>%
  select(treated, gene, mod) %>%
  unnest(mod) %>%
  arrange(treated, gene)
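...but despite trying to solve this for more than a day, I cannot figure out how to get the maxima for every pairwise comparison of gene levels (A vs B, A vs C, A vs D, B vs C, B vs D, and C vs D), grouped by treated.
Any help is appreciated. Thanks.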
I found a solution. It may be a bit messy, and I will update it with a better approach, but it takes almost no time to run.
library(tidyverse)
First, I generate a data frame with two columns, Gen1 and Gen2, holding all possible comparisons. This is very similar to your use of combn, but it creates a data.frame:
GeneComp <- expand.grid(Gen1 = unique(example.df$gene), Gen2 = unique(example.df$gene)) %>%
  filter(Gen1 != Gen2) %>%
  arrange(Gen1)
Then I loop over its rows, grouping by treated:
Comps <- list()
for (i in seq_len(nrow(GeneComp))) {
  Comps[[i]] <- example.df %>%
    filter(gene == GeneComp[i, ]$Gen1 | gene == GeneComp[i, ]$Gen2) %>% # keep only rows whose gene is in the i-th pair
    group_by(treated) %>% # then group by treated
    summarise_if(is.numeric, max) %>% # then take the max of every numeric column
    mutate(Comparison = paste(GeneComp[i, ]$Gen1, GeneComp[i, ]$Gen2, sep = "-")) # and label the comparison
}
Comps <- bind_rows(Comps) # finally, bind everything into one data frame
Let me know if it meets all your needs.
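As a small aside on why the loop above produces each comparison twice: expand.grid builds ordered pairs, while combn builds unordered ones. The sketch below only counts the pairs; the `genes` vector is a hypothetical stand-in for `unique(example.df$gene)`.

```r
# Hypothetical stand-in for unique(example.df$gene)
genes <- c("A", "B", "C", "D")

# expand.grid() builds every ordered pair, so both A-B and B-A are present
ordered_pairs <- expand.grid(Gen1 = genes, Gen2 = genes, stringsAsFactors = FALSE)
ordered_pairs <- ordered_pairs[ordered_pairs$Gen1 != ordered_pairs$Gen2, ]

# combn() builds every unordered pair exactly once (one pair per row after t())
unordered_pairs <- t(combn(genes, 2))

nrow(ordered_pairs)   # 12 ordered pairs
nrow(unordered_pairs) # 6 unordered pairs
```

So with four genes the loop runs 12 times and every comparison shows up under two labels, for example A-B and B-A.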
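Added, so that each comparison appears only once:
It is important that your genes are character strings rather than factors, so you may have to do this: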
options(stringsAsFactors = FALSE) # already the default from R 4.0.0 onwards
example.df <- data.frame(
  gene = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  treated = sample(c("Yes", "No"), 100, replace = TRUE),
  resp = rnorm(100, 10, 5),
  effect = rnorm(100, 25, 5)
)
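If the gene column has already arrived as a factor, an explicit conversion is one defensive option that does not depend on the global setting. This is only a sketch; `gene_factor` and the label names are hypothetical and not part of the original data.

```r
# Hypothetical example: a gene column that arrived as a factor
gene_factor <- factor(c("B", "A", "B", "D"))

# Convert to character before sorting values or pasting labels
gene_chr <- as.character(gene_factor)

# Sorting the two names first makes the label independent of their order
label_1 <- paste(sort(c(gene_chr[1], gene_chr[2])), collapse = "-")
label_2 <- paste(sort(c(gene_chr[2], gene_chr[1])), collapse = "-")

identical(label_1, label_2) # both labels are "A-B"
```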
Then add the stringsAsFactors = FALSE argument to expand.grid as well:
GeneComp <- expand.grid(Gen1 = unique(example.df$gene), Gen2 = unique(example.df$gene), stringsAsFactors = FALSE) %>%
  filter(Gen1 != Gen2) %>%
  arrange(Gen1)
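This lets you sort the two gene names inside the loop when pasting the Comparison variable. The rows for A-B and B-A then become exact duplicates, and applying the distinct function leaves the data in the shape you want: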
Comps <- list()
for (i in seq_len(nrow(GeneComp))) {
  pair <- sort(c(GeneComp[i, ]$Gen1, GeneComp[i, ]$Gen2)) # the two gene names in alphabetical order
  Comps[[i]] <- example.df %>%
    filter(gene == pair[1] | gene == pair[2]) %>% # keep only rows whose gene is in the i-th pair
    group_by(treated) %>% # then group by treated
    summarise_if(is.numeric, max) %>% # then take the max of every numeric column
    mutate(Comparison = paste(pair[1], pair[2], sep = "-")) # and label the comparison
}
Comps <- bind_rows(Comps) %>% distinct() # bind into one data frame and drop the duplicated rows
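For comparison, here is a rough base R sketch that starts from combn, as in the question, so no duplicates are created in the first place. It is not the approach used above, only an alternative under the same assumptions about the data; `max_for_pair` is a hypothetical helper name, and the column names follow the output layout requested in the question.

```r
set.seed(123)
example.df <- data.frame(
  gene = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  treated = sample(c("Yes", "No"), 100, replace = TRUE),
  resp = rnorm(100, 10, 5),
  effect = rnorm(100, 25, 5),
  stringsAsFactors = FALSE
)

# Every unordered pair of gene levels, as a list of length-2 character vectors
gene_pairs <- combn(sort(unique(example.df$gene)), 2, simplify = FALSE)

# Hypothetical helper: maxima of resp and effect per treated group for one pair
max_for_pair <- function(pair) {
  sub <- example.df[example.df$gene %in% pair, ]
  out <- aggregate(cbind(resp, effect) ~ treated, data = sub, FUN = max)
  names(out) <- c("group", "max.resp", "max.effect")
  data.frame(comparison = paste(pair, collapse = "-"), out,
             stringsAsFactors = FALSE)
}

# One small data frame per pair, stacked and ordered like the requested output
comps <- do.call(rbind, lapply(gene_pairs, max_for_pair))
comps <- comps[order(comps$group, comps$comparison), ]
rownames(comps) <- NULL
comps
```

Because combn already returns each pair once, there is no need for the sort and distinct step here.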