dplyr: How do I calculate fold change within a group based on values in another column?
My data currently looks roughly like this:
Tree Fertilized Region Fruits
apple lightly sunny 100
apple lightly dark 50
apple heavily sunny 300
apple heavily dark 200
pear lightly sunny 150
pear lightly dark 200
pear heavily sunny 300
pear heavily dark 150
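Here I want to calculate (as part of a larger function) the fold change in Fruits between the sunny and dark regions for each combination of fertilization level and tree type (e.g. a fold change of 2 for lightly fertilized apple trees):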
df%<>%
group_by(Tree,Fertilized) %>%
summarise(!!paste0("fold_change_", quote(Fruits)) := .[Region == "sunny","Fruits"]/.[type == "dark","Fruits"])
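However, I get an error message stating that the column "Fruits" does not exist. Does anyone have a suggestion for how to make this work? I imagine the solution is a small syntax tweak, but I can't seem to find it myself or online.
The actual dataset has more tree types and more parameters like "Fruits", which is why I chose the piped structure and dynamic column naming ("!!paste0()", ":="); this may or may not be relevant to the problem.
Thanks in advance to anyone who tries to help!
Cheers, Rob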
I would use a group-by operation:
library(data.table)
library(dplyr)
f <- tempfile()
writeLines("
Tree, Fertilized, Region, Fruits,
apple, lightly, sunny, 100,
apple, lightly, dark, 50,
apple, heavily, sunny, 300,
apple, heavily, dark, 200,
pear, lightly, sunny, 150,
pear, lightly, dark, 200,
pear, heavily, sunny, 300,
pear, heavily, dark, 150
", f)
dat <- read.csv(f)
data.table
dat <- data.table(dat)
dat[order(Region), .(fold_change = Fruits[2] / Fruits[1]), by=.(Tree, Fertilized)]
#> Tree Fertilized fold_change
#> 1: apple lightly 2.00
#> 2: apple heavily 1.50
#> 3: pear lightly 0.75
#> 4: pear heavily 2.00
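The call above relies on row order: after sorting by Region, "dark" comes first and "sunny" second, so Fruits[2] / Fruits[1] is sunny over dark. As a minimal sketch of an alternative, the same ratio can be written by matching on the Region value directly. This assumes exactly one "sunny" and one "dark" row per group and Region values without surrounding whitespace. The names dt and res_dt are made up for this illustration, and the data is rebuilt inline so the snippet runs on its own.

```r
library(data.table)

# Same toy data as in the question, built inline (no leading spaces in the values).
dt <- data.table(
  Tree       = rep(c("apple", "pear"), each = 4),
  Fertilized = rep(c("lightly", "lightly", "heavily", "heavily"), times = 2),
  Region     = rep(c("sunny", "dark"), times = 4),
  Fruits     = c(100, 50, 300, 200, 150, 200, 300, 150)
)

# Within each (Tree, Fertilized) group, pick the values by Region label
# instead of by position. Assumes one sunny and one dark row per group.
res_dt <- dt[
  ,
  .(fold_change = Fruits[Region == "sunny"] / Fruits[Region == "dark"]),
  by = .(Tree, Fertilized)
]
print(res_dt)
```

Note that read.csv() does not strip whitespace by default, so with the file written above the values would be " sunny" and " dark" (with a leading space); passing strip.white = TRUE to read.csv() would be one way to make the comparison on "sunny" match.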
tidyverse
dat %>%
arrange(Region) %>%
group_by(Tree, Fertilized) %>%
summarize(fold_change = Fruits[2] / Fruits[1])
#> `summarise()` regrouping output by 'Tree' (override with `.groups` argument)
#> # A tibble: 4 x 3
#> # Groups: Tree [2]
#> Tree Fertilized fold_change
#> <chr> <chr> <dbl>
#> 1 apple " heavily" 1.5
#> 2 apple " lightly" 2
#> 3 pear " heavily" 2
#> 4 pear " lightly" 0.75
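Since the question asks for a dynamically named output column inside a larger function, here is a hedged sketch of how that could look in dplyr. Inside summarise(), column names already refer to the current group's values, whereas . in a magrittr pipe is the whole left-hand-side object, so there is no need to index into it. The function name fold_change_by_region and its argument value_col are hypothetical, and the sketch assumes dplyr 1.0.0 or later (for .groups), exactly one "sunny" and one "dark" row per group, and Region values without surrounding whitespace.

```r
library(dplyr)

# Same toy data as in the question, built inline so the snippet is self-contained.
dat <- data.frame(
  Tree       = rep(c("apple", "pear"), each = 4),
  Fertilized = rep(c("lightly", "lightly", "heavily", "heavily"), times = 2),
  Region     = rep(c("sunny", "dark"), times = 4),
  Fruits     = c(100, 50, 300, 200, 150, 200, 300, 150)
)

# Hypothetical helper: `value_col` is the measurement column name as a string.
fold_change_by_region <- function(data, value_col) {
  data %>%
    group_by(Tree, Fertilized) %>%
    summarise(
      # The left side builds the output name; `.data[[value_col]]` looks the
      # column up by name within the current group.
      !!paste0("fold_change_", value_col) :=
        .data[[value_col]][Region == "sunny"] /
        .data[[value_col]][Region == "dark"],
      .groups = "drop"
    )
}

res <- fold_change_by_region(dat, "Fruits")
print(res)
```

Calling the helper with another column name, e.g. a hypothetical "Leaves" column, would produce a column named fold_change_Leaves in the same way.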