How to obtain species richness and abundance for sites with multiple samples using dplyr
Question:
I have multiple sites, each with 10 samples.
Site Time Sample Species1 Species2 Species3 etc
Home A 1 1 0 4 ...
Home A 2 0 0 2 ...
Work A 1 0 1 1 ...
Work A 2 1 0 1 ...
Home B 1 1 0 4 ...
Home B 2 0 0 2 ...
Work B 1 0 1 1 ...
Work B 2 1 0 1 ...
...
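I want to get the richness and abundance for each site. Richness is the total number of species at a site; abundance is the total number of individuals of all species at a site. Like this: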
Site Time Richness Abundance
Home A 2 7
Work A 3 4
Home B 2 7
Work B 3 4
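As a worked illustration of the two definitions, here is a minimal base-R sketch for the Home/A row of the table above. The counts are the first two rows of the example data given under "Data" at the end of this post. The variable names (home_a, pooled, richness, abundance) are illustrative only.

```r
# Species counts for the two samples at Home, time A
# (columns: Species1, Species2, Species3)
home_a <- rbind(c(1, 0, 4),
                c(0, 0, 2))

# Pool the samples: total individuals of each species at the site
pooled <- colSums(home_a)

# Richness: number of species with at least one individual after pooling
richness <- sum(pooled > 0)

# Abundance: total number of individuals of all species
abundance <- sum(pooled)
```

Richness has to be counted on the pooled counts. If you summed per-sample richness instead, you would get 2 + 1 = 3 here, because Species3 would be counted twice.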
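I can get there in two steps (below). However, I would like to do both in a single dplyr pipe. The column range in the code below (4:30) refers to my species matrix (one row per site/sample, species as columns).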
df1 <- df %>% mutate(Abundance = rowSums(.[,4:30])) %>%
  group_by(Site, Time) %>%
  summarise_all(sum)
df1$Richness <- apply(df1[,4:30]>0, 1, sum)
If I try to do both in one pipe, I get the following error:
df1 <- df %>% mutate(Abundance = rowSums(.[,4:30]) ) %>%
  group_by(Site, Time) %>%
  summarise_all(sum) %>%
  mutate(Richness = apply(.[,4:30]>0, 1, sum))
Error in mutate_impl(.data, dots) :
Column `Richness` must be length 5 (the group size) or one, not 19
The Richness part has to come after the summarise call, because it must operate on the summarised, grouped data.
How can I get this to work?
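(Note: this was previously marked as a duplicate of this question:
Manipulating seperated species quantity data into a species abundance matrix
However, that is a completely different problem: that question is essentially about transposing the dataset and summing within a single species/column. This one is about summing across columns (multiple columns) over all species.
Also, I think the answers to this question are genuinely useful: ecologists like me calculate richness and abundance all the time, and I am sure they would appreciate a dedicated question.)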
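Answer:
After summarise, we also need to ungroup. summarise removes only the last grouping variable (Time), so its result is still grouped by Site. The following mutate then runs once per Site group, but the . placeholder inside apply still refers to the whole summarised table. As a result, apply returns one value per row of the whole table instead of one per group, which is exactly the length mismatch in the error.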
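library(tidyverse)
df %>%
  # per-sample abundance: sum over the species columns (4 to the last)
  mutate(Abundance = rowSums(.[4:ncol(.)])) %>%
  group_by(Site, Time) %>%
  summarise_all(sum) %>%
  # the summary is still grouped by Site; drop the grouping before
  # the row-wise calculation
  ungroup %>%
  # species columns now run from 4 to the second-to-last (Abundance is last)
  mutate(Richness = apply(.[4:(ncol(.)-1)] > 0, 1, sum)) %>%
  #or
  #mutate(Richness = rowSums(.[4:(ncol(.)-1)] > 0)) %>%
  select(Site, Time, Abundance, Richness)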
# A tibble: 4 x 4
# Site Time Abundance Richness
# <chr> <chr> <dbl> <int>
#1 Home A 7 2
#2 Home B 7 2
#3 Work A 4 3
#4 Work B 4 3
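The ungroup step is easy to miss, so here is a minimal sketch that shows the grouping summarise_all leaves behind. It assumes dplyr is installed and builds a small data frame with the same values as the "Data" section of this post. The name summed is illustrative.

```r
library(dplyr)

# Same values as the example data in this post
df <- data.frame(
  Site = rep(c("Home", "Home", "Work", "Work"), 2),
  Time = rep(c("A", "B"), each = 4),
  Sample = rep(1:2, 4),
  Species1 = c(1, 0, 0, 1, 1, 0, 0, 1),
  Species2 = c(0, 0, 1, 0, 0, 0, 1, 0),
  Species3 = c(4, 2, 1, 1, 4, 2, 1, 1)
)

summed <- df %>%
  group_by(Site, Time) %>%
  summarise_all(sum)

# summarise peels off only the last grouping variable (Time),
# so the summary is still grouped by Site
group_vars(summed)

# After ungroup() no grouping variables remain
group_vars(ungroup(summed))
```

While the Site grouping is still in place, a later mutate is evaluated group by group. Any expression built on the whole table via . therefore has the wrong length for each group.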
It can also be written by first using group_by with sum, then transmute:
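df %>%
  group_by(Site, Time) %>%
  # sum each species column within every Site/Time pair
  summarise_at(vars(matches("Species")), sum) %>%
  ungroup %>%
  # columns 3 onwards are the pooled species counts
  transmute(Site, Time, Abundance = rowSums(.[3:ncol(.)]),
            Richness = rowSums(.[3:ncol(.)] > 0))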
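In dplyr 1.0 and later, summarise_at and summarise_all are superseded by across(). Below is a sketch of the same calculation that assumes dplyr >= 1.0 and that every species column name starts with "Species", as in the example data. Selecting columns by name also avoids hard-coded index ranges such as 4:30, and .groups = "drop" replaces the separate ungroup step.

```r
library(dplyr)

# Same values as the example data in this post
df <- data.frame(
  Site = rep(c("Home", "Home", "Work", "Work"), 2),
  Time = rep(c("A", "B"), each = 4),
  Sample = rep(1:2, 4),
  Species1 = c(1, 0, 0, 1, 1, 0, 0, 1),
  Species2 = c(0, 0, 1, 0, 0, 0, 1, 0),
  Species3 = c(4, 2, 1, 1, 4, 2, 1, 1)
)

result <- df %>%
  group_by(Site, Time) %>%
  # pool the samples: one row per Site/Time, grouping dropped afterwards
  summarise(across(starts_with("Species"), sum), .groups = "drop") %>%
  # row-wise totals over the pooled species columns
  mutate(
    Abundance = rowSums(across(starts_with("Species"))),
    Richness  = rowSums(across(starts_with("Species")) > 0)
  ) %>%
  select(Site, Time, Abundance, Richness)

result
```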
Another option is to combine sum with map (from purrr):
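df %>%
  group_by(Site, Time) %>%
  summarise_at(vars(matches("Species")), sum) %>%
  # summarise dropped Time from the grouping; add it back
  # (dplyr >= 1.0 names this argument .add; older versions used add)
  group_by(Time, .add = TRUE) %>%
  # one nested table of pooled species counts per Site/Time
  nest %>%
  mutate(data = map(data, ~
    tibble(Richness = sum(.x > 0),
           Abundance = sum(.x)))) %>%
  unnest(cols = data)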
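Ecological data are often easier to handle in long format (one row per sample and species). Below is a sketch of that route. It assumes tidyr >= 1.0 for pivot_longer() and dplyr >= 1.0 for the .groups argument. The new column names Species and Count are illustrative.

```r
library(dplyr)
library(tidyr)

# Same values as the example data in this post
df <- data.frame(
  Site = rep(c("Home", "Home", "Work", "Work"), 2),
  Time = rep(c("A", "B"), each = 4),
  Sample = rep(1:2, 4),
  Species1 = c(1, 0, 0, 1, 1, 0, 0, 1),
  Species2 = c(0, 0, 1, 0, 0, 0, 1, 0),
  Species3 = c(4, 2, 1, 1, 4, 2, 1, 1)
)

result <- df %>%
  # one row per sample and species
  pivot_longer(starts_with("Species"), names_to = "Species",
               values_to = "Count") %>%
  # pool the samples within each Site/Time for every species
  group_by(Site, Time, Species) %>%
  summarise(Count = sum(Count), .groups = "drop_last") %>%
  # still grouped by Site and Time: count species present and individuals
  summarise(Richness = sum(Count > 0), Abundance = sum(Count),
            .groups = "drop")

result
```

In this layout both measures are ordinary grouped summaries, so no column index ranges are needed.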
Data
df <- structure(list(Site = c("Home", "Home", "Work", "Work", "Home",
"Home", "Work", "Work"), Time = c("A", "A", "A", "A", "B", "B",
"B", "B"), Sample = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), Species1 = c(1L,
0L, 0L, 1L, 1L, 0L, 0L, 1L), Species2 = c(0L, 0L, 1L, 0L, 0L,
0L, 1L, 0L), Species3 = c(4L, 2L, 1L, 1L, 4L, 2L, 1L, 1L)),
class = "data.frame", row.names = c(NA,
-8L))
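To cross-check the expected output without dplyr, here is a base-R sketch that uses aggregate() on the same values. It assumes only the three species columns of this example. The names agg and species are illustrative.

```r
# Same values as the structure() call above
df <- data.frame(
  Site = rep(c("Home", "Home", "Work", "Work"), 2),
  Time = rep(c("A", "B"), each = 4),
  Sample = rep(1:2, 4),
  Species1 = c(1, 0, 0, 1, 1, 0, 0, 1),
  Species2 = c(0, 0, 1, 0, 0, 0, 1, 0),
  Species3 = c(4, 2, 1, 1, 4, 2, 1, 1)
)

# Pool the samples: sum each species column within every Site/Time pair
agg <- aggregate(cbind(Species1, Species2, Species3) ~ Site + Time,
                 data = df, FUN = sum)

species <- agg[, c("Species1", "Species2", "Species3")]
agg$Abundance <- rowSums(species)
agg$Richness  <- rowSums(species > 0)

agg[, c("Site", "Time", "Richness", "Abundance")]
```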