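How do I add a column of abundance data by summing observations per site?
I have a data frame of scallop presence/absence observations across multiple sites. I want to use the UID (unique identifier) and the presence/absence column (binary: 0 = absent, 1 = present) to count the scallops at each site.
My data frame looks like this: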
UID | Present.Absent | Size.cm | binary |
---|---|---|---|
A-10-2021 | Present | 4.60 | 1 |
A-10-2021 | Present | 6.0 | 1 |
A-11-2021 | Present | 4.70 | 1 |
A-11-2021 | Present | 4.8 | 1 |
A-4-2021 | Absent | NA | 0 |
A-5-2021 | Present | 5.90 | 1 |
A-5-2021 | Present | 6.00 | 1 |
A-5-2021 | Present | 6.00 | 1 |
A-5-2021 | Present | 3.90 | 1 |
A-5-2021 | Present | 5.00 | 1 |
A-6-2021 | Absent | NA | 0 |
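It continues for about 6,000 observations, with roughly 1,500 distinct UIDs.
I'm new to R and not sure how to go about this. Is there a way to get one row per UID, with a column of abundance data? Any help is much appreciated, and I'm happy to provide any other information that would help. Thanks!
Edit: added a data sample (first 10 rows):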
structure(list(UID = c("A-10-2021", "A-10-2021", "A-11-2021",
"A-11-2021", "A-1-2021", "A-1-2021", "A-1-2021", "A-12-2021",
"A-12-2021", "A-12-2021"), Present.Absent = c("Present", "Present",
"Present", "Present", "Present", "Present", "Present", "Present",
"Present", "Present"), Alive.Dead = c("Alive", "Alive", "Alive",
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive"
), Size.cm = c(4.6, 5.25, 4.7, 5.1, 3.5, 3.9, 4.7, 4.7, 4.9,
4.9), binary = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(3L,
4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L), class = "data.frame")
You can do this with group_by():
# Your data
temp1 <- structure(list(UID = c("A-10-2021", "A-10-2021", "A-11-2021",
"A-11-2021", "A-1-2021", "A-1-2021", "A-1-2021", "A-12-2021",
"A-12-2021", "A-12-2021"), Present.Absent = c("Present", "Present",
"Present", "Present", "Present", "Present", "Present", "Present",
"Present", "Present"), Alive.Dead = c("Alive", "Alive", "Alive",
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive"
), Size.cm = c(4.6, 5.25, 4.7, 5.1, 3.5, 3.9, 4.7, 4.7, 4.9,
4.9), id = c(3L, 4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L)), row.names = c(3L,
4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L), class = "data.frame")
Note that you can first create a binary column (isPresent) using mutate() and ifelse().
library(tidyverse)
# Option 1: Add an abundance column per UID, keeping every original row
temp1 %>%
  mutate(isPresent = ifelse(Present.Absent == "Present", 1, 0)) %>%
  group_by(UID) %>%
  mutate(abundance = sum(isPresent))
# Option 2: Get a summary with one row per UID
temp1 %>%
  mutate(isPresent = ifelse(Present.Absent == "Present", 1, 0)) %>%
  group_by(UID) %>%
  summarise(abundance = sum(isPresent))
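If your data frame already has the 0/1 `binary` column shown in the question, you can probably sum it directly and skip the ifelse() step. Below is a minimal sketch under that assumption. The `scallops` data frame is a hypothetical sample made up to mirror the table in the question (it is not the real data), and `abundance_by_site` and `abundance_base` are illustrative names. It includes an Absent row to show that a site with no scallops ends up with an abundance of 0 rather than being dropped.

```r
library(dplyr)

# Hypothetical sample mirroring the table in the question (not the full data set)
scallops <- data.frame(
  UID = c("A-10-2021", "A-10-2021", "A-4-2021",
          "A-5-2021", "A-5-2021", "A-5-2021"),
  Present.Absent = c("Present", "Present", "Absent",
                     "Present", "Present", "Present"),
  Size.cm = c(4.6, 6.0, NA, 5.9, 6.0, 3.9),
  binary = c(1, 1, 0, 1, 1, 1)
)

# One row per UID; assumes `binary` is reliably 0 (absent) or 1 (present)
abundance_by_site <- scallops %>%
  group_by(UID) %>%
  summarise(abundance = sum(binary))

# Base R equivalent, for comparison (the summed column keeps the name `binary`)
abundance_base <- aggregate(binary ~ UID, data = scallops, FUN = sum)

print(abundance_by_site)
```

If `binary` might contain NA values in the full data set, `sum(binary, na.rm = TRUE)` would be the safer call inside summarise().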