Aggregate and summarise a character object with R
I have a breeding productivity dataset:
df1
# Nest.box Obs.type individual.number Clutch Chick.status
# 1 Nest1 Egg 1 First NA
# 2 Nest1 Egg 2 First NA
# 3 Nest1 Egg 3 First NA
# 4 Nest2 Egg 1 First NA
# 5 Nest2 Egg 2 First NA
# 6 Nest2 Egg 1 First NA
# 7 Nest1 Chick 1 First Dead
# 8 Nest1 Chick 2 First Fledged
# 9 Nest2 Chick 1 First Fledged
# 10 Nest2 Chick 2 First Fledged
# 11 Nest2 Chick 1 Second Fledged
# 12 Nest2 Chick 2 Second UNK
I want to summarise these data by Nest.box and Clutch, showing the number of "Fledged" chicks for each nest box and clutch.
The desired output looks like this:
output
# Nest.box Clutch Fledged
# 1 Nest1 First 1
# 2 Nest2 First 2
# 3 Nest2 Second 1
What I have tried so far:
library(dplyr)
df2 <- df1 %>%
  distinct() %>%
  group_by(Nest.box, Clutch) %>%
  tally() %>%   # counts every distinct row per group, not only the fledged chicks
  ungroup()
Here is one possible solution:
library(dplyr)
df <- read.table(text = "Nest.box Obs.type individual.number Clutch Chick.status
1 Nest1 Egg 1 First NA
2 Nest1 Egg 2 First NA
3 Nest1 Egg 3 First NA
4 Nest2 Egg 1 First NA
5 Nest2 Egg 2 First NA
6 Nest2 Egg 1 First NA
7 Nest1 Chick 1 First Dead
8 Nest1 Chick 2 First Fledged
9 Nest2 Chick 1 First Fledged
10 Nest2 Chick 2 First Fledged
11 Nest2 Chick 1 Second Fledged
12 Nest2 Chick 2 Second UNK", header = TRUE)
df %>%
  group_by(Nest.box, Clutch) %>%
  summarise(Fledged = sum(Chick.status == "Fledged", na.rm = TRUE))
#> # A tibble: 3 × 3
#> # Groups: Nest.box [2]
#> Nest.box Clutch Fledged
#> <chr> <chr> <int>
#> 1 Nest1 First 1
#> 2 Nest2 First 2
#> 3 Nest2 Second 1
Created on 2022-04-04 by the reprex package (v2.0.1)
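The `na.rm = TRUE` argument matters here because the egg rows have `NA` in `Chick.status`, and comparing `NA` with a string yields `NA` rather than `FALSE`. A minimal base R sketch of that behaviour, using a made-up vector `status` that is not part of the original data:

```r
# Hypothetical status vector mixing NA (egg rows) with chick outcomes
status <- c(NA, NA, "Dead", "Fledged", "Fledged")

# Comparing NA with a string gives NA, not FALSE
is_fledged <- status == "Fledged"
is_fledged

# Without na.rm the NA values propagate into the sum
sum(is_fledged)

# With na.rm = TRUE the NA comparisons are dropped before summing
n_fledged <- sum(is_fledged, na.rm = TRUE)
n_fledged
```

Without `na.rm = TRUE`, any group that contains egg rows would presumably end up with `NA` as its Fledged count.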
Another option is to filter and then use count:
library(tidyverse)
df %>%
  filter(Chick.status == "Fledged") %>%
  count(Nest.box, Clutch)
Output
Nest.box Clutch n
1 Nest1 First 1
2 Nest2 First 2
3 Nest2 Second 1
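One difference between the two dplyr approaches is worth keeping in mind. `filter()` removes rows before counting, so a nest box and clutch with no fledged chicks would not appear in the result at all, whereas the `summarise()` version reports it with a count of 0. In the data above every group has at least one fledged chick, so both give the same rows. The sketch below illustrates the difference with a small made-up data frame `toy` (the `Nest3` rows are hypothetical):

```r
library(dplyr)

# Hypothetical data: Nest3 has no fledged chicks
toy <- data.frame(
  Nest.box = c("Nest1", "Nest1", "Nest3", "Nest3"),
  Clutch = c("First", "First", "First", "First"),
  Chick.status = c("Fledged", "Dead", "Dead", NA)
)

# summarise() keeps Nest3, with a count of 0
by_sum <- toy %>%
  group_by(Nest.box, Clutch) %>%
  summarise(Fledged = sum(Chick.status == "Fledged", na.rm = TRUE),
            .groups = "drop")

# filter() + count() drops Nest3 entirely
by_count <- toy %>%
  filter(Chick.status == "Fledged") %>%
  count(Nest.box, Clutch, name = "Fledged")

by_sum
by_count
```

Which behaviour is preferable depends on whether nests with zero fledged chicks should show up in the summary.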
Data
df <- structure(list(Nest.box = c("Nest1", "Nest1", "Nest1", "Nest2",
"Nest2", "Nest2", "Nest1", "Nest1", "Nest2", "Nest2", "Nest2",
"Nest2"), Obs.type = c("Egg", "Egg", "Egg", "Egg", "Egg", "Egg",
"Chick", "Chick", "Chick", "Chick", "Chick", "Chick"), individual.number = c(1L,
2L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L), Clutch = c("First",
"First", "First", "First", "First", "First", "First", "First",
"First", "First", "Second", "Second"), Chick.status = c(NA, NA,
NA, NA, NA, NA, "Dead", "Fledged", "Fledged", "Fledged", "Fledged",
"UNK")), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12"))
Using data.table:
library(data.table)
setDT(df)[Chick.status=="Fledged", .N, by=.(Nest.box, Clutch)]
## Nest.box Clutch N
## 1: Nest1 First 1
## 2: Nest2 First 2
## 3: Nest2 Second 1
This converts df to a data.table (setDT(df)), filters for Chick.status == "Fledged", and counts (.N) the rows grouped by Nest.box and Clutch.
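For readers new to the `DT[i, j, by]` form, the one-liner can be unpacked into separate steps. This is only an illustrative sketch on a small made-up table `toy`; it uses `as.data.table()`, which makes a copy, instead of `setDT()`, which converts the data frame in place.

```r
library(data.table)

# Hypothetical miniature of the nest data
toy <- data.frame(
  Nest.box = c("Nest1", "Nest1", "Nest2", "Nest2", "Nest2"),
  Clutch = c("First", "First", "First", "First", "Second"),
  Chick.status = c("Dead", "Fledged", "Fledged", "Fledged", NA)
)

# Copy into a data.table; toy itself stays a plain data.frame
dt <- as.data.table(toy)

# i: keep rows where the condition is TRUE (NA comparisons are treated as FALSE)
fledged <- dt[Chick.status == "Fledged"]

# j with by: .N is the number of rows in each Nest.box/Clutch group
res <- fledged[, .N, by = .(Nest.box, Clutch)]
res
```

As with the filter-then-count approach, groups without any fledged chick (here Nest2, Second) do not appear in the result.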