How do I combine rows of data based on values of other variables in R?
I am trying to combine rows of data based on the levels of other variables. A sample of my data is attached below.
data <- structure(list(FishID = c("SSS012", "SSS012", "SSS012", "SSS014",
"SSS014", "SSS014", "SSS24", "SSS24", "SSS24", "SSS24", "SSS24"
), Taxa = c("Krill", "Onisimus", "Onisimus", "Krill", "Krill",
"Onisimus", "Copepods", "Onisimus", "Themisto", "Unidentified Fish",
"Unidentified Fish"), EstimatedNumber = c(2L, 6L, 1L, 2L, NA,
6L, 16L, 4L, 389L, 80L, 1L), TotalMass = c(0.074, 0.143, 0.052,
0.034, 5.342, 0.16, 0.09, 0.087, 28.742, 6.556, 0.782), Comments = c("",
"", "", "", "", "", "", "", "", "", "will likely change taxa to fish"
), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2019L,
2019L, 2019L, 2019L, 2019L), PA = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1)), row.names = c(487L, 488L, 489L, 512L, 513L, 514L, 628L,
634L, 636L, 638L, 639L), class = "data.frame")
If we run
table(data$FishID, data$Taxa)
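we can see that some taxa appear twice for a given FishID, while others appear only once. I want each taxon to appear only once per FishID, but I want to keep the EstimatedNumber and TotalMass data from both rows by summing them (i.e., for FishID SSS012, in addition to the Krill row, I want a single Onisimus row with an EstimatedNumber of 7 and a TotalMass of 0.195).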
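To list only the FishID/Taxa pairs that occur more than once, rather than reading them off the full contingency table, something like the following may help. It is a minimal sketch: `toy` is a hypothetical stand-in that copies just the FishID and Taxa values of the first six sample rows, and `dups` is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in: FishID/Taxa values from the first six sample rows
toy <- data.frame(
  FishID = c("SSS012", "SSS012", "SSS012", "SSS014", "SSS014", "SSS014"),
  Taxa   = c("Krill", "Onisimus", "Onisimus", "Krill", "Krill", "Onisimus")
)

# Count rows per FishID/Taxa pair, then keep only the pairs that repeat
dups <- toy %>%
  count(FishID, Taxa) %>%
  filter(n > 1)

dups
```

On this subset, the repeated pairs are SSS012/Onisimus and SSS014/Krill, each with two rows.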
Here is a potential solution using dplyr:
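library(dplyr)
data %>%
  # one group per fish and taxon
  group_by(FishID, Taxa) %>%
  # sum both numeric columns within each group; a group containing an NA sums to NA
  summarize(across(EstimatedNumber:TotalMass, ~sum(.)))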
This gives us:
  FishID Taxa              EstimatedNumber TotalMass
  <chr>  <chr>                       <int>     <dbl>
1 SSS012 Krill                           2     0.074
2 SSS012 Onisimus                        7     0.195
3 SSS014 Krill                          NA     5.38
4 SSS014 Onisimus                        6     0.16
5 SSS24  Copepods                       16     0.09
6 SSS24  Onisimus                        4     0.087
7 SSS24  Themisto                      389    28.7
8 SSS24  Unidentified Fish              81     7.34
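Note that the SSS014 Krill row has an EstimatedNumber of NA, because one of its two source rows has a missing count and `sum()` propagates NA by default. If treating the missing count as zero is acceptable for your analysis (an assumption the question does not confirm), passing `na.rm = TRUE` is one option. The sketch below uses a hypothetical subset, `toy`, holding only the three SSS014 rows from the sample data.

```r
library(dplyr)

# Hypothetical subset: the three SSS014 rows, including the missing count
toy <- data.frame(
  FishID          = c("SSS014", "SSS014", "SSS014"),
  Taxa            = c("Krill", "Krill", "Onisimus"),
  EstimatedNumber = c(2L, NA, 6L),
  TotalMass       = c(0.034, 5.342, 0.16)
)

res <- toy %>%
  group_by(FishID, Taxa) %>%
  summarize(
    # na.rm = TRUE drops missing values before summing
    across(EstimatedNumber:TotalMass, ~ sum(.x, na.rm = TRUE)),
    # return an ungrouped result
    .groups = "drop"
  )

res
```

With this change the Krill count becomes 2 instead of NA, which is best read as a lower bound since one count is unknown. Be aware that a group whose values are all NA would sum to 0 under `na.rm = TRUE`, which may or may not be what you want.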