How do I combine rows of data based on values of other variables in R?

Question

I am trying to combine rows of data based on the levels of other variables. I have attached a sample of my data below.

data <- structure(list(FishID = c("SSS012", "SSS012", "SSS012", "SSS014", 
"SSS014", "SSS014", "SSS24", "SSS24", "SSS24", "SSS24", "SSS24"
), Taxa = c("Krill", "Onisimus", "Onisimus", "Krill", "Krill", 
"Onisimus", "Copepods", "Onisimus", "Themisto", "Unidentified Fish", 
"Unidentified Fish"), EstimatedNumber = c(2L, 6L, 1L, 2L, NA, 
6L, 16L, 4L, 389L, 80L, 1L), TotalMass = c(0.074, 0.143, 0.052, 
0.034, 5.342, 0.16, 0.09, 0.087, 28.742, 6.556, 0.782), Comments = c("", 
"", "", "", "", "", "", "", "", "", "will likely change taxa to fish"
), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 
2019L, 2019L, 2019L, 2019L), PA = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1)), row.names = c(487L, 488L, 489L, 512L, 513L, 514L, 628L, 
634L, 636L, 638L, 639L), class = "data.frame")

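If we run table(data$FishID, data$Taxa), we can see that some taxa appear twice for a given fish while others appear only once. I want each taxon to appear only once per FishID, but I want to keep the EstimatedNumber and TotalMass data from both rows by summing them (i.e., for FishID SSS012, I would like a single Onisimus row, in addition to the Krill row, with an EstimatedNumber of 7 and a TotalMass of 0.195).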

Answer 1

Here is a potential solution using dplyr:

library(dplyr)

data %>% 
  group_by(FishID, Taxa) %>% 
  summarize(across(EstimatedNumber:TotalMass, ~sum(.)))

This gives us:

  FishID Taxa              EstimatedNumber TotalMass
  <chr>  <chr>                       <int>     <dbl>
1 SSS012 Krill                           2     0.074
2 SSS012 Onisimus                        7     0.195
3 SSS014 Krill                          NA     5.38 
4 SSS014 Onisimus                        6     0.16 
5 SSS24  Copepods                       16     0.09 
6 SSS24  Onisimus                        4     0.087
7 SSS24  Themisto                      389    28.7  
8 SSS24  Unidentified Fish              81     7.34
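
Note that SSS014 Krill comes out as NA for EstimatedNumber because one of its two rows has a missing count. If you would rather treat a missing count as "nothing to add", a possible variant (a sketch, not the only reasonable choice) passes na.rm = TRUE to sum() and drops the grouping afterwards with .groups = "drop". The name combined is only illustrative, and the sample data is repeated here, trimmed to the columns used, so the snippet runs on its own:

```r
library(dplyr)

# Same sample as in the question, trimmed to the columns used here.
data <- data.frame(
  FishID = c("SSS012", "SSS012", "SSS012", "SSS014", "SSS014", "SSS014",
             "SSS24", "SSS24", "SSS24", "SSS24", "SSS24"),
  Taxa = c("Krill", "Onisimus", "Onisimus", "Krill", "Krill", "Onisimus",
           "Copepods", "Onisimus", "Themisto", "Unidentified Fish",
           "Unidentified Fish"),
  EstimatedNumber = c(2L, 6L, 1L, 2L, NA, 6L, 16L, 4L, 389L, 80L, 1L),
  TotalMass = c(0.074, 0.143, 0.052, 0.034, 5.342, 0.16, 0.09, 0.087,
                28.742, 6.556, 0.782)
)

combined <- data %>%
  group_by(FishID, Taxa) %>%
  summarize(
    # Assumption: a missing value should be skipped rather than
    # propagated, so the known count (2) is kept for SSS014 Krill.
    across(c(EstimatedNumber, TotalMass), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

# Sanity check: each FishID/Taxa pair now appears exactly once.
stopifnot(!any(duplicated(combined[c("FishID", "Taxa")])))

combined
```

With this variant SSS014 Krill gets an EstimatedNumber of 2 instead of NA. One caveat: a group whose values are all NA would sum to 0, which can look like a real zero count, so keep the original NA-propagating version if that distinction matters for your analysis.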

r

data-manipulation