How to merge row data for two different columns in dplyr R?

Question

site <- c(1,1,2,2,3,3,4,4)
rep <- c(1,2,1,2,1,2,1,2)
sp.1 <- c(NA,1,NA,4,NA,6,7,NA)
sp.2 <-  c(2,NA,1,NA,5,6,7,8)
df.dummy <- data.frame(site, rep, sp.1, sp.2)

  site rep sp.1 sp.2
1    1   1   NA    2
2    1   2    1   NA
3    2   1   NA    1
4    2   2    4   NA
5    3   1   NA    5
6    3   2    6    6
7    4   1    7    7
8    4   2   NA    8

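In my dataset I want to do the following: when, within the same site, different reps have an NA for sp.1 in one row and an NA for sp.2 in another row, or vice versa (e.g., the first two rows of this data frame), merge those rows into one.

So the rows should look like this: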

 site rep sp.1 sp.2
1    1   1    1    2
2    1   2    1   NA ---> get rid of this row
3    2   1   4    1
4    2   2    4   NA ---> get rid of this row
5    3   1   NA    5 
6    3   2    6    6
7    4   1    7    7
8    4   2   NA    8

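If, say, sp.2 has two data points for the same site (e.g., 5 and 6 at site 3), take the mean of the two points and merge the rows in the same way.

In the end I need only 4 rows, with both sp.1 and sp.2 filled in: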

 site rep sp.1 sp.2
    1   1    1    2
    2   1    4    1
    3   2    6    5.5
    4   1    7    7.5

Edit: I added the expected result.

Answer 1

Edited following the clarification:

library(tidyverse)
df.dummy %>%
  pivot_longer(sp.1:sp.2) %>%
  group_by(site, name) %>%
  summarize(value = mean(value, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = name, values_from = value)

Or, more simply:

df.dummy %>%
  group_by(site) %>%
  summarize(across(sp.1:sp.2, ~mean(.x, na.rm = TRUE)))

# A tibble: 4 x 3
   site  sp.1  sp.2
  <dbl> <dbl> <dbl>
1     1     1   2  
2     2     4   1  
3     3     6   5.5
4     4     7   7.5
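
One caveat worth keeping in mind: `mean(x, na.rm = TRUE)` returns `NaN` when every value in a group is `NA`. That does not happen with `df.dummy`, but a minimal sketch of one way to guard against it follows. The helper `mean_or_na` and the data frame `df.gap` are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Hypothetical helper: mean of the non-missing values, or NA if there are none
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

# Hypothetical data: site 2 has no sp.1 observation at all
df.gap <- data.frame(
  site = c(1, 1, 2, 2),
  sp.1 = c(NA, 1, NA, NA),
  sp.2 = c(2, NA, 1, 3)
)

# Same per-site summary as above, but empty groups yield NA instead of NaN
res.gap <- df.gap %>%
  group_by(site) %>%
  summarize(across(sp.1:sp.2, mean_or_na))

print(res.gap)
```

Whether an empty group should be reported as `NA`, dropped, or flagged some other way depends on the analysis; this sketch only shows the `NA` option.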

Answer 2

Perhaps we need:

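library(dplyr)
df.dummy %>%
   group_by(site) %>%
   # within each site, move the non-NA values of each sp column to the top
   mutate(across(starts_with('sp'), ~ .x[order(is.na(.x))])) %>%
   # drop rows in which every sp column is now NA
   filter(!if_all(starts_with('sp'), is.na)) %>%
   # keep the first rep and average the remaining values
   summarise(rep = first(rep),
             across(starts_with('sp'), ~ mean(.x, na.rm = TRUE)))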

Output:

# A tibble: 4 × 4
   site   rep  sp.1  sp.2
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1   2  
2     2     1     4   1  
3     3     1     6   5.5
4     4     1     7   7.5
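
The reordering step inside `mutate()` can be seen in isolation with a small sketch. The vector `x` is an illustrative value, not taken from the data above.

```r
# order() sorts FALSE before TRUE and leaves ties in their original order,
# so indexing by order(is.na(x)) moves the non-missing values to the front
x <- c(NA, 4, NA, 6)
x.sorted <- x[order(is.na(x))]
print(x.sorted)
```

Applied per site, this lines the observed values of sp.1 and sp.2 up in the top rows, which is what lets the following `filter()` drop the rows that are left entirely empty.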

Answer 3

Update: a shorter version that keeps the rep column:

df.dummy %>%
  group_by(site, first(rep)) %>%
  summarize(across(sp.1:sp.2, ~mean(.x, na.rm = TRUE)))

First answer: we could do it like this, grouping by first(rep):

library(dplyr)
df.dummy %>% 
  group_by(site, first(rep)) %>% 
  summarise(sp.1=mean(sp.1, na.rm = TRUE), sp.2=mean(sp.2, na.rm = TRUE))

# Groups:   site [4]
   site `first(rep)`  sp.1  sp.2
  <dbl>        <dbl> <dbl> <dbl>
1     1            1     1   2  
2     2            1     4   1  
3     3            1     6   5.5
4     4            1     7   7.5
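
A note on `group_by(site, first(rep))`: dplyr evaluates expressions passed to `group_by()` on the ungrouped data, so `first(rep)` here is the first `rep` of the whole data frame rather than of each site. With `df.dummy` the two coincide because every site starts at rep 1. A sketch of a variant that takes the first `rep` per site and gives the column a plain name, assuming that is the intent:

```r
library(dplyr)

# Same example data as in the question
df.dummy <- data.frame(
  site = c(1, 1, 2, 2, 3, 3, 4, 4),
  rep  = c(1, 2, 1, 2, 1, 2, 1, 2),
  sp.1 = c(NA, 1, NA, 4, NA, 6, 7, NA),
  sp.2 = c(2, NA, 1, NA, 5, 6, 7, 8)
)

res <- df.dummy %>%
  group_by(site) %>%
  summarise(
    rep = first(rep),                            # first rep within each site
    across(sp.1:sp.2, ~ mean(.x, na.rm = TRUE))  # per-site mean, ignoring NA
  )

print(res)
```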
