Summarize from two different rows

Question

This is my starting df:

test <- data.frame(year = c(2018,2018,2018,2018,2018), 
                    source = c("file1", "file1", "file1", "file1", "file1"),
                    area = c("000", "000", "800", "800", "800"),
                    cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"), 
                    value = c(1000,2000,3000,4000,5000))

  year source area cult2 value
1 2018  file1  000 PBGEX  1000
2 2018  file1  000 QPGEX  2000
3 2018  file1  800 PBGEX  3000
4 2018  file1  800 QPGEX  4000
5 2018  file1  800 QPIND  5000

For each year/source/area combination, I need the sum of value over the PBGEX and QPGEX rows. I was thinking of using spread and gather, but then I lose many other columns (not shown here).

This is what I expect:

  year source area cult2 value
1 2018  file1  000 PBGEX  1000
2 2018  file1  000 QPGEX  2000
3 2018  file1  800 PBGEX  3000
4 2018  file1  800 QPGEX  4000
5 2018  file1  800 QPIND  5000
6 2018  file1  000 RDGEX  3000
7 2018  file1  800 RDGEX  7000

Answer 1

We can filter the rows where 'cult2' is 'QPGEX' or 'PBGEX', then do a group_by sum, and bind_rows the result with the original dataset

library(dplyr)
test %>%
    filter(cult2 %in% c("QPGEX", "PBGEX")) %>% 
    group_by(year, source, area) %>%
    summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>%
    bind_rows(test, .)

-output

#   year source area cult2 value
#1 2018  file1  000 PBGEX  1000
#2 2018  file1  000 QPGEX  2000
#3 2018  file1  800 PBGEX  3000
#4 2018  file1  800 QPGEX  4000
#5 2018  file1  800 QPIND  5000
#6 2018  file1  000 RDGEX  3000
#7 2018  file1  800 RDGEX  7000
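
For comparison, the same three steps (filter, grouped sum, append) can be sketched in base R without dplyr. This is only an illustrative alternative, not part of the original answer: the intermediate names sub, totals and result are made up here, and the sketch assumes test has exactly the five columns shown in the question.

```r
# Recreate the example data from the question
test <- data.frame(year = c(2018, 2018, 2018, 2018, 2018),
                   source = c("file1", "file1", "file1", "file1", "file1"),
                   area = c("000", "000", "800", "800", "800"),
                   cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
                   value = c(1000, 2000, 3000, 4000, 5000))

# Keep only the two categories that make up the new total
sub <- test[test$cult2 %in% c("PBGEX", "QPGEX"), ]

# Sum value within each year/source/area combination
totals <- aggregate(value ~ year + source + area, data = sub, FUN = sum)

# Label the new rows and match the column order of the original data
totals$cult2 <- "RDGEX"
totals <- totals[, names(test)]

# Append the totals below the original rows
result <- rbind(test, totals)
result
```

If the real data has more columns than the example, aggregate() only keeps the ones named in the formula, so this sketch would need to be adapted before rbind() can line the columns up.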

If we need a proportion column

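test %>%
 filter(cult2 %in% c("QPGEX", "PBGEX")) %>% 
 group_by(year, source, area) %>%
 # compute the ratio per group with mutate() first; recent dplyr versions
 # evaluate expressions written inside group_by() on the ungrouped data
 mutate(prop = value[cult2 == "QPGEX"]/value[cult2 == "PBGEX"]) %>% 
 group_by(prop, .add = TRUE) %>% 
 summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>% 
 bind_rows(test, .)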

Or alternatively

library(tidyr)
test %>% 
   filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
   pivot_wider(names_from = cult2, values_from = value) %>% 
   # or use spread
   #spread(cult2, value) %>%
   mutate(prop = QPGEX/PBGEX) %>% 
   select(-PBGEX, -QPGEX) %>%
   right_join(test, by = c("year", "source", "area"))

-output

# A tibble: 5 x 6
#   year source area   prop cult2 value
#  <dbl> <chr>  <chr> <dbl> <chr> <dbl>
#1  2018 file1  000    2    PBGEX  1000
#2  2018 file1  000    2    QPGEX  2000
#3  2018 file1  800    1.33 PBGEX  3000
#4  2018 file1  800    1.33 QPGEX  4000
#5  2018 file1  800    1.33 QPIND  5000
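
As a rough sketch, the same prop column can probably also be obtained without reshaping, by using a grouped mutate. This is not from the original answer; the name with_prop is made up here, and the sketch assumes every year/source/area group contains exactly one QPGEX row and exactly one PBGEX row (otherwise the division does not return a single value per group and mutate() will fail).

```r
library(dplyr)

# Recreate the example data from the question
test <- data.frame(year = c(2018, 2018, 2018, 2018, 2018),
                   source = c("file1", "file1", "file1", "file1", "file1"),
                   area = c("000", "000", "800", "800", "800"),
                   cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
                   value = c(1000, 2000, 3000, 4000, 5000))

with_prop <- test %>%
  group_by(year, source, area) %>%
  # one ratio per group, recycled to every row of that group
  mutate(prop = value[cult2 == "QPGEX"] / value[cult2 == "PBGEX"]) %>%
  ungroup()

with_prop
```

As with the join version, every row of an area receives the ratio, including the QPIND row, and all original columns are kept because nothing is pivoted or dropped.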

