Summarize from two different rows
This is my starting df:
test <- data.frame(year = c(2018,2018,2018,2018,2018),
source = c("file1", "file1", "file1", "file1", "file1"),
area = c("000", "000", "800", "800", "800"),
cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
value = c(1000,2000,3000,4000,5000))
year source area cult2 value
1 2018 file1 000 PBGEX 1000
2 2018 file1 000 QPGEX 2000
3 2018 file1 800 PBGEX 3000
4 2018 file1 800 QPGEX 4000
5 2018 file1 800 QPIND 5000
I need to get the sum of the PBGEX and QPGEX values for each year/source/area.
I was thinking of using spread and gather, but then I lose many other columns (not shown here).
This is what I want (expected output):
year source area cult2 value
1 2018 file1 000 PBGEX 1000
2 2018 file1 000 QPGEX 2000
3 2018 file1 800 PBGEX 3000
4 2018 file1 800 QPGEX 4000
5 2018 file1 800 QPIND 5000
6 2018 file1 000 RDGEX 3000
7 2018 file1 800 RDGEX 7000
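The spread/gather idea from the question can be sketched with tidyr's pivot_wider() and pivot_longer(), the current replacements for spread() and gather(). This is a minimal sketch that assumes year, source and area are the only identifying columns; any extra column whose values differ within a year/source/area group would stop the rows from collapsing into one wide row, which may be why the other columns got in the way. The object name wide_long is only illustrative.

```r
library(dplyr)
library(tidyr)

# Starting data from the question
test <- data.frame(year = c(2018, 2018, 2018, 2018, 2018),
                   source = c("file1", "file1", "file1", "file1", "file1"),
                   area = c("000", "000", "800", "800", "800"),
                   cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
                   value = c(1000, 2000, 3000, 4000, 5000))

wide_long <- test %>%
  # One column per cult2 code; year/source/area become the row identifiers
  pivot_wider(names_from = cult2, values_from = value) %>%
  # The new code is the sum of the two codes
  mutate(RDGEX = PBGEX + QPGEX) %>%
  # Back to long format; drop the NA created for the missing 000/QPIND cell
  pivot_longer(c(PBGEX, QPGEX, QPIND, RDGEX),
               names_to = "cult2", values_to = "value",
               values_drop_na = TRUE)
```

Note that this reorders the rows (all codes of area 000 first, then area 800) rather than appending the new rows at the end.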
We can filter the rows where 'cult2' is 'QPGEX' or 'PBGEX', then do a group_by and sum, and bind_rows the result with the original dataset:
library(dplyr)
test %>%
filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
group_by(year, source, area) %>%
summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>%
bind_rows(test, .)
-output
# year source area cult2 value
#1 2018 file1 000 PBGEX 1000
#2 2018 file1 000 QPGEX 2000
#3 2018 file1 800 PBGEX 3000
#4 2018 file1 800 QPGEX 4000
#5 2018 file1 800 QPIND 5000
#6 2018 file1 000 RDGEX 3000
#7 2018 file1 800 RDGEX 7000
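For comparison, the same filter / group / sum / bind steps can be sketched in base R with aggregate() and rbind(). This assumes the data contain only the five columns shown: the formula interface of aggregate() keeps just the grouping columns and value, so with extra columns the line that reorders the columns would fail and they would need to be added (e.g. as NA) first. The names sub, sums and result are hypothetical.

```r
# Starting data from the question
test <- data.frame(year = c(2018, 2018, 2018, 2018, 2018),
                   source = c("file1", "file1", "file1", "file1", "file1"),
                   area = c("000", "000", "800", "800", "800"),
                   cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
                   value = c(1000, 2000, 3000, 4000, 5000))

# Keep only the two codes to be summed
sub <- test[test$cult2 %in% c("PBGEX", "QPGEX"), ]

# Sum value within each year/source/area combination
sums <- aggregate(value ~ year + source + area, data = sub, FUN = sum)

# Label the summed rows with the new code and match the column order of test
sums$cult2 <- "RDGEX"
sums <- sums[, names(test)]

# Append the summed rows below the original data
result <- rbind(test, sums)
```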
If we need a proportion column:
test %>%
filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
group_by(year, source, area) %>%
group_by(prop = value[cult2== "QPGEX"]/value[cult2 == "PBGEX"],
.add = TRUE) %>%
summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>%
bind_rows(test, .)
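As a sketch of an alternative to the second group_by(), prop can also be computed directly inside summarise(). Because summarise() evaluates its arguments in order, prop has to come before cult2 = "RDGEX"; otherwise cult2 would already be overwritten when the QPGEX/PBGEX lookup runs. This assumes each year/source/area group has exactly one PBGEX and one QPGEX row; the name out is illustrative.

```r
library(dplyr)

# Starting data from the question
test <- data.frame(year = c(2018, 2018, 2018, 2018, 2018),
                   source = c("file1", "file1", "file1", "file1", "file1"),
                   area = c("000", "000", "800", "800", "800"),
                   cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
                   value = c(1000, 2000, 3000, 4000, 5000))

out <- test %>%
  filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
  group_by(year, source, area) %>%
  # prop is computed first, while cult2 still holds the original codes
  summarise(prop = value[cult2 == "QPGEX"] / value[cult2 == "PBGEX"],
            cult2 = "RDGEX",
            value = sum(value),
            .groups = "drop") %>%
  # Original rows get NA in prop
  bind_rows(test, .)
```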
Or alternatively:
library(tidyr)
test %>%
filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
pivot_wider(names_from = cult2, values_from = value) %>%
# or use spread
#spread(cult2, value) %>%
mutate(prop = QPGEX/PBGEX) %>%
select(-PBGEX, -QPGEX) %>%
right_join(test, by = c("year", "source", "area"))
-output
# A tibble: 5 x 6
# year source area prop cult2 value
# <dbl> <chr> <chr> <dbl> <chr> <dbl>
#1 2018 file1 000 2 PBGEX 1000
#2 2018 file1 000 2 QPGEX 2000
#3 2018 file1 800 1.33 PBGEX 3000
#4 2018 file1 800 1.33 QPGEX 4000
#5 2018 file1 800 1.33 QPIND 5000
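If the proportion is only needed on the existing rows, as in the output above, the pivot and join may not be necessary. A minimal sketch computes it per group with mutate(), assuming every year/source/area group contains exactly one PBGEX and one QPGEX row (a group missing either code would make mutate() raise an error). The result has the same values, with prop as the last column instead of the fourth.

```r
library(dplyr)

# Starting data from the question
test <- data.frame(year = c(2018, 2018, 2018, 2018, 2018),
                   source = c("file1", "file1", "file1", "file1", "file1"),
                   area = c("000", "000", "800", "800", "800"),
                   cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
                   value = c(1000, 2000, 3000, 4000, 5000))

out <- test %>%
  group_by(year, source, area) %>%
  # The length-1 ratio is recycled to every row of the group
  mutate(prop = value[cult2 == "QPGEX"] / value[cult2 == "PBGEX"]) %>%
  ungroup()
```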