dplyr & tibble - conditional sum of two rows based on column value

Question

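Given the tibble shown below, I am trying to use the Tidyverse to perform a conditional sum based on the value of Item within each of the two fields. Specifically, for both foo and bar, I want to add the value corresponding to item a to the value corresponding to item b, and then drop the row for the former. The result I am looking for is illustrated in table 2.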

data <- tibble(Field = rep(c("foo", "bar"), each=4),
               Item = rep(c("a", "b", "c", "d"), 2),
               Value = runif(8))

# table 1                          # table 2
| Field | Item |   Value   |       | Field | Item |   Value   |
|-------|------|-----------|       |-------|------|-----------|
|  foo  |  a   | 0.8167347 |       |  foo  |  b   | 0.9583989 | <== 0.8167347 + 0.1416642
|  foo  |  b   | 0.1416642 |       |  foo  |  c   | 0.7054814 |
|  foo  |  c   | 0.7054814 |       |  foo  |  d   | 0.1196948 |
|  foo  |  d   | 0.1196948 |       |--------------------------|
|--------------------------|       |  bar  |  b   | 0.6177568 | <== 0.3604500 + 0.2573068
|  bar  |  a   | 0.3604500 |       |  bar  |  c   | 0.7003040 |
|  bar  |  b   | 0.2573068 |       |  bar  |  d   | 0.8131556 |
|  bar  |  c   | 0.7003040 |
|  bar  |  d   | 0.8131556 |

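So far I have not managed to get close to the expected result. I know how to use dplyr's grouping functionality to isolate the items belonging to either of the two fields, but I do not know how to select the value of a and add it to that of b after grouping.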

Answer 1

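Use case_when to replace 'a' with 'b' in the 'Item' column, use both 'Field' and 'Item' as grouping columns, and take the sum of 'Value' in summarise. Because the 'a' rows are relabelled before grouping, each Field ends up with a single 'b' group holding both values, so no separate 'a' row remains.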

library(dplyr)
data %>%
    group_by(Field, Item = case_when(Item == 'a' ~ 'b', TRUE ~ Item)) %>% 
    summarise(Value = sum(Value, na.rm = TRUE), .groups= 'drop')

-output

# A tibble: 6 × 3
  Field Item  Value
  <chr> <chr> <dbl>
1 bar   b     0.618
2 bar   c     0.700
3 bar   d     0.813
4 foo   b     0.958
5 foo   c     0.705
6 foo   d     0.120

data

data <- structure(list(Field = c("foo", "foo", "foo", "foo", "bar", "bar", 
"bar", "bar"), Item = c("a", "b", "c", "d", "a", "b", "c", "d"
), Value = c(0.8167347, 0.1416642, 0.7054814, 0.1196948, 0.36045, 
0.2573068, 0.700304, 0.8131556)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))
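
As a quick sanity check, the sketch below rebuilds the same data with tibble() and compares the collapsed 'b' rows against the sums shown in table 2 of the question. The names result and b_rows are only illustrative and do not come from the original answer.

```r
library(dplyr)
library(tibble)

# Same values as the structure() call above, written out with tibble()
data <- tibble(
  Field = rep(c("foo", "bar"), each = 4),
  Item  = rep(c("a", "b", "c", "d"), 2),
  Value = c(0.8167347, 0.1416642, 0.7054814, 0.1196948,
            0.3604500, 0.2573068, 0.7003040, 0.8131556)
)

# Relabel 'a' as 'b' while grouping, then sum within each Field/Item pair
result <- data %>%
  group_by(Field, Item = case_when(Item == "a" ~ "b", TRUE ~ Item)) %>%
  summarise(Value = sum(Value, na.rm = TRUE), .groups = "drop")

# The 'b' rows should equal a + b from table 1 (bar sorts before foo)
b_rows <- result %>% filter(Item == "b")
all.equal(b_rows$Value, c(0.3604500 + 0.2573068, 0.8167347 + 0.1416642))
```

The comparison uses all.equal rather than == so that tiny floating-point differences do not cause a spurious mismatch.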

Answer 2

You can change Item so that it takes the value b whenever it equals a, and then summarise.

library(dplyr)

Data

set.seed(123)
data <- tibble(Field = rep(c("foo", "bar"), each=4),
               Item = rep(c("a", "b", "c", "d"), 2),
               Value = runif(8))

# A tibble: 8 x 3
  Field Item   Value
  <chr> <chr>  <dbl>
1 foo   a     0.288 
2 foo   b     0.788 
3 foo   c     0.409 
4 foo   d     0.883 
5 bar   a     0.940 
6 bar   b     0.0456
7 bar   c     0.528 
8 bar   d     0.892

Result

data %>% 
  mutate(Item = if_else(Item == "a","b",Item)) %>% 
  group_by(Field,Item) %>% 
  summarise(Value = sum(Value,na.rm = TRUE)) %>% 
  ungroup()

# A tibble: 6 x 3
  Field Item  Value
  <chr> <chr> <dbl>
1 bar   b     0.986
2 bar   c     0.528
3 bar   d     0.892
4 foo   b     1.08 
5 foo   c     0.409
6 foo   d     0.883
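
Both answers return bar before foo, because group_by() followed by summarise() orders the output by the grouping keys, whereas table 2 in the question lists foo first. If the original order matters, one possible sketch (an addition here, not part of either answer) is to turn Field into a factor whose levels follow the order of first appearance before grouping. The name ordered_result is hypothetical.

```r
library(dplyr)
library(tibble)

# Values taken from table 1 in the question
data <- tibble(
  Field = rep(c("foo", "bar"), each = 4),
  Item  = rep(c("a", "b", "c", "d"), 2),
  Value = c(0.8167347, 0.1416642, 0.7054814, 0.1196948,
            0.3604500, 0.2573068, 0.7003040, 0.8131556)
)

ordered_result <- data %>%
  mutate(
    Item  = if_else(Item == "a", "b", Item),
    # Factor levels in order of first appearance: foo, then bar
    Field = factor(Field, levels = unique(Field))
  ) %>%
  group_by(Field, Item) %>%
  summarise(Value = sum(Value, na.rm = TRUE), .groups = "drop") %>%
  # Convert back to character so the column type matches the input
  mutate(Field = as.character(Field))

ordered_result
```

Using .groups = "drop" here plays the same role as the trailing ungroup() in the answer above.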
