Creating new row values based on second column in R

Question

I want to create a new value called "X" (as a new row) that is the sum of "B" and "D".

type <- c( "A", "B","C","D","E")
cnt <- c(2,5,3,7,8)

df <- data.frame(type,cnt)

> df
  type cnt
1    A   2
2    B   5
3    C   3
4    D   7
5    E   8

Desired output is

  type cnt
1    A   2
2    B   5
3    C   3
4    D   7
5    E   8
6    X  12

How can this be extended if we add another grouping variable, such as date? I want to add up X for each day.

date <- c("2022-01-01","2022-01-01","2022-01-01","2022-01-01","2022-01-01","2022-01-02","2022-01-02","2022-01-02","2022-01-02","2022-01-02")
type <- c("A", "B","C","D","E","A", "B","C","D","E")
cnt <- c(2,5,3,7,8, 1,9,8,2,5)

df <- data.frame(date,type,cnt)

df
         date type cnt
1  2022-01-01    A   2
2  2022-01-01    B   5
3  2022-01-01    C   3
4  2022-01-01    D   7
5  2022-01-01    E   8
6  2022-01-02    A   1
7  2022-01-02    B   9
8  2022-01-02    C   8
9  2022-01-02    D   2
10 2022-01-02    E   5

Desired output is

df
         date type cnt
1  2022-01-01    A   2
2  2022-01-01    B   5
3  2022-01-01    C   3
4  2022-01-01    D   7
5  2022-01-01    E   8
6  2022-01-01    X  12
7  2022-01-02    A   1
8  2022-01-02    B   9
9  2022-01-02    C   8
10 2022-01-02    D   2
11 2022-01-02    E   5
12 2022-01-02    X  11

Answer 1

We can subset and rbind

rbind(df, data.frame(type = "X", cnt = sum(df$cnt[df$type %in% c("B", "D")])))

-output

  type cnt
1    A   2
2    B   5
3    C   3
4    D   7
5    E   8
6    X  12

Or in dplyr, filter the rows based on the 'type' values, summarise by taking the sum of 'cnt' while creating 'type' as 'X', and then use bind_rows with the original dataset

library(dplyr)
df %>% 
  filter(type %in% c("B", "D")) %>% 
  summarise(type = 'X', cnt = sum(cnt)) %>%
  bind_rows(df, .)

Or without using bind_rows

df %>% 
   summarise(type = c(type, 'X'), cnt = c(cnt, sum(cnt[type %in% c("B", "D")])))
  type cnt
1    A   2
2    B   5
3    C   3
4    D   7
5    E   8
6    X  12

Or using complete

library(tidyr)
# `fill` is evaluated outside the data frame, so refer to the columns via df$
complete(df, type = c(type, "X"), fill = list(cnt = sum(df$cnt[df$type %in% c("B", "D")])))
# A tibble: 6 × 2
  type    cnt
  <chr> <dbl>
1 A         2
2 B         5
3 C         3
4 D         7
5 E         8
6 X        12

Update

For the updated data, just add a group_by

df %>% 
  group_by(date) %>%
  summarise(type = c(type, "X"), 
    cnt = c(cnt, sum(cnt[type %in% c("B", "D")])), .groups = 'drop')

-output

# A tibble: 12 × 3
   date       type    cnt
   <chr>      <chr> <dbl>
 1 2022-01-01 A         2
 2 2022-01-01 B         5
 3 2022-01-01 C         3
 4 2022-01-01 D         7
 5 2022-01-01 E         8
 6 2022-01-01 X        12
 7 2022-01-02 A         1
 8 2022-01-02 B         9
 9 2022-01-02 C         8
10 2022-01-02 D         2
11 2022-01-02 E         5
12 2022-01-02 X        11
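
A note for newer dplyr versions: as of dplyr 1.1.0, returning more than one row per group from summarise() is deprecated, and reframe() is the intended replacement. A minimal sketch of the same grouped approach with reframe(), rebuilding the updated df from the question (the name res is only for illustration):

```r
library(dplyr)

# Updated data from the question
df <- data.frame(
  date = rep(c("2022-01-01", "2022-01-02"), each = 5),
  type = rep(c("A", "B", "C", "D", "E"), times = 2),
  cnt  = c(2, 5, 3, 7, 8, 1, 9, 8, 2, 5)
)

# reframe() may return any number of rows per group
# and always returns an ungrouped result
res <- df %>%
  group_by(date) %>%
  reframe(type = c(type, "X"),
          cnt  = c(cnt, sum(cnt[type %in% c("B", "D")])))
```

Because reframe() drops the grouping itself, the .groups = 'drop' argument is not needed here.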

Or using the filter approach

df %>%
   filter(type %in% c("B", "D")) %>% 
   group_by(date) %>% 
   summarise(type = 'X', cnt = sum(cnt), .groups = 'drop') %>% 
   bind_rows(df, .) %>% 
   arrange(date)

Answer 2

You could also use:

df %>%
  add_row(type= 'X', cnt = sum(.$cnt[.$type %in% c('B', 'D')]))

  type cnt
1    A   2
2    B   5
3    C   3
4    D   7
5    E   8
6    X  12

Update:

df %>%
   group_by(date)%>%
   group_modify(~add_row(.,type = 'X', 
                           cnt = sum(.$cnt[.$type%in%c('B', 'D')])))
# A tibble: 12 x 3
# Groups:   date [2]
   date       type    cnt
   <chr>      <chr> <int>
 1 2022-01-01 A         2
 2 2022-01-01 B         5
 3 2022-01-01 C         3
 4 2022-01-01 D         7
 5 2022-01-01 E         8
 6 2022-01-01 X        12
 7 2022-01-02 A         1
 8 2022-01-02 B         9
 9 2022-01-02 C         8
10 2022-01-02 D         2
11 2022-01-02 E         5
12 2022-01-02 X        11

Answer 3

Another possible solution, in base R:

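# Append a one-row data.frame rather than a c() vector: c() would coerce the sum to character
rbind(df, data.frame(type = "X", cnt = sum(ifelse(df$type %in% c("B", "D"), df$cnt, 0))))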

#>   type cnt
#> 1    A   2
#> 2    B   5
#> 3    C   3
#> 4    D   7
#> 5    E   8
#> 6    X  12

With dplyr:

bind_rows(df, list(type = "X", cnt = sum(if_else(df$type %in% c("B", "D"), df$cnt, 0))))
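
For the updated data with a date column, a base R sketch along the same lines might use aggregate() to compute the per-date sum of B and D and then rbind() it back. This is one possible approach under those assumptions, not part of the original answer; x_rows and out are illustrative names.

```r
# Updated data from the question
df <- data.frame(
  date = rep(c("2022-01-01", "2022-01-02"), each = 5),
  type = rep(c("A", "B", "C", "D", "E"), times = 2),
  cnt  = c(2, 5, 3, 7, 8, 1, 9, 8, 2, 5)
)

# Sum cnt over the B and D rows within each date
x_rows <- aggregate(cnt ~ date, data = df[df$type %in% c("B", "D"), ], FUN = sum)
x_rows$type <- "X"

# Append the X rows, then sort by date; order() leaves ties in their
# original order, so each X row lands after the rows of its date
out <- rbind(df, x_rows[, names(df)])
out <- out[order(out$date), ]
rownames(out) <- NULL
out
```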

Answer 4

Here is an alternative combining dplyr and janitor:

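library(dplyr)
library(janitor)

df %>% 
  filter(type == "B" | type == "D") %>% 
  adorn_totals(name = "X") %>%   # appends a totals row labelled "X"
  filter(type == "X") %>% 
  bind_rows(df) %>% 
  arrange(cnt)                   # sorts by cnt; X is last only because 12 is the largest value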
