Creating new row values based on second column in R
I want to create a new row with type "X" whose cnt is the sum of the cnt values for "B" and "D".
type <- c( "A", "B","C","D","E")
cnt <- c(2,5,3,7,8)
df <- data.frame(type,cnt)
> df
type cnt
1 A 2
2 B 5
3 C 3
4 D 7
5 E 8
The desired output is
> df
type cnt
1 A 2
2 B 5
3 C 3
4 D 7
5 E 8
6 X 12
How can this be extended if we add another grouping variable, such as date?
I would like to compute X separately for each day.
date <- c("2022-01-01","2022-01-01","2022-01-01","2022-01-01","2022-01-01","2022-01-02","2022-01-02","2022-01-02","2022-01-02","2022-01-02")
type <- c("A", "B","C","D","E","A", "B","C","D","E")
cnt <- c(2,5,3,7,8, 1,9,8,2,5)
df <- data.frame(date,type,cnt)
df
date type cnt
1 2022-01-01 A 2
2 2022-01-01 B 5
3 2022-01-01 C 3
4 2022-01-01 D 7
5 2022-01-01 E 8
6 2022-01-02 A 1
7 2022-01-02 B 9
8 2022-01-02 C 8
9 2022-01-02 D 2
10 2022-01-02 E 5
The desired output is
df
date type cnt
1 2022-01-01 A 2
2 2022-01-01 B 5
3 2022-01-01 C 3
4 2022-01-01 D 7
5 2022-01-01 E 8
6 2022-01-01 X 12
7 2022-01-02 A 1
8 2022-01-02 B 9
9 2022-01-02 C 8
10 2022-01-02 D 2
11 2022-01-02 E 5
12 2022-01-02 X 11
We can subset and rbind
rbind(df, data.frame(type = "X", cnt = sum(df$cnt[df$type %in% c("B", "D")])))
-output
type cnt
1 A 2
2 B 5
3 C 3
4 D 7
5 E 8
6 X 12
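The subset-and-rbind step above can be wrapped in a small function if it is needed more than once. This is only a sketch: add_sum_row is a hypothetical name, not part of base R or any package, and it assumes the data frame has exactly the columns type and cnt.

```r
# Hypothetical helper (not from any package): append a row labelled `label`
# whose cnt is the sum of cnt over the rows whose type is in `types`.
# Assumes `data` has exactly two columns named type and cnt.
add_sum_row <- function(data, types, label) {
  total <- sum(data$cnt[data$type %in% types])
  rbind(data, data.frame(type = label, cnt = total))
}

df <- data.frame(type = c("A", "B", "C", "D", "E"), cnt = c(2, 5, 3, 7, 8))
res <- add_sum_row(df, c("B", "D"), "X")
```

Building the new row as a data.frame rather than with c() keeps cnt numeric after the bind.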
Or in dplyr, filter the rows based on the 'type' values, summarise by taking the sum of 'cnt' while creating 'type' as 'X', and then use bind_rows with the original dataset
library(dplyr)
df %>%
  filter(type %in% c("B", "D")) %>%
  summarise(type = 'X', cnt = sum(cnt)) %>%
  bind_rows(df, .)
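Or without using bind_rows (in dplyr 1.1.0 and later, a summarise() call that returns more than one row per group is deprecated; reframe() is the replacement)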
df %>%
summarise(type = c(type, 'X'), cnt = c(cnt, sum(cnt[type %in% c("B", "D")])))
type cnt
1 A 2
2 B 5
3 C 3
4 D 7
5 E 8
6 X 12
Or using complete
library(tidyr)
complete(df, type = c(type, "X"), fill = list(cnt = sum(cnt[type %in% c("B", "D")])))
# A tibble: 6 × 2
type cnt
<chr> <dbl>
1 A 2
2 B 5
3 C 3
4 D 7
5 E 8
6 X 12
Update
For the updated data, just add a group_by
df %>%
  group_by(date) %>%
  summarise(type = c(type, "X"),
            cnt = c(cnt, sum(cnt[type %in% c("B", "D")])),
            .groups = 'drop')
-output
# A tibble: 12 × 3
date type cnt
<chr> <chr> <dbl>
1 2022-01-01 A 2
2 2022-01-01 B 5
3 2022-01-01 C 3
4 2022-01-01 D 7
5 2022-01-01 E 8
6 2022-01-01 X 12
7 2022-01-02 A 1
8 2022-01-02 B 9
9 2022-01-02 C 8
10 2022-01-02 D 2
11 2022-01-02 E 5
12 2022-01-02 X 11
Or using the filter approach
df %>%
  filter(type %in% c("B", "D")) %>%
  group_by(date) %>%
  summarise(type = 'X', cnt = sum(cnt), .groups = 'drop') %>%
  bind_rows(df, .) %>%
  arrange(date)
You can also use:
df %>%
add_row(type= 'X', cnt = sum(.$cnt[.$type %in% c('B', 'D')]))
type cnt
1 A 2
2 B 5
3 C 3
4 D 7
5 E 8
6 X 12
Update:
df %>%
  group_by(date) %>%
  group_modify(~ add_row(., type = 'X',
                         cnt = sum(.$cnt[.$type %in% c('B', 'D')])))
# A tibble: 12 x 3
# Groups: date [2]
date type cnt
<chr> <chr> <int>
1 2022-01-01 A 2
2 2022-01-01 B 5
3 2022-01-01 C 3
4 2022-01-01 D 7
5 2022-01-01 E 8
6 2022-01-01 X 12
7 2022-01-02 A 1
8 2022-01-02 B 9
9 2022-01-02 C 8
10 2022-01-02 D 2
11 2022-01-02 E 5
12 2022-01-02 X 11
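Another possible solution, in base R (the new row is built as a data.frame rather than with c(), so that cnt stays numeric, and the columns are taken from df rather than from the global vectors):
rbind(df, data.frame(type = "X", cnt = with(df, sum(ifelse(type %in% c("B", "D"), cnt, 0)))))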
#> type cnt
#> 1 A 2
#> 2 B 5
#> 3 C 3
#> 4 D 7
#> 5 E 8
#> 6 X 12
With dplyr:
bind_rows(df, list(type = "X", cnt = with(df, sum(if_else(type %in% c("B", "D"), cnt, 0)))))
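The base R approach can be extended to the per-date case as well. The following is a minimal sketch, assuming the date/type/cnt data from the question; the names x_rows and out are only illustrative. It uses aggregate() to sum the B and D rows within each date, then appends and sorts.

```r
# Data from the question, rebuilt so the example is self-contained
df <- data.frame(
  date = rep(c("2022-01-01", "2022-01-02"), each = 5),
  type = rep(c("A", "B", "C", "D", "E"), times = 2),
  cnt  = c(2, 5, 3, 7, 8, 1, 9, 8, 2, 5)
)

# Sum cnt per date, using only the B and D rows
x_rows <- aggregate(cnt ~ date, data = df[df$type %in% c("B", "D"), ], FUN = sum)
x_rows$type <- "X"

# Append the X rows (columns reordered to match df), then sort by date and type
out <- rbind(df, x_rows[, names(df)])
out <- out[order(out$date, out$type), ]
rownames(out) <- NULL
```

Sorting on type as a second key places X after A to E within each date; a different label might need an explicit ordering instead.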
Here is an alternative combining dplyr and janitor:
library(dplyr)
library(janitor)
df %>%
  filter(type == "B" | type == "D") %>%
  adorn_totals(name = "X") %>%
  filter(type == "X") %>%
  bind_rows(df) %>%
  arrange(cnt)
type cnt
A 2
C 3
B 5
D 7
E 8
X 12