When using dplyr in R, how do I merge 2 separate mutate statements that operate on the same object?
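The MWE code below works as intended. To summarize:
- The first data1 <- ... mutate(...) adds a new column "minusD", calculated as (i) the current row's "plusB" value plus (ii) the prior row's "plusB" value if the id is the same as in the prior row (otherwise 0).
- The second data1 <- ... mutate(...) adds a "running_balance" column that computes cumsum() over all rows sharing the same id.
However, when I deploy this in my more complete code, I get an error when running another table that draws from the equivalent of this "data1" data frame, caused by running the two data1 <- ... steps. So how can I merge these 2 operations into one?
Output with the calculations explained: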
id  plusA  plusB  minusC  minusD  running_balance  [explanation of calculations]
1   3      5      10      5       -7               minusD = plusB; running_balance = plusA + plusB - minusC - minusD
2   4      5      9       5       -5               same formulas as above, since id <> prior row id
3   8      5      8       5       0                same formulas as above, since id <> prior row id
3   1      4      7       9       -11              since id = prior row id, minusD = plusB + prior row plusB, and running_balance = prior row running_balance + plusA + plusB - minusC - minusD
3   2      5      6       9       -19              same formulas as above, since id = prior row id
5   3      6      5       6       -2               minusD = plusB; running_balance = plusA + plusB - minusC - minusD
MWE code:
data <- data.frame(id     = c(1, 2, 3, 3, 3, 5),
                   plusA  = c(3, 4, 8, 1, 2, 3),
                   plusB  = c(5, 5, 5, 4, 5, 6),
                   minusC = c(10, 9, 8, 7, 6, 5))
library(dplyr)
# Step 1: add minusD via a temporary "extra" column, then drop "extra"
data1 <- subset(
  data %>%
    mutate(extra = case_when(id == lag(id) ~ lag(plusB), TRUE ~ 0)) %>%
    mutate(minusD = plusB + extra),
  select = -c(extra) # remove temporary calculation column
)
# Step 2: add the per-id running balance
data1 <- data1 %>%
  group_by(id) %>%
  mutate(running_balance = cumsum(plusA + plusB - minusC - minusD))
You can keep the chain going with %>% instead of creating a temporary object.
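library(dplyr)
data %>%
  # both new columns in one mutate(); minusD can use "extra" defined just before it
  mutate(extra = case_when(id == lag(id) ~ lag(plusB), TRUE ~ 0),
         minusD = plusB + extra) %>%
  group_by(id) %>%
  mutate(running_balance = cumsum(plusA + plusB - minusC - minusD)) %>%
  ungroup() %>%
  select(-extra) # remove temporary calculation column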
# id plusA plusB minusC minusD running_balance
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 3 5 10 5 -7
#2 2 4 5 9 5 -5
#3 3 8 5 8 5 0
#4 3 1 4 7 9 -11
#5 3 2 5 6 9 -19
#6 5 3 6 5 6 -2
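As a possible variation (a sketch, not part of the answer above): if rows with the same id are always adjacent, as they are in the sample data, the temporary "extra" column can be avoided by grouping first and letting lag() fall back to 0 on the first row of each group. The adjacency requirement is an assumption; if the same id can reappear later in non-adjacent rows, this gives different results from the id == lag(id) comparison.

```r
library(dplyr)

data <- data.frame(
  id     = c(1, 2, 3, 3, 3, 5),
  plusA  = c(3, 4, 8, 1, 2, 3),
  plusB  = c(5, 5, 5, 4, 5, 6),
  minusC = c(10, 9, 8, 7, 6, 5)
)

data1 <- data %>%
  group_by(id) %>%
  mutate(
    # Within a group, lag() only sees rows with the same id;
    # default = 0 covers the first row of each id.
    # Assumes rows sharing an id are contiguous.
    minusD = plusB + lag(plusB, default = 0),
    # Later arguments of the same mutate() can use minusD right away.
    running_balance = cumsum(plusA + plusB - minusC - minusD)
  ) %>%
  ungroup()
```

Because both columns are created in a single grouped mutate(), data1 is assigned only once and comes back ungrouped, so downstream code does not inherit the grouping.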