Creating multiple new columns using mutate() and across() in R
I want to perform the following calculation on many columns at once, grouped by ID:
df <- df %>%
  group_by(ID) %>%
  mutate("Flows.2018.04" = Assets.2018.04 -
           (Assets.2018.03 * Returns.2018.04))
The dataset contains Assets.YYYY.MM and Returns.YYYY.MM columns for every month from 2018.04 to 2022.02, and I want to create a Flows column for each month.
I know I can do this one column at a time:
df <- df %>%
  group_by(ID) %>%
  mutate("Flows.2018.04" = Assets.2018.04 -
           (Assets.2018.03 * Returns.2018.04)) %>%
  mutate("Flows.2018.05" = Assets.2018.05 -
           (Assets.2018.04 * Returns.2018.05))
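But since I want to run this calculation for more than 50 columns, I am hoping for a more elegant approach. As far as I know, this should be possible with dplyr's across() function, but I cannot figure out how.
I want the new columns to be named Flows.YYYY.MM, which complicates things further. I think the easiest way to achieve this might be to simply rename the columns after creating them.
I have also considered reshaping the data frame from wide to long format to do this calculation, but that seems more complicated to me.
Any suggestions for achieving the desired result?
As requested, sample data is below: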
library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))
df
ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55 5000 2345 3459 1.03 0.94
2 6F55 3000 1926 6933 0.77 1.11
3 ANE3 5870 8563 1533 1.01 0.89
4 ANE3 4098 9373 4556 0.97 1.02
5 6F55 9878 7432 9855 1.06 1.02
The desired result is:
ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55 5000 2345 3459 1.03 0.94 -2805 1255
2 6F55 3000 1926 6933 0.77 1.11 -384 4795
3 ANE3 5870 8563 1533 1.01 0.89 2634 -6088
4 ANE3 4098 9373 4556 0.97 1.02 5398 -5004
5 6F55 9878 7432 9855 1.06 1.02 -3039 2274
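As a quick sanity check of the formula against the first row of the sample data, the arithmetic can be reproduced by hand. The variable names below are made up for illustration, and the rounding step is an assumption: the expected values above look like the formula result rounded to the nearest whole number.

```r
# First row of the sample data (ID 6F55)
assets_03  <- 5000   # Assets.2018.03
assets_04  <- 2345   # Assets.2018.04
assets_05  <- 3459   # Assets.2018.05
returns_04 <- 1.03   # Returns.2018.04
returns_05 <- 0.94   # Returns.2018.05

# Flow for a month = this month's assets minus
# (previous month's assets * this month's return)
flow_04 <- assets_04 - assets_03 * returns_04
flow_05 <- assets_05 - assets_04 * returns_05

# Rounded to whole numbers (assumed to be how the expected table was produced)
round(c(flow_04, flow_05))
```

The unrounded value for 2018.05 is a little below 1255, so any solution that does not round will show decimals in the Flows columns.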
How about this:
library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))
df %>%
  # Split "Assets.2018.03" into a value column (Assets/Returns) and a date
  pivot_longer(-ID,
               names_to = c(".value", "date"),
               names_pattern = "(.*)\\.(\\d{4}\\.\\d{2})") %>%
  arrange(ID, date) %>%
  # Number the repeated rows within each ID so they stay distinct
  group_by(ID, date) %>%
  mutate(obs = seq_along(date)) %>%
  # Within each original row, lag() gives the previous month's Assets
  group_by(ID, obs) %>%
  mutate(Flow = Assets - (lag(Assets) * Returns)) %>%
  pivot_wider(names_from = "date",
              values_from = c("Assets", "Returns", "Flow")) %>%
  as.data.frame()
#> ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55 1 5000 2345 3459 NA
#> 2 6F55 2 3000 1926 6933 NA
#> 3 6F55 3 9878 7432 9855 NA
#> 4 ANE3 1 5870 8563 1533 NA
#> 5 ANE3 2 4098 9373 4556 NA
#> Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1 1.03 0.94 NA -2805.00 1254.70
#> 2 0.77 1.11 NA -384.00 4795.14
#> 3 1.06 1.02 NA -3038.68 2274.36
#> 4 1.01 0.89 NA 2634.30 -6088.07
#> 5 0.97 1.02 NA 5397.94 -5004.46
Created on 2022-04-10 by the reprex package (v2.0.1)
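If reshaping feels like too much, the calculation can also stay in wide format. Below is a minimal sketch in base R rather than across(), since each flow needs three different columns at once. It assumes that every Returns.YYYY.MM column has a matching Assets.YYYY.MM column plus an Assets column for the month before it, and that there are no gaps between months. The helper name flow_for is made up for illustration. Because the formula only combines values within the same row, grouping by ID is not needed here.

```r
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

# Month suffixes ("YYYY.MM") taken from the column names; sorting these
# strings alphabetically also sorts them chronologically.
asset_months  <- sort(sub("^Assets\\.", "",
                          grep("^Assets\\.", names(df), value = TRUE)))
return_months <- sort(sub("^Returns\\.", "",
                          grep("^Returns\\.", names(df), value = TRUE)))

# Hypothetical helper: flow for month m =
#   Assets(m) - Assets(month before m) * Returns(m)
flow_for <- function(m) {
  prev <- asset_months[match(m, asset_months) - 1]
  df[[paste0("Assets.", m)]] -
    df[[paste0("Assets.", prev)]] * df[[paste0("Returns.", m)]]
}

# One Flows column per month that has a Returns column
flows <- lapply(return_months, flow_for)
names(flows) <- paste0("Flows.", return_months)

result <- cbind(df, as.data.frame(flows))
result
```

This keeps the original row order and produces the Flows.YYYY.MM names directly, so no renaming step is needed afterwards. Wrap the result in round() if whole numbers are wanted, as in the desired output.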