How to replace values by multiple grouping conditions?
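The simplified example below has an ID column (for example, a product ID), a date column containing both actual and future dates, a value column, and a column indicating whether each value is an actual or a forecast produced by an ML model.
My goal is to replace the NAs of each model with the last value and date of the ACT rows for each ID. In my example, this means that for id A1 the NAs in ML1 and ML2 are replaced with 2014-01-01 as the date and 54 as the value.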
library(tidyverse)
df <- tibble(id = c(rep("A1",11), rep("B1",11)),
             Model = rep(c(rep("ACT",5), rep("ML1",3), rep("ML2",3)),2),
             Date = as.Date(rep(c("2010-01-01","2011-01-01","2012-01-01","2013-01-01",
                                  "2014-01-01",NA, "2015-01-01","2016-01-01",
                                  NA, "2015-01-01","2016-01-01"),2)),
             Value = c(c(11,31,44,21,54,NA,53,13,NA,33,12),
                       c(54,41,32,65,76,NA,32,42,NA,23,76))
)
I am looking for a pipe-based solution, such as dplyr, without a for loop.
How about this:
library(tidyverse)
#> Warning: package 'tidyr' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2
df <- tibble(id = c(rep("A1",11), rep("B1",11)),
             Model = rep(c(rep("ACT",5), rep("ML1",3), rep("ML2",3)),2),
             Date = as.Date(rep(c("2010-01-01","2011-01-01","2012-01-01","2013-01-01",
                                  "2014-01-01",NA, "2015-01-01","2016-01-01",
                                  NA, "2015-01-01","2016-01-01"),2)),
             Value = c(c(11,31,44,21,54,NA,53,13,NA,33,12),
                       c(54,41,32,65,76,NA,32,42,NA,23,76))
)
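df %>%
  group_by(id) %>%
  # keep only the actuals and take the last Date and Value per id
  filter(Model == "ACT") %>%
  summarise(across(c(Date, Value), last)) %>%
  rename(date_fill = Date, value_fill = Value) %>%
  # attach the per-id fill values to every original row
  right_join(df) %>%
  # fill NAs in the model rows; the label must match "ACT" exactly (case-sensitive)
  mutate(Value = case_when(Model != "ACT" & is.na(Value) ~ value_fill, TRUE ~ Value),
         Date = case_when(Model != "ACT" & is.na(Date) ~ date_fill, TRUE ~ Date)) %>%
  select(-c("date_fill", "value_fill"))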
#> Joining, by = "id"
#> # A tibble: 22 × 4
#>    id    Model Date       Value
#>    <chr> <chr> <date>     <dbl>
#>  1 A1    ACT   2010-01-01    11
#>  2 A1    ACT   2011-01-01    31
#>  3 A1    ACT   2012-01-01    44
#>  4 A1    ACT   2013-01-01    21
#>  5 A1    ACT   2014-01-01    54
#>  6 A1    ML1   2014-01-01    54
#>  7 A1    ML1   2015-01-01    53
#>  8 A1    ML1   2016-01-01    13
#>  9 A1    ML2   2014-01-01    54
#> 10 A1    ML2   2015-01-01    33
#> # … with 12 more rows
Created on 2022-03-28 by the reprex package (v2.0.1)
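As a possible variation on the answer above, the same fill could probably be done without the intermediate join, by computing the last ACT date and value inside each id group and using coalesce() to fill only the NAs. This is just a sketch under some assumptions: the ACT rows are already sorted by date within each id, and the ACT rows themselves contain no NAs (coalesce() would fill those too). The name filled is made up for illustration.

```r
library(dplyr)

# Same example data as in the question
df <- tibble(id = c(rep("A1", 11), rep("B1", 11)),
             Model = rep(c(rep("ACT", 5), rep("ML1", 3), rep("ML2", 3)), 2),
             Date = as.Date(rep(c("2010-01-01", "2011-01-01", "2012-01-01", "2013-01-01",
                                  "2014-01-01", NA, "2015-01-01", "2016-01-01",
                                  NA, "2015-01-01", "2016-01-01"), 2)),
             Value = c(c(11, 31, 44, 21, 54, NA, 53, 13, NA, 33, 12),
                       c(54, 41, 32, 65, 76, NA, 32, 42, NA, 23, 76)))

filled <- df %>%
  group_by(id) %>%
  mutate(
    # last ACT entry within the current id, used wherever the column is NA
    Date  = coalesce(Date,  last(Date[Model == "ACT"])),
    Value = coalesce(Value, last(Value[Model == "ACT"]))
  ) %>%
  ungroup()

filled
```

If the ACT rows could be out of order, adding arrange(id, Model, Date) before group_by() would be one way to make "last" mean "latest date", though that also reorders the output.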