Filter columns and repeatedly compare two columns in pairs using R
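Given the df below, for each year I need to check whether this year's actual value and predicted value move in the same direction relative to the previous year's actual value: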
df <- structure(list(code = c("M0000273", "M0000357", "M0000545"),
name = c("industry", "agriculture", "service"), `2019_actual` = c(16.78,
9.26, 49.38), `2019_pred` = c(17.78, 10.26, NA), `2020_actual` = c(35.74,
NA, 49.38), `2020_pred` = c(36.74, 66.56, 25.36), `2021_actual` = c(30.74,
83.42, 63.26), `2021_pred` = c(31.74, 84.42, 35.23)), class = "data.frame", row.names = c(NA,
-3L))
Output:
code name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred
1 M0000273 industry 16.78 17.78 35.74 36.74 30.74 31.74
2 M0000357 agriculture 9.26 10.26 NA 66.56 83.42 84.42
3 M0000545 service 49.38 NA 49.38 25.36 63.26 35.23
The logic is: if the difference between the two years is positive, negative, or zero, return increase, decrease, or unchanged respectively; if either or both values are NA, return NA.
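The rule above only depends on the sign of the year-over-year difference. A minimal sketch of that mapping as a vectorized helper; direction() is a hypothetical name, not part of the question or of any package:

```r
# Hypothetical helper: label the change from `prev` to `curr`.
# sign() returns -1, 0, 1, or NA, so an NA in either input yields NA.
direction <- function(curr, prev) {
  s <- sign(curr - prev)
  # s + 2 maps -1/0/1 to positions 1/2/3; an NA position gives NA_character_
  c("decrease", "unchanged", "increase")[s + 2]
}

# 2020 vs 2019 actual values from the question
direction(c(35.74, NA, 49.38), c(16.78, 9.26, 49.38))
# returns c("increase", NA, "unchanged")
```

Computing the difference once and indexing by its sign avoids repeating the same subtraction in three separate conditions.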
Expected result:
code name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred 2020_act_direction 2020_pred_direction 2021_act_direction 2021_pred_direction
1 M0000273 industry 16.78 17.78 35.74 36.74 30.74 31.74 increase increase decrease decrease
2 M0000357 agriculture 9.26 10.26 NA 66.56 83.42 84.42 NA increase NA NA
3 M0000545 service 49.38 NA 49.38 25.36 63.26 35.23 unchanged decrease increase decrease
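I tried the following code, but it has two problems: 1. it returns an error: **Error: unexpected ')' in " )"**; 2. if I have many years (say, from 2010 to 2020), this is clearly not the right way to get the expected result.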
df %>%
  mutate(
    `2020_act_direction` = case_when(`2020_actual` - `2019_actual` > 0 ~ 'increase',
                                     `2020_actual` - `2019_actual` < 0 ~ 'decrease',
                                     `2020_actual` - `2019_actual` == 0 ~ 'unchanged',
                                     TRUE ~ NA
    ),
    `2020_pred_direction` = case_when(`2020_pred` - `2019_actual` > 0 ~ 'increase',
                                      `2020_pred` - `2019_actual` < 0 ~ 'decrease',
                                      `2020_pred` - `2019_actual` == 0 ~ 'unchanged',
                                      TRUE ~ NA
    )
    `2021_act_direction` = case_when(`2021_actual` - `2020_actual` > 0 ~ 'increase',
                                     `2021_actual` - `2020_actual` < 0 ~ 'decrease',
                                     `2021_actual` - `2020_actual` == 0 ~ 'unchanged',
                                     TRUE ~ NA
    )
    `2021_pred_direction` = case_when(`2021_pred` - `2020_actual` > 0 ~ 'increase',
                                      `2021_pred` - `2020_actual` < 0 ~ 'decrease',
                                      `2021_pred` - `2020_actual` == 0 ~ 'unchanged',
                                      TRUE ~ NA
    )
  )
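For reference, a sketch of the same attempt with the syntax error fixed. As far as I can tell, the error comes from the missing commas after the closing parentheses of the second and third case_when() calls, so the arguments to mutate() are not separated. The sketch also uses NA_character_ instead of NA as the fallback, because older dplyr releases required every right-hand side of case_when() to have the same type. It still hard-codes every year pair, so it does not solve the second problem.

```r
library(dplyr)

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"),
                     name = c("industry", "agriculture", "service"),
                     `2019_actual` = c(16.78, 9.26, 49.38),
                     `2019_pred` = c(17.78, 10.26, NA),
                     `2020_actual` = c(35.74, NA, 49.38),
                     `2020_pred` = c(36.74, 66.56, 25.36),
                     `2021_actual` = c(30.74, 83.42, 63.26),
                     `2021_pred` = c(31.74, 84.42, 35.23)),
                class = "data.frame", row.names = c(NA, -3L))

res <- df %>%
  mutate(
    `2020_act_direction` = case_when(`2020_actual` - `2019_actual` > 0 ~ "increase",
                                     `2020_actual` - `2019_actual` < 0 ~ "decrease",
                                     `2020_actual` - `2019_actual` == 0 ~ "unchanged",
                                     TRUE ~ NA_character_),  # typed NA fallback
    `2020_pred_direction` = case_when(`2020_pred` - `2019_actual` > 0 ~ "increase",
                                      `2020_pred` - `2019_actual` < 0 ~ "decrease",
                                      `2020_pred` - `2019_actual` == 0 ~ "unchanged",
                                      TRUE ~ NA_character_),  # comma was missing here
    `2021_act_direction` = case_when(`2021_actual` - `2020_actual` > 0 ~ "increase",
                                     `2021_actual` - `2020_actual` < 0 ~ "decrease",
                                     `2021_actual` - `2020_actual` == 0 ~ "unchanged",
                                     TRUE ~ NA_character_),  # and here
    `2021_pred_direction` = case_when(`2021_pred` - `2020_actual` > 0 ~ "increase",
                                      `2021_pred` - `2020_actual` < 0 ~ "decrease",
                                      `2021_pred` - `2020_actual` == 0 ~ "unchanged",
                                      TRUE ~ NA_character_)
  )

res
```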
How can I handle this?
Use pivot_longer and pivot_wider to get one row per year/code/name, with the actual and pred values as separate columns. Then you can easily use lag to compare consecutive years.
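To illustrate why lag() fits here: on rows sorted by year, lag(x) shifts the vector down by one position, so each row sees the previous year's value and the first year gets NA. A small sketch using the actual values of the industry row from the question:

```r
library(dplyr)

# Actual values of the "industry" row, one row per year
industry <- data.frame(year = 2019:2021, actual = c(16.78, 35.74, 30.74))

out <- industry %>%
  arrange(year) %>%
  mutate(prev_actual = lag(actual))  # previous year's actual; NA for 2019

out
```

With dplyr loaded, lag() refers to dplyr::lag(). Base R's stats::lag() is meant for time series objects and does not shift a plain vector like this.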
library(tidyverse)

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"),
                     name = c("industry", "agriculture", "service"),
                     `2019_actual` = c(16.78, 9.26, 49.38),
                     `2019_pred` = c(17.78, 10.26, NA),
                     `2020_actual` = c(35.74, NA, 49.38),
                     `2020_pred` = c(36.74, 66.56, 25.36),
                     `2021_actual` = c(30.74, 83.42, 63.26),
                     `2021_pred` = c(31.74, 84.42, 35.23)),
                class = "data.frame", row.names = c(NA, -3L)) %>%
  as_tibble()

df %>%
  # Split names like "2019_actual" into year = "2019" and type = "actual"
  pivot_longer(cols = c(-code, -name), names_to = c("year", "type"), names_sep = "_") %>%
  # One row per code/name/year, with separate actual and pred columns
  pivot_wider(names_from = "type", values_from = "value") %>%
  mutate(year = as.integer(year)) %>%
  group_by(code, name) %>%
  arrange(year) %>%
  # lag(actual) is the previous year's actual within each code/name group.
  # Rows where no condition is TRUE (e.g. an NA in the comparison) get NA.
  mutate(act_direction = case_when(actual > lag(actual) ~ "increase",
                                   actual < lag(actual) ~ "decrease",
                                   actual == lag(actual) ~ "unchanged"),
         pred_direction = case_when(pred > lag(actual) ~ "increase",
                                    pred < lag(actual) ~ "decrease",
                                    pred == lag(actual) ~ "unchanged"))
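The pipeline above returns a long table (one row per code/name/year), while the expected result in the question is wide. A minimal sketch of one way to pivot the direction columns back and attach them to the original df, assuming the first year should be dropped because it has nothing to compare against. The names_glue pattern rebuilds names such as 2020_act_direction; directions and result are just illustrative variable names.

```r
library(dplyr)
library(tidyr)

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"),
                     name = c("industry", "agriculture", "service"),
                     `2019_actual` = c(16.78, 9.26, 49.38),
                     `2019_pred` = c(17.78, 10.26, NA),
                     `2020_actual` = c(35.74, NA, 49.38),
                     `2020_pred` = c(36.74, 66.56, 25.36),
                     `2021_actual` = c(30.74, 83.42, 63.26),
                     `2021_pred` = c(31.74, 84.42, 35.23)),
                class = "data.frame", row.names = c(NA, -3L))

directions <- df %>%
  pivot_longer(cols = c(-code, -name), names_to = c("year", "type"), names_sep = "_") %>%
  pivot_wider(names_from = "type", values_from = "value") %>%
  mutate(year = as.integer(year)) %>%
  group_by(code, name) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(act_direction = case_when(actual > lag(actual) ~ "increase",
                                   actual < lag(actual) ~ "decrease",
                                   actual == lag(actual) ~ "unchanged"),
         pred_direction = case_when(pred > lag(actual) ~ "increase",
                                    pred < lag(actual) ~ "decrease",
                                    pred == lag(actual) ~ "unchanged")) %>%
  ungroup() %>%
  filter(year > min(year)) %>%  # the first year has no previous year
  select(code, name, year, act_direction, pred_direction) %>%
  # Back to one row per code/name, e.g. "2020_act_direction"
  pivot_wider(names_from = year,
              values_from = c(act_direction, pred_direction),
              names_glue = "{year}_{.value}")

# Attach the direction columns to the original wide data
result <- df %>% left_join(directions, by = c("code", "name"))

result
```

Since nothing in the pipeline names a specific year, it should work unchanged for a 2010 to 2020 range. The column order may differ from the expected output; reorder with select() if needed.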