Filter columns and repeatedly compare pairs of columns in R

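Given the df below, for each year's actual and predicted values I need to check whether they move in the same direction relative to the previous year's actual value: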

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"), 
    name = c("industry", "agriculture", "service"), `2019_actual` = c(16.78, 
    9.26, 49.38), `2019_pred` = c(17.78, 10.26, NA), `2020_actual` = c(35.74, 
    NA, 49.38), `2020_pred` = c(36.74, 66.56, 25.36), `2021_actual` = c(30.74, 
    83.42, 63.26), `2021_pred` = c(31.74, 84.42, 35.23)), class = "data.frame", row.names = c(NA, 
-3L))

Output:

      code        name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred
1 M0000273    industry       16.78     17.78       35.74     36.74       30.74     31.74
2 M0000357 agriculture        9.26     10.26          NA     66.56       83.42     84.42
3 M0000545     service       49.38        NA       49.38     25.36       63.26     35.23

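The logic: if the difference between the two years is positive, negative, or zero, return `increase`, `decrease`, or `unchanged` respectively; if either or both values are `NA`, return `NA`.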

Expected result:

     code        name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred 2020_act_direction 2020_pred_direction 2021_act_direction
1 M0000273    industry       16.78     17.78       35.74     36.74       30.74     31.74           increase            increase           decrease
2 M0000357 agriculture        9.26     10.26          NA     66.56       83.42     84.42                               increase                   
3 M0000545     service       49.38        NA       49.38     25.36       63.26     35.23          unchanged            decrease           increase
  2021_pred_direction
1            decrease
2                    
3            decrease

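I tried the code below, but it has two problems: 1. it returns an error: **Error: unexpected ')' in " )"**; 2. if I have many years (say, 2010 to 2020), this is clearly not the right way to get the expected result.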

df %>% 
  mutate(
    `2020_act_direction` = case_when(`2020_actual` - `2019_actual` > 0 ~ 'increase',
                                   `2020_actual` - `2019_actual` < 0 ~ 'decrease',
                                   `2020_actual` - `2019_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
                                   ),
    `2020_pred_direction` = case_when(`2020_pred` - `2019_actual` > 0 ~ 'increase',
                                   `2020_pred` - `2019_actual` < 0 ~ 'decrease',
                                   `2020_pred` - `2019_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
    `2021_act_direction` = case_when(`2021_actual` - `2020_actual` > 0 ~ 'increase',
                                   `2021_actual` - `2020_actual` < 0 ~ 'decrease',
                                   `2021_actual` - `2020_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
    `2021_pred_direction` = case_when(`2021_pred` - `2020_actual` > 0 ~ 'increase',
                                   `2021_pred` - `2020_actual` < 0 ~ 'decrease',
                                   `2021_pred` - `2020_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
  )

How should I approach this?

Use `pivot_longer` and `pivot_wider` to get one row per year/code/name. Then you can easily use `lag` to compare consecutive years.

library(tidyverse)

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"), 
                     name = c("industry", "agriculture", "service"), 
                     `2019_actual` = c(16.78, 9.26, 49.38), 
                     `2019_pred` = c(17.78, 10.26, NA), 
                     `2020_actual` = c(35.74, NA, 49.38), 
                     `2020_pred` = c(36.74, 66.56, 25.36), 
                     `2021_actual` = c(30.74, 83.42, 63.26), 
                     `2021_pred` = c(31.74, 84.42, 35.23)), 
                class = "data.frame", row.names = c(NA, -3L)) %>% 
    as_tibble()

df %>% 
    pivot_longer(cols = c(-code, -name), names_to = c("year", "type"), names_sep = "_") %>% 
    pivot_wider(names_from = "type", values_from = "value") %>% 
    mutate(year = as.integer(year)) %>% 
    group_by(code, name) %>% 
    arrange(year) %>% 
    mutate(act_direction = case_when(actual > lag(actual) ~ "increase",
                                     actual < lag(actual) ~ "decrease",
                                     actual == lag(actual) ~ "unchanged"),
           pred_direction = case_when(pred > lag(actual) ~ "increase",
                                   pred < lag(actual) ~ "decrease",
                                   pred == lag(actual) ~ "unchanged"))
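
The result above is in long format, with one row per code and year. If you need the wide layout from the question (columns such as `2020_act_direction`), one possible approach is to pivot back at the end. The following is a minimal sketch, not a definitive implementation: the helper `direction()` and the object names `long` and `wide` are made up for illustration, and it assumes a tidyr version that supports `names_glue` in `pivot_wider`.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  code = c("M0000273", "M0000357", "M0000545"),
  name = c("industry", "agriculture", "service"),
  `2019_actual` = c(16.78, 9.26, 49.38),
  `2019_pred`   = c(17.78, 10.26, NA),
  `2020_actual` = c(35.74, NA, 49.38),
  `2020_pred`   = c(36.74, 66.56, 25.36),
  `2021_actual` = c(30.74, 83.42, 63.26),
  `2021_pred`   = c(31.74, 84.42, 35.23),
  check.names = FALSE  # keep the column names that start with a digit
)

# Hypothetical helper: label x relative to ref.
# If either value is NA, no condition matches and case_when() yields NA.
direction <- function(x, ref) {
  case_when(x > ref  ~ "increase",
            x < ref  ~ "decrease",
            x == ref ~ "unchanged")
}

# Long format: one row per code/name/year, compared with last year's actual
long <- df %>%
  pivot_longer(cols = -c(code, name),
               names_to = c("year", "type"), names_sep = "_") %>%
  pivot_wider(names_from = type, values_from = value) %>%
  mutate(year = as.integer(year)) %>%
  arrange(code, year) %>%
  group_by(code, name) %>%
  mutate(act_direction  = direction(actual, lag(actual)),
         pred_direction = direction(pred, lag(actual))) %>%
  ungroup()

# Back to wide format: one row per code/name, columns named "<year>_<value>"
wide <- long %>%
  pivot_wider(id_cols = c(code, name),
              names_from = year,
              values_from = c(actual, pred, act_direction, pred_direction),
              names_glue = "{year}_{.value}") %>%
  # The first year has no previous year, so its direction columns are all NA
  select(-`2019_act_direction`, -`2019_pred_direction`)

wide
```

Because the years come from the column names, the same code should work unchanged for a longer range such as 2010 to 2020, as long as the columns follow the `<year>_actual` / `<year>_pred` naming pattern. Only the `select()` call that drops the first year's direction columns would need to name the earliest year in your data. The columns in `wide` are grouped by value type rather than by year, so reorder them with `select()` or `relocate()` if the exact order matters.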