Troubleshooting the case_when function using tidyverse in R

Simple question: I don't understand how case_when works. In the example below, I expected season to have 4 levels, but I only get two.

Thanks

data <- tibble(day = 1:366) %>% 
  mutate(
    season = case_when(
      day <= 60 | day > 335 ~ "winter",
      day > 60  | day <= 151 ~ "spring",
      day > 151 | day <= 242 ~ "summer",
      day > 242 | day <= 335 ~ "autumn"
    )
  )

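Expressions 2 to 4 should use & instead of |. With |, the conditions overlap: day > 60 | day <= 151 is true for every day, so every row not already matched by the first condition is labeled "spring" and the last two conditions are never reached.
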
library(dplyr)
data <- tibble(day = 1:366) %>% 
  mutate(
    season = case_when(
      day <= 60 | day > 335 ~ "winter",
      day > 60  & day <= 151 ~ "spring",
      day > 151 & day <= 242 ~ "summer",
      day > 242 & day <= 335 ~ "autumn"
    )
  )

Checking:

> n_distinct(data$season)
[1] 4
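
To see why the | version collapsed to two seasons, it can help to evaluate the second condition on its own. This is only a small base R sketch; the sample day values are picked for illustration and are not from the question.

```r
# Hypothetical sample of days spread across the year.
day <- c(1, 100, 200, 300, 366)

# With |, at least one side is always satisfied, so every element is TRUE.
or_version <- day > 60 | day <= 151

# With &, only days inside the intended range (61 to 151) are TRUE.
and_version <- day > 60 & day <= 151

print(or_version)
print(and_version)
```

Since case_when assigns the first matching condition to each row, an always-true second condition swallows everything the first one left over.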

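Actually, you can shorten this case_when() statement a bit, because case_when uses the first condition that matches for each row. Once the values lower than or equal to 60 or greater than 335 have been handled, it is enough to define the next condition as lower than or equal to 151: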

library(dplyr)
data <- tibble(day = 1:366) %>% 
  mutate(
    season = case_when(
      day <= 60 | day > 335 ~ "winter",
      day <= 151 ~ "spring",
      day <= 242 ~ "summer",
      day <= 335 ~ "autumn"
    )
  )
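
If you want to convince yourself that the shortened conditions give the same result as the fully spelled-out ranges, a quick comparison along these lines should do. It is a sketch that works on a plain vector instead of a tibble; the names explicit and shortened are made up for this example.

```r
library(dplyr)

day <- 1:366

# Fully spelled-out ranges, as in the corrected version above.
explicit <- case_when(
  day <= 60 | day > 335  ~ "winter",
  day > 60  & day <= 151 ~ "spring",
  day > 151 & day <= 242 ~ "summer",
  day > 242 & day <= 335 ~ "autumn"
)

# Shortened version that relies on the first matching condition winning.
shortened <- case_when(
  day <= 60 | day > 335 ~ "winter",
  day <= 151 ~ "spring",
  day <= 242 ~ "summer",
  day <= 335 ~ "autumn"
)

# TRUE when both approaches label every day identically.
print(identical(explicit, shortened))
```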

You can also add a TRUE case, which catches every row that met none of the preceding conditions:

data <- tibble(day = 1:366) %>% 
  mutate(
    season = case_when(
      day <= 60 ~ "winter",
      day <= 151 ~ "spring",
      day <= 242 ~ "summer",
      day <= 335 ~ "autumn",
      TRUE ~ "winter"
    )
  )
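
For context on why the TRUE case matters: as far as I know, rows that match no condition come back as NA. A minimal sketch with three made-up day values and only two conditions, so the last value falls through:

```r
library(dplyr)

# Hypothetical sample: day 350 matches neither condition below.
day <- c(30, 100, 350)

# No fallback: the unmatched element becomes NA.
without_fallback <- case_when(
  day <= 60  ~ "winter",
  day <= 151 ~ "spring"
)

# TRUE as the last condition catches whatever is left.
with_fallback <- case_when(
  day <= 60  ~ "winter",
  day <= 151 ~ "spring",
  TRUE       ~ "winter"
)

print(without_fallback)
print(with_fallback)
```

I believe recent dplyr releases (1.1.0 and later) also accept a .default argument for the same purpose, but the TRUE form works either way.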

Stop using case_when; use cut instead:

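library(dplyr)
tibble(day = 1:366) |>
  mutate(
    season = cut(
      day,
      breaks = c(0, 60, 151, 242, 335, 366),
      # "winter" is listed twice so that days 336 to 366 map back to winter
      labels = c("winter", "spring", "summer", "autumn", "winter")
    )
  )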
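
A few things worth checking with cut: by default the intervals are closed on the right, so day 60 is still winter and day 61 is spring, and the result is a factor, not a character vector. The sketch below spot-checks the boundaries in base R. It assumes R 3.4.0 or later, where (to my knowledge) duplicated labels are merged into a single level; the boundary variable is only there for the check.

```r
day <- 1:366

season <- cut(
  day,
  breaks = c(0, 60, 151, 242, 335, 366),
  labels = c("winter", "spring", "summer", "autumn", "winter")
)

# Days on either side of each break point.
boundary <- c(1, 60, 61, 151, 152, 242, 243, 335, 336, 366)

# Show which season each boundary day lands in.
print(data.frame(day = boundary, season = season[boundary]))

# Number of distinct levels after the two "winter" labels are merged.
print(nlevels(season))
```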