case_when fails when condition checks for rows that don't exist

Consider the following data:

df <- data.frame(group  = c(1, 2, 2, 2),
                 start  = c(2, 7, 7, 7),
                 stop   = c(8, 7, 8, 9),
                 unstop = c(10, 7, 9, 10))

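I now want to set up a more or less simple case_when per group, of the form "if first row, do this; if second row, do that". However, I get an error. I assume this is because group 1 has only one row, so the condition can't be checked: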

df |>
  group_by(group) |> 
  mutate(n_rows = n(),
         split_weeks = case_when(n_rows == 1 ~ str_c(start:stop, collapse = ","),
                                 n_rows  > 1 & row_number() == 1 ~ str_c(c(start:stop, unstop:lead(stop)), collapse = ","),
                                 TRUE ~ "fail"))

Error in `mutate()`:
! Problem while computing `split_weeks = case_when(...)`.
ℹ The error occurred in group 1: group = 1.
Caused by error in `unstop:lead(stop)`:
! NA/NaN argument
Run `rlang::last_error()` to see where the error occurred.

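Any idea what's going on here?

I think it has something to do with the lead function, because if I remove that part I "only" get a warning, but at least I get a result.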

Expected output:

# A tibble: 4 × 6
# Groups:   group [2]
  group start  stop unstop n_rows split_weeks  
  <dbl> <dbl> <dbl>  <dbl>  <int> <chr>        
1     1     2     8     10      1 2,3,4,5,6,7,8
2     2     7     7      7      3 7,8          
3     2     7     8      9      3 fail         
4     2     7     9     10      3 fail         

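I think the error arises when you ask R to find the lead a) beyond the last row of the table or b) outside the group: lead(stop) then returns NA, and the : operator cannot build a sequence that ends in NA. You can pass it a default value of 0, which is never used and suppresses the error, but that only gets you halfway, because the function then tries to concatenate the start:stop and unstop:lead(stop) values from all rows of each group at once: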

library(tidyverse)

df <- data.frame(group  = c(1, 2, 2, 2),
                 start  = c(2, 7, 7, 7),
                 stop   = c(8, 7, 8, 9),
                 unstop = c(10, 7, 9, 10))


df |>
  group_by(group) |>
  mutate(
    n_rows = n(),
    split_weeks = case_when(
      n_rows == 1 ~ str_c(start:stop, collapse = ","),
      n_rows  > 1 &
        row_number() == 1 ~ str_c(c(start:stop, unstop:lead(stop, default = 0)), collapse = ","),
      TRUE ~ "fail"
    )
  )
#> Warning in start:stop: numerical expression has 3 elements: only the first used

#> Warning in start:stop: numerical expression has 3 elements: only the first used

#> Warning in start:stop: numerical expression has 3 elements: only the first used

#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in unstop:lead(stop, 0): numerical expression has 3 elements: only the
#> first used

#> Warning in unstop:lead(stop, 0): numerical expression has 3 elements: only the
#> first used
#> # A tibble: 4 × 6
#> # Groups:   group [2]
#>   group start  stop unstop n_rows split_weeks  
#>   <dbl> <dbl> <dbl>  <dbl>  <int> <chr>        
#> 1     1     2     8     10      1 2,3,4,5,6,7,8
#> 2     2     7     7      7      3 7,7          
#> 3     2     7     8      9      3 fail         
#> 4     2     7     9     10      3 fail
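
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, assuming dplyr is installed; the names stop_vals, seq_error and eager_error are made up for illustration and do not appear in the question. It shows that lead() returns NA when there is no next row, that the : operator refuses an NA endpoint, and that case_when evaluates every right-hand side even when an earlier condition already matched.

```r
library(dplyr)

# A "group" with a single row, like group 1 in the question
stop_vals <- c(8)

# There is no next row, so lead() falls back to NA ...
no_next <- lead(stop_vals)

# ... unless a default is supplied
with_default <- lead(stop_vals, default = 0)

# The : operator cannot build a sequence that ends in NA
seq_error <- tryCatch(
  10:lead(stop_vals),
  error = function(e) conditionMessage(e)
)

# case_when() evaluates all right-hand sides, including those whose
# condition is FALSE, so an error there aborts the whole call
eager_error <- tryCatch(
  case_when(
    TRUE  ~ "first case",
    FALSE ~ stop("second case was evaluated")
  ),
  error = function(e) "error raised"
)
```

With a default in place, 10:with_default is a valid (if meaningless) sequence, which is why the error disappears even though that value is never used in the final result.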

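One way to tidy this up is to:

  • Find the lead values outside of the groups (using a default to avoid an error on the last row)
  • Find the number of rows and the row numbers within groups
  • Ungroup again!
  • Do the calculation rowwise, so that R only looks at the values in that row
  • Perform the concatenation

I'm not sure why you would want 7,7,8 in that cell, but that is the result (which makes sense, since it concatenates 7 to 7 and 7 to 8):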

df |> 
  mutate(lead_stop = lead(stop, default = 0)) |>
  group_by(group) |>
  mutate(
    n_rows = n(),
    rownum = row_number()) |>
  ungroup() |>
  rowwise() |>
  mutate(
    split_weeks = case_when(
      rownum > 1 ~ "fail",
      n_rows == 1 ~ str_c(start:stop, collapse = ","),
      n_rows  > 1 & rownum == 1 ~ str_c(c(start:stop, unstop:lead_stop), collapse = ","),
      TRUE ~ "fail"
    )
  )
#> # A tibble: 4 × 8
#> # Rowwise: 
#>   group start  stop unstop lead_stop n_rows rownum split_weeks  
#>   <dbl> <dbl> <dbl>  <dbl>     <dbl>  <int>  <int> <chr>        
#> 1     1     2     8     10         7      1      1 2,3,4,5,6,7,8
#> 2     2     7     7      7         8      3      1 7,7,8        
#> 3     2     7     8      9         9      3      2 fail         
#> 4     2     7     9     10         0      3      3 fail

Created on 2022-05-07 by the reprex package (v2.0.1)

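Here's an alternative that produces the desired output (at least in this case). @Andy Baxter gives a good explanation of why the original fails: even when case_when uses the result of the first case, the second case is still evaluated and throws an error, so the whole operation fails. You can work around this by using lead(stop, default = 0) or coalesce(lead(stop), SOMETHING), either of which yields a computable (if meaningless/unneeded) result when there is no "next" value.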

df |>
  group_by(group) |> 
  mutate(n_rows = n()) %>%
  mutate(split_weeks = case_when(
    n_rows == 1 ~ str_c(start:stop, collapse = ","),
    n_rows  > 1 & row_number() == 1 ~ str_c(unstop:(lead(stop, default = 0)), collapse = ","),
    # n_rows  > 1 & row_number() == 1 ~ str_c(unstop:(coalesce(lead(stop), unstop)), collapse = ","), # Alternative
    TRUE ~ "fail"))

Result

# A tibble: 4 × 6
# Groups:   group [2]
  group start  stop unstop n_rows split_weeks  
  <dbl> <dbl> <dbl>  <dbl>  <int> <chr>        
1     1     2     8     10      1 2,3,4,5,6,7,8
2     2     7     7      7      3 7,8          
3     2     7     8      9      3 fail         
4     2     7     9     10      3 fail
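
The grouped version above still passes whole columns to the : operator, so R uses only the first element and warns about it. As a hedged sketch (not taken from either answer), the same unstop:lead(stop) formula can be applied element by element with base R's mapply, which avoids both the error and the warnings. The helper seq_str and the column next_stop are hypothetical names introduced here for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(group  = c(1, 2, 2, 2),
                 start  = c(2, 7, 7, 7),
                 stop   = c(8, 7, 8, 9),
                 unstop = c(10, 7, 9, 10))

# Hypothetical helper: one comma-separated sequence per (from, to) pair
seq_str <- function(from, to) str_c(from:to, collapse = ",")

res <- df |>
  group_by(group) |>
  mutate(
    n_rows    = n(),
    # default = 0 keeps the last row of each group computable
    next_stop = lead(stop, default = 0),
    split_weeks = case_when(
      n_rows == 1       ~ mapply(seq_str, start, stop),
      row_number() == 1 ~ mapply(seq_str, unstop, next_stop),
      TRUE              ~ "fail"
    )
  ) |>
  ungroup()

res$split_weeks
```

Because mapply calls seq_str once per row, every right-hand side is a character vector with one value per row, and case_when then simply picks the value that belongs to each row.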