case_when fails when condition checks for rows that don't exist

Question

Consider the following data:

df <- data.frame(group  = c(1, 2, 2, 2),
                 start  = c(2, 7, 7, 7),
                 stop   = c(8, 7, 8, 9),
                 unstop = c(10, 7, 9, 10))

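I now want to set up a more or less simple case_when for each group, in the form "if it is the first row do this, if it is the second row do that". However, I get an error. I assume this is because group 1 has only one row, so the condition cannot be checked: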

df |>
  group_by(group) |> 
  mutate(n_rows = n(),
         split_weeks = case_when(n_rows == 1 ~ str_c(start:stop, collapse = ","),
                                 n_rows  > 1 & row_number() == 1 ~ str_c(c(start:stop, unstop:lead(stop)), collapse = ","),
                                 TRUE ~ "fail"))

Error in `mutate()`:
! Problem while computing `split_weeks = case_when(...)`.
ℹ The error occurred in group 1: group = 1.
Caused by error in `unstop:lead(stop)`:
! NA/NaN argument
Run `rlang::last_error()` to see where the error occurred.

Any idea what is going on here?

I think it has to do with the lead function, because if I remove that part I "only" get a warning, but at least I get a result.

Expected output:

# A tibble: 4 × 6
# Groups:   group [2]
  group start  stop unstop n_rows split_weeks  
  <dbl> <dbl> <dbl>  <dbl>  <int> <chr>        
1     1     2     8     10      1 2,3,4,5,6,7,8
2     2     7     7      7      3 7,8          
3     2     7     8      9      3 fail         
4     2     7     9     10      3 fail

Answer 1

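I think the error occurs when you ask R to find the lead a) beyond the last row of the table or b) outside the group. In both cases lead() returns NA, and unstop:NA cannot be evaluated. You can pass lead() a default value of 0, which is never actually used and suppresses the error, but that only gets you halfway, because the function then tries to concatenate every start:stop and unstop:lead(stop) value across all rows of each group: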

library(tidyverse)

df <- data.frame(group  = c(1, 2, 2, 2),
                 start  = c(2, 7, 7, 7),
                 stop   = c(8, 7, 8, 9),
                 unstop = c(10, 7, 9, 10))


df |>
  group_by(group) |>
  mutate(
    n_rows = n(),
    split_weeks = case_when(
      n_rows == 1 ~ str_c(start:stop, collapse = ","),
      n_rows  > 1 &
        row_number() == 1 ~ str_c(c(start:stop, unstop:lead(stop, default = 0)), collapse = ","),
      TRUE ~ "fail"
    )
  )
#> Warning in start:stop: numerical expression has 3 elements: only the first used

#> Warning in start:stop: numerical expression has 3 elements: only the first used

#> Warning in start:stop: numerical expression has 3 elements: only the first used

#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in unstop:lead(stop, 0): numerical expression has 3 elements: only the
#> first used

#> Warning in unstop:lead(stop, 0): numerical expression has 3 elements: only the
#> first used
#> # A tibble: 4 × 6
#> # Groups:   group [2]
#>   group start  stop unstop n_rows split_weeks  
#>   <dbl> <dbl> <dbl>  <dbl>  <int> <chr>        
#> 1     1     2     8     10      1 2,3,4,5,6,7,8
#> 2     2     7     7      7      3 7,7          
#> 3     2     7     8      9      3 fail         
#> 4     2     7     9     10      3 fail
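
The NA behind the original error can be reproduced outside of mutate(). The following is a minimal sketch, assuming dplyr is installed; the vector stop_one_row is a made-up stand-in for the stop column of group 1, and the error it captures should carry the same "NA/NaN argument" message as in the question.

```r
library(dplyr)

# Group 1 has a single row, so its `stop` column is a length-one vector
# (stop_one_row is an illustrative name, not part of the original data).
stop_one_row <- c(8)

# There is no next row, so lead() returns NA.
next_stop <- lead(stop_one_row)
print(next_stop)

# The colon operator cannot build a sequence that ends in NA, so it errors.
colon_error <- tryCatch(
  10:next_stop,
  error = function(e) conditionMessage(e)
)
print(colon_error)

# With a default, lead() returns a usable number and the sequence can be built.
safe_stop <- lead(stop_one_row, default = 0)
print(10:safe_stop)
```

The default only has to make the expression computable; which value it takes does not matter as long as that branch of case_when is not the one selected for the row.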

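One way to tidy this up is to:

Find the lead values outside of the groups (using a default to avoid an error on the last row)
Find the number of rows and the row number within each group
Ungroup again!
Do the calculation rowwise, so that R only looks at the values in that row
Perform the concatenation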

I'm not sure why you would want 7,7,8 in that cell, but that is the result (which makes sense, since it concatenates 7 to 7 and 7 to 8):

df |> 
  mutate(lead_stop = lead(stop, default = 0)) |>
  group_by(group) |>
  mutate(
    n_rows = n(),
    rownum = row_number()) |>
  ungroup() |>
  rowwise() |>
  mutate(
    split_weeks = case_when(
      rownum > 1 ~ "fail",
      n_rows == 1 ~ str_c(start:stop, collapse = ","),
      n_rows  > 1 & rownum == 1 ~ str_c(c(start:stop, unstop:lead_stop), collapse = ","),
      TRUE ~ "fail"
    )
  )
#> # A tibble: 4 × 8
#> # Rowwise: 
#>   group start  stop unstop lead_stop n_rows rownum split_weeks  
#>   <dbl> <dbl> <dbl>  <dbl>     <dbl>  <int>  <int> <chr>        
#> 1     1     2     8     10         7      1      1 2,3,4,5,6,7,8
#> 2     2     7     7      7         8      3      1 7,7,8        
#> 3     2     7     8      9         9      3      2 fail         
#> 4     2     7     9     10         0      3      3 fail
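
The rowwise() step matters because the colon operator is not vectorised: given whole columns, it only looks at their first elements, which is what the warnings in the grouped version were reporting. As a rough illustration of the same per-row idea in base R (not the approach used in this answer), the sketch below uses mapply() on two made-up vectors, start_vals and stop_vals, that mirror group 2 of the example data.

```r
# Values mirroring group 2 of the example data (names are illustrative).
start_vals <- c(7, 7, 7)
stop_vals  <- c(7, 8, 9)

# mapply() calls the function once per pair of elements, so each `a:b`
# only ever sees two scalars and no "only the first used" warning arises.
per_row <- mapply(
  function(a, b) paste(a:b, collapse = ","),
  start_vals, stop_vals
)
print(per_row)
```

Each element of per_row is the collapsed sequence for one row, which is the same effect rowwise() achieves inside mutate().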

Created on 2022-05-07 by the reprex package (v2.0.1)

Answer 2

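Here is an alternative that produces the desired output (at least in this case). @Andy Baxter explains well why the original failed: even though case_when uses the result of the first case, the second case still throws an error, so the whole operation fails. You can work around this with lead(stop, default = 0) or coalesce(lead(stop), SOMETHING), either of which yields a computable (if meaningless/unneeded) result when there is no "next" value.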

df |>
  group_by(group) |> 
  mutate(n_rows = n()) |>
  mutate(split_weeks = case_when(
    n_rows == 1 ~ str_c(start:stop, collapse = ","),
    n_rows  > 1 & row_number() == 1 ~ str_c(unstop:(lead(stop, default = 0)), collapse = ","),
    # n_rows  > 1 & row_number() == 1 ~ str_c(unstop:(coalesce(lead(stop), unstop)), collapse = ","), # Alternative
    TRUE ~ "fail"))

Result

# A tibble: 4 × 6
# Groups:   group [2]
  group start  stop unstop n_rows split_weeks  
  <dbl> <dbl> <dbl>  <dbl>  <int> <chr>        
1     1     2     8     10      1 2,3,4,5,6,7,8
2     2     7     7      7      3 7,8          
3     2     7     8      9      3 fail         
4     2     7     9     10      3 fail
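
The point about case_when evaluating every right-hand side, even when an earlier case already matches, can be seen in isolation. This is a small sketch, assuming dplyr is installed; the variable x and the error text are invented for the demonstration.

```r
library(dplyr)

x <- 1

# The first condition matches, yet the second right-hand side is still
# evaluated, and its error aborts the whole case_when() call.
outcome <- tryCatch(
  case_when(
    x == 1 ~ "first case",
    x > 1  ~ stop("second case was evaluated")
  ),
  error = function(e) "case_when() failed"
)
print(outcome)

# coalesce() swaps the NA returned by lead() for a fallback value,
# which has the same effect as passing `default` to lead().
fallback <- coalesce(lead(c(8)), 0)
print(fallback)
```

This is why the fallback value itself can be arbitrary: it only needs to keep the unused branch from failing.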


r

tidyverse