case_when fails when condition checks for rows that don't exist
Consider the following data:
df <- data.frame(group = c(1, 2, 2, 2),
                 start = c(2, 7, 7, 7),
                 stop = c(8, 7, 8, 9),
                 unstop = c(10, 7, 9, 10))
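I now want to set up a more or less simple case_when for each group, along the lines of "if it's the first row, do this; if it's the second row, do that". However, I get an error. I assume this is because group 1 has only one row, so the condition can't be checked: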
df |>
  group_by(group) |>
  mutate(n_rows = n(),
         split_weeks = case_when(n_rows == 1 ~ str_c(start:stop, collapse = ","),
                                 n_rows > 1 & row_number() == 1 ~ str_c(c(start:stop, unstop:lead(stop)), collapse = ","),
                                 TRUE ~ "fail"))
Error in `mutate()`:
! Problem while computing `split_weeks = case_when(...)`.
ℹ The error occurred in group 1: group = 1.
Caused by error in `unstop:lead(stop)`:
! NA/NaN argument
Run `rlang::last_error()` to see where the error occurred.
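Any idea what's going on here? I think it has to do with the lead function, because if I remove that part I "only" get a warning, but at least I get a result.
Expected output: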
# A tibble: 4 × 6
# Groups: group [2]
group start stop unstop n_rows split_weeks
<dbl> <dbl> <dbl> <dbl> <int> <chr>
1 1 2 8 10 1 2,3,4,5,6,7,8
2 2 7 7 7 3 7,8
3 2 7 8 9 3 fail
4 2 7 9 10 3 fail
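I think the error arises when you ask R for a lead value that is a) beyond the last row of the table or b) outside the group: lead() returns NA there, and unstop:NA cannot be evaluated. You can pass lead() a default value of 0, which is never used and suppresses the error, but that only gets you halfway, because the function then tries to concatenate every start:stop and unstop:lead(stop) value across all rows of each group: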
library(tidyverse)

df <- data.frame(group = c(1, 2, 2, 2),
                 start = c(2, 7, 7, 7),
                 stop = c(8, 7, 8, 9),
                 unstop = c(10, 7, 9, 10))

df |>
  group_by(group) |>
  mutate(
    n_rows = n(),
    split_weeks = case_when(
      n_rows == 1 ~ str_c(start:stop, collapse = ","),
      n_rows > 1 &
        row_number() == 1 ~ str_c(c(start:stop, unstop:lead(stop, default = 0)), collapse = ","),
      TRUE ~ "fail"
    )
  )
#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in unstop:lead(stop, 0): numerical expression has 3 elements: only the
#> first used
#> Warning in unstop:lead(stop, 0): numerical expression has 3 elements: only the
#> first used
#> # A tibble: 4 × 6
#> # Groups: group [2]
#> group start stop unstop n_rows split_weeks
#> <dbl> <dbl> <dbl> <dbl> <int> <chr>
#> 1 1 2 8 10 1 2,3,4,5,6,7,8
#> 2 2 7 7 7 3 7,7
#> 3 2 7 8 9 3 fail
#> 4 2 7 9 10 3 fail
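To make the cause of the original error concrete, here is a minimal sketch outside of mutate(). The names stop_val, unstop_val and next_stop are illustrative stand-ins for the single row of group 1, not part of the original code. It assumes, as far as I can tell from how dplyr behaves, that case_when() evaluates every right-hand side before choosing a result:

```r
library(dplyr)

stop_val <- 8     # the only `stop` value in group 1 (illustrative name)
unstop_val <- 10  # the only `unstop` value in group 1 (illustrative name)

# With a single value there is no "next" element, so lead() returns NA
next_stop <- lead(stop_val)

# `:` cannot build a sequence that ends in NA; capture the error message
colon_error <- tryCatch(
  unstop_val:next_stop,
  error = function(e) conditionMessage(e)
)

# The second branch's condition is FALSE, yet its right-hand side is still
# evaluated, so the whole case_when() call fails
case_when_failed <- inherits(
  try(
    case_when(
      TRUE ~ "ok",
      FALSE ~ paste(unstop_val:next_stop, collapse = ",")
    ),
    silent = TRUE
  ),
  "try-error"
)
```

So the n_rows == 1 branch never gets a chance to "protect" group 1: the failing expression runs regardless of which condition matches.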
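One way to tidy this up is to:
- Look up the lead values outside the groups (using a default to avoid an error on the last row)
- Find the number of rows and the row number within each group
- Ungroup again!
- Do the calculation rowwise, so that R only looks at the values in that row
- Do the concatenation
I'm not sure why 7,7,8 should end up in that cell, but that is the result (which makes sense, since it concatenates 7 to 7 and 7 to 8):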
df |>
  mutate(lead_stop = lead(stop, default = 0)) |>
  group_by(group) |>
  mutate(
    n_rows = n(),
    rownum = row_number()) |>
  ungroup() |>
  rowwise() |>
  mutate(
    split_weeks = case_when(
      rownum > 1 ~ "fail",
      n_rows == 1 ~ str_c(start:stop, collapse = ","),
      n_rows > 1 & rownum == 1 ~ str_c(c(start:stop, unstop:lead_stop), collapse = ","),
      TRUE ~ "fail"
    )
  )
#> # A tibble: 4 × 8
#> # Rowwise:
#> group start stop unstop lead_stop n_rows rownum split_weeks
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr>
#> 1 1 2 8 10 7 1 1 2,3,4,5,6,7,8
#> 2 2 7 7 7 8 3 1 7,7,8
#> 3 2 7 8 9 9 3 2 fail
#> 4 2 7 9 10 0 3 3 fail
Created on 2022-05-07 by the reprex package (v2.0.1)
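If the 7,8 from the expected output is what is wanted rather than 7,7,8, one possible tweak to the rowwise approach is to drop repeated values before collapsing. This is only a sketch: wrapping the sequence in unique() is my assumption about the intent, and the name res is introduced here purely for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(group = c(1, 2, 2, 2),
                 start = c(2, 7, 7, 7),
                 stop = c(8, 7, 8, 9),
                 unstop = c(10, 7, 9, 10))

res <- df |>
  # next row's `stop`, with a default so the last row does not yield NA
  mutate(lead_stop = lead(stop, default = 0)) |>
  group_by(group) |>
  mutate(n_rows = n(), rownum = row_number()) |>
  ungroup() |>
  rowwise() |>
  mutate(
    split_weeks = case_when(
      rownum > 1 ~ "fail",
      n_rows == 1 ~ str_c(start:stop, collapse = ","),
      # unique() removes the week that appears in both sequences (assumption)
      TRUE ~ str_c(unique(c(start:stop, unstop:lead_stop)), collapse = ",")
    )
  ) |>
  ungroup()

res$split_weeks
```

On the example data this gives 2,3,4,5,6,7,8 for group 1 and 7,8 for the first row of group 2, with the remaining rows set to "fail". Whether de-duplicating is right for other data depends on what the weeks are supposed to mean.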
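Here's an alternative that produces the desired output (at least in this case). @Andy Baxter explained well why the original fails; even though case_when uses the result of the first case, the second case still throws an error, so the whole operation fails. You can work around this with lead(stop, default = 0) or coalesce(lead(stop), SOMETHING), either of which produces a computable (if meaningless/unneeded) result when there is no "next" value.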
df |>
  group_by(group) |>
  mutate(n_rows = n()) |>
  mutate(split_weeks = case_when(
    n_rows == 1 ~ str_c(start:stop, collapse = ","),
    n_rows > 1 & row_number() == 1 ~ str_c(unstop:(lead(stop, default = 0)), collapse = ","),
    # n_rows > 1 & row_number() == 1 ~ str_c(unstop:(coalesce(lead(stop), unstop)), collapse = ","), # Alternative
    TRUE ~ "fail"))
Result
# A tibble: 4 × 6
# Groups: group [2]
group start stop unstop n_rows split_weeks
<dbl> <dbl> <dbl> <dbl> <int> <chr>
1 1 2 8 10 1 2,3,4,5,6,7,8
2 2 7 7 7 3 7,8
3 2 7 8 9 3 fail
4 2 7 9 10 3 fail
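For reference, the two fallbacks mentioned in this answer can be compared on plain vectors. This is a small sketch using the stop and unstop values of group 2; the names stop_vals, unstop_vals and the three result variables are illustrative only.

```r
library(dplyr)

stop_vals <- c(7, 8, 9)     # `stop` values of group 2
unstop_vals <- c(7, 9, 10)  # `unstop` values of group 2

# Plain lead(): the last position has no "next" value, so it is NA
plain <- lead(stop_vals)

# default = 0 fills that position with a usable (if meaningless) number
with_default <- lead(stop_vals, default = 0)

# coalesce() swaps the NA for the matching element of another vector
with_coalesce <- coalesce(lead(stop_vals), unstop_vals)
```

Either way the last element is no longer NA, so an expression such as unstop:lead(...) can be evaluated for every branch even though its result is discarded for rows that end up as "fail".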