dplyr error: strange issue when combining group_by, mutate and ifelse. Is it a bug?
I'm running into a strange issue when using dplyr with a combination of group_by, mutate and ifelse. Consider the following data.frame:
> df1
crawl.id group.id hits.diff
1 1 1 NA
2 1 2 NA
3 2 2 0
4 1 3 NA
5 1 3 NA
6 1 3 NA
When I run the following code on it:
library(dplyr)
df1 %>%
group_by(group.id) %>%
mutate( hits.consumed = ifelse(hits.diff<=0,-hits.diff,0) )
for some reason I get:
Error: incompatible types, expecting a logical vector
However, if I remove either group_by() or ifelse, everything works as expected:
df1 %>%
mutate( hits.consumed = ifelse(hits.diff<=0,-hits.diff,0) )
crawl.id group.id hits.diff hits.consumed
1 1 1 NA NA
2 1 2 NA NA
3 2 2 0 0
4 1 3 NA NA
5 1 3 NA NA
6 1 3 NA NA
df1 %>%
group_by( group.id ) %>%
mutate( hits.consumed = -hits.diff )
crawl.id group.id hits.diff hits.consumed
1 1 1 NA NA
2 1 2 NA NA
3 2 2 0 0
4 1 3 NA NA
5 1 3 NA NA
6 1 3 NA NA
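Is this a bug or a feature? Can anyone reproduce it? What is so special about this particular combination of group_by, mutate and ifelse that makes it fail?
My own research led me here: https://github.com/hadley/dplyr/issues/464, which suggests it should be fixed by now.
Here is dput(df1):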
structure(list(crawl.id = c(1, 1, 2, 1, 1, 1), group.id = structure(c(1L,
2L, 2L, 3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"),
hits.diff = c(NA, NA, 0, NA, NA, NA)), .Names = c("crawl.id",
"group.id", "hits.diff"), row.names = c(NA, -6L), class = "data.frame")
Wrap the whole expression in as.numeric to force the output type, so that the NAs (which are logical by default) don't determine the class of the output variable:
df1 %>%
group_by(group.id) %>%
mutate( hits.consumed = as.numeric(ifelse(hits.diff<=0,-hits.diff,0)) )
# crawl.id group.id hits.diff hits.consumed
#1 1 1 NA NA
#2 1 2 NA NA
#3 2 2 0 0
#4 1 3 NA NA
#5 1 3 NA NA
#6 1 3 NA NA
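As an alternative sketch, and assuming a dplyr version that provides if_else() (it is not mentioned in the original thread), the type-strict if_else() could be used instead of wrapping ifelse() in as.numeric. It takes its output type from the true/false arguments rather than from the values that happen to be selected, so an all-NA group should still produce a numeric column:

```r
library(dplyr)

# Same data as the dput() output in the question.
df1 <- data.frame(
  crawl.id  = c(1, 1, 2, 1, 1, 1),
  group.id  = factor(c(1, 2, 2, 3, 3, 3)),
  hits.diff = c(NA, NA, 0, NA, NA, NA)
)

# if_else() requires `true` and `false` to share a type (double here),
# and fills rows where the condition is NA with an NA of that same type.
res <- df1 %>%
  group_by(group.id) %>%
  mutate(hits.consumed = if_else(hits.diff <= 0, -hits.diff, 0))

class(res$hits.consumed)
```

Whether this is preferable to as.numeric() is a matter of taste; the point is only that every group then returns the same type.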
I'm pretty sure this is the same issue as the one described in "Custom sum function in dplyr returns inconsistent results", as this result suggests:
out <- df1[1:2,] %>% mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "logical"
out <- df1[1:3,] %>% mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "numeric"
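The same effect can be reproduced per group in base R, without dplyr. This is only a sketch of what the grouped mutate presumably runs into: each group is evaluated separately, and ifelse() returns a logical vector whenever the test is entirely NA, so the groups disagree on the result type.

```r
# hits.diff and group.id as in the dput() output from the question.
hits.diff <- c(NA, NA, 0, NA, NA, NA)
group.id  <- factor(c(1, 2, 2, 3, 3, 3))

# Evaluate the same ifelse() expression on each group and record its class.
classes <- sapply(split(hits.diff, group.id),
                  function(x) class(ifelse(x <= 0, -x, 0)))
print(classes)
#         1         2         3
# "logical" "numeric" "logical"
```

Group 1 comes first and yields a logical vector, which would explain why the error message says a logical vector was expected once the numeric result of group 2 arrives.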