R: data.table converts to logical in conditional by group operation (in seemingly random way)
I have the following problem:
library(data.table)
test <- data.table(v = ceiling(runif(20, 0, 5)), g = ceiling(runif(20, 0, 2)))
setorder(test, g)
# per-group counts of each value 1..5 of v
test[, (paste0("n", 1:5)) := lapply(1:5, function(x) sum(v == x)), by = g]
# ratios n_x / n_(x+1), NA when the denominator is 0
test[, (paste0("foo", 1:3)) := lapply(1:3, function(x){ifelse(get(paste0("n", x + 1)) != 0,
  get(paste0("n", x))/get(paste0("n", x + 1)), NA)}), by = g]
test
If you run this code several times, one of the "foo" variables is sometimes converted to logical, which makes no sense to me.
Thanks for your help!
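The reason is the use of NA, which is NA_logical_ by default. If the condition leaves only NA in a column, that column is logical; otherwise the NA is coerced to the type of the other observations in the column. This can be fixed by using the NA_real_ constant mentioned in ?NA: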
NA is a logical constant of length 1 which contains a missing value indicator. NA can be coerced to any other vector type except raw. There are also constants NA_integer_, NA_real_, NA_complex_ and NA_character_ of the other atomic vector types which support missing values: all of these are reserved words in the R language.
test[, (paste0("foo", 1:3)) :=
       lapply(1:3, function(x){
         # NA_real_ is a typed missing value, so the result is always double
         ifelse(get(paste0("n", x + 1)) != 0,
                get(paste0("n", x))/get(paste0("n", x + 1)), NA_real_)}), by = g]
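The root cause can be reproduced in base R alone. A minimal sketch: `ifelse()` starts from the logical `test` vector and only fills in `yes`/`no` where they are needed, so when the condition is FALSE everywhere the result has the type of `no`. The vectors `num` and `denom` are hypothetical stand-ins for two of the `n` columns.

```r
# Hypothetical counts: the denominator is 0 in every row,
# as happens for n3 in the output further down
num   <- c(3, 1)
denom <- c(0, 0)

# Condition is FALSE everywhere, so only the `no` branch is used
r_lgl <- ifelse(denom != 0, num / denom, NA)        # NA is NA_logical_
r_dbl <- ifelse(denom != 0, num / denom, NA_real_)  # typed missing value

typeof(r_lgl)  # "logical"
typeof(r_dbl)  # "double"

# Once at least one condition is TRUE, the double `yes` values make the result double
typeof(ifelse(c(2, 0) != 0, num / c(2, 0), NA))  # "double"
```

This is why the behaviour looks random: it depends on whether the randomly generated counts happen to make every denominator 0.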
Besides using ifelse and specifying the NA that matches the column type, another option is case_when (from dplyr) or data.table::fcase, which by default return NA (of the appropriate column type):
# no default given: rows where the condition is FALSE get NA of the value's type
test[, paste0("foo", 1:3) := lapply(1:3,
       function(x) fcase(.SD[[paste0("n", x + 1)]] != 0,
                         .SD[[paste0("n", x)]]/.SD[[paste0("n", x + 1)]])), by = g]
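To isolate the fcase default outside of data.table's grouping, a small sketch (assuming a data.table version that provides `fcase`; `num` and `denom` are hypothetical stand-ins for the count columns):

```r
library(data.table)

# Hypothetical counts for one group: the denominator is 0 in every row
num   <- c(3, 1)
denom <- c(0, 0)

# No condition is TRUE, so every row falls through to fcase's default (NA),
# yet the result keeps the type of the value argument (double)
r_none <- fcase(denom != 0, num / denom)
typeof(r_none)  # "double"

# Mixed case: rows with a non-zero denominator get the ratio, the rest NA
r_mixed <- fcase(c(2, 0) != 0, num / c(2, 0))
r_mixed         # 1.5 NA
```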
-Testing
lst1 <- replicate(10, {
  test <- data.table(v = ceiling(runif(20, 0, 5)),
                     g = ceiling(runif(20, 0, 2)))
  setorder(test, g)
  test[, (paste0("n", 1:5)) := lapply(1:5, function(x) sum(v == x)),
       by = g]
  test[, paste0("foo", 1:3) := lapply(1:3,
       function(x) fcase(.SD[[paste0("n", x + 1)]] != 0,
                         .SD[[paste0("n", x)]]/.SD[[paste0("n", x + 1)]])), by = g]
}, simplify = FALSE)
-Checking just one element that contains NA (foo2 is all NA but still <num>)
> lst1[[9]]
v g n1 n2 n3 n4 n5 foo1 foo2 foo3
<num> <num> <int> <int> <int> <int> <int> <num> <num> <num>
1: 4 1 3 1 0 2 4 3.00 NA 0
2: 5 1 3 1 0 2 4 3.00 NA 0
3: 1 1 3 1 0 2 4 3.00 NA 0
4: 4 1 3 1 0 2 4 3.00 NA 0
5: 5 1 3 1 0 2 4 3.00 NA 0
6: 1 1 3 1 0 2 4 3.00 NA 0
7: 5 1 3 1 0 2 4 3.00 NA 0
8: 2 1 3 1 0 2 4 3.00 NA 0
9: 1 1 3 1 0 2 4 3.00 NA 0
10: 5 1 3 1 0 2 4 3.00 NA 0
11: 2 2 1 4 0 1 4 0.25 NA 0
12: 1 2 1 4 0 1 4 0.25 NA 0
13: 2 2 1 4 0 1 4 0.25 NA 0
14: 5 2 1 4 0 1 4 0.25 NA 0
15: 5 2 1 4 0 1 4 0.25 NA 0
16: 2 2 1 4 0 1 4 0.25 NA 0
17: 5 2 1 4 0 1 4 0.25 NA 0
18: 4 2 1 4 0 1 4 0.25 NA 0
19: 2 2 1 4 0 1 4 0.25 NA 0
20: 5 2 1 4 0 1 4 0.25 NA 0
v g n1 n2 n3 n4 n5 foo1 foo2 foo3
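To check the fix beyond the single replicate printed above, a sketch that records the type of every foo column over many random tables, for both fixes from this answer. The helpers `make_test` and `foo_types` and the method labels are hypothetical names introduced here, not part of data.table. With typed missing values, every foo column should come out double.

```r
library(data.table)

# Hypothetical helper: one random table as in the question, with the foo
# columns built by one of the two fixes from the answer
make_test <- function(method = c("ifelse_na_real", "fcase")) {
  method <- match.arg(method)
  test <- data.table(v = ceiling(runif(20, 0, 5)),
                     g = ceiling(runif(20, 0, 2)))
  setorder(test, g)
  test[, (paste0("n", 1:5)) := lapply(1:5, function(x) sum(v == x)), by = g]
  if (method == "fcase") {
    test[, (paste0("foo", 1:3)) := lapply(1:3, function(x)
      fcase(.SD[[paste0("n", x + 1)]] != 0,
            .SD[[paste0("n", x)]] / .SD[[paste0("n", x + 1)]])), by = g]
  } else {
    test[, (paste0("foo", 1:3)) := lapply(1:3, function(x)
      ifelse(get(paste0("n", x + 1)) != 0,
             get(paste0("n", x)) / get(paste0("n", x + 1)), NA_real_)), by = g]
  }
  test[]
}

# Hypothetical helper: type of each foo column, collected over many random tables
foo_types <- function(method, reps = 200) {
  unlist(replicate(reps, {
    d <- make_test(method)
    vapply(d[, paste0("foo", 1:3), with = FALSE], typeof, character(1))
  }, simplify = FALSE))
}

set.seed(42)
table(foo_types("ifelse_na_real"))  # expected: only "double"
table(foo_types("fcase"))           # expected: only "double"
```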