data.table assignment not working
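OK, so I'm cleaning up a large dataset and trying to speed things up by converting my data.frame code to data.table. I'm having trouble with conditional assignment of missing-value codes. Toy example: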
library(data.table)
X = data.table(grp=c("a","a","b","b","b","c","c","d","d","d","d"),
               foo=c(1:4,NA,6:7,NA,8:10))
setkey(X,grp)
err.code <-"1111"
row.select <- row.names(X)[X$grp=="b" & is.na(X$foo)]
# Replace missing value for group b with err.code
X[row.select, foo:=err.code]
So I want to put err.code into the specific cells that match the condition. However, the code above doesn't assign anything. For example:
> X
grp foo
1: a 1
2: a 2
3: b 3
4: b 4
5: b NA
6: c 6
7: c 7
8: d NA
9: d 8
10: d 9
11: d 10
What am I missing here?
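I see two problems:
- You're trying to replace values in a numeric column with a character string. data.table doesn't like this unless you explicitly convert the column types to match.
- You're trying to index rows by the character value "5" rather than the numeric value 5. Because X is keyed, a character vector in i is treated as a join against the key column grp, so data.table looks for a group named "5", finds none, and assigns nothing.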
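A minimal check of the second point, reusing the toy data from the question: `row.names()` returns a character vector, and because `X` is keyed on `grp`, that character value is looked up as a group name rather than used as a row number. The value `1111L` is just an integer stand-in for the error code, chosen so the type mismatch from the first point does not get in the way.

```r
library(data.table)

# Toy data from the question; foo is an integer column
X <- data.table(grp = c("a","a","b","b","b","c","c","d","d","d","d"),
                foo = c(1:4, NA, 6:7, NA, 8:10))
setkey(X, grp)

# row.names() returns character labels, not integer positions
row.select <- row.names(X)[X$grp == "b" & is.na(X$foo)]
print(row.select)      # the character string "5"

# On a keyed table, character i is joined against the key (grp),
# so this looks for a group named "5"; none exists, so nothing is updated
X[row.select, foo := 1111L]
print(X$foo[5])        # still NA
```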
So the following should work:
err.code <- 1111
row.select <- as.numeric(row.names(X)[X$grp=="b" & is.na(X$foo)])
X[row.select, foo := err.code][]
# grp foo
# 1: a 1
# 2: a 2
# 3: b 3
# 4: b 4
# 5: b 1111
# 6: c 6
# 7: c 7
# 8: d NA
# 9: d 8
# 10: d 9
# 11: d 10
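A slightly more direct variant, offered as a suggestion rather than part of the original fix: `which()` returns integer row positions directly, so the round trip through `row.names()` and `as.numeric()` is not needed. Using the integer literal `1111L` keeps the right-hand side the same type as the integer `foo` column.

```r
library(data.table)

X <- data.table(grp = c("a","a","b","b","b","c","c","d","d","d","d"),
                foo = c(1:4, NA, 6:7, NA, 8:10))
setkey(X, grp)

err.code <- 1111L                                  # integer, matches foo's type
row.select <- which(X$grp == "b" & is.na(X$foo))   # integer row positions
X[row.select, foo := err.code]                     # updates the matching row by reference
X[]
```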
Alternatively, without creating those extra variables:
X[grp == "b" & is.na(foo), foo := 1111]
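The same one-line pattern extends to several groups at once with `%in%`. A minimal sketch, assuming (hypothetically) that missing values in both group `b` and group `d` should receive the code:

```r
library(data.table)

X <- data.table(grp = c("a","a","b","b","b","c","c","d","d","d","d"),
                foo = c(1:4, NA, 6:7, NA, 8:10))
setkey(X, grp)

# Hypothetical requirement: apply the error code to NAs in groups "b" and "d"
X[grp %in% c("b", "d") & is.na(foo), foo := 1111L]
X[]
```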
If you really do want to store the code as a character string, you need to convert the column type explicitly first:
err.code <- "1111"
row.select <- as.numeric(row.names(X)[X$grp=="b" & is.na(X$foo)])
X[, foo := as.character(foo)][row.select, foo := err.code][]
# grp foo
# 1: a 1
# 2: a 2
# 3: b 3
# 4: b 4
# 5: b 1111
# 6: c 6
# 7: c 7
# 8: d NA
# 9: d 8
# 10: d 9
# 11: d 10
str(.Last.value)
# Classes ‘data.table’ and 'data.frame': 11 obs. of 2 variables:
# $ grp: chr "a" "a" "b" "b" ...
# $ foo: chr "1" "2" "3" "4" ...
# - attr(*, ".internal.selfref")=<externalptr>
# - attr(*, "sorted")= chr "grp"
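If a character code is what you want, the type conversion and the conditional assignment can also be folded into a single `:=` call using base R `ifelse()`. This is a sketch on the same toy data; note that it rebuilds the whole `foo` column rather than updating a single cell, which may matter on a very large table.

```r
library(data.table)

X <- data.table(grp = c("a","a","b","b","b","c","c","d","d","d","d"),
                foo = c(1:4, NA, 6:7, NA, 8:10))
setkey(X, grp)

err.code <- "1111"
# Convert foo to character and insert the code for NAs in group "b" in one step;
# the NA in group "d" becomes NA_character_
X[, foo := ifelse(grp == "b" & is.na(foo), err.code, as.character(foo))]
str(X$foo)
```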