Rbind-ing data.tables with NA values

Question

I have a large data.table with about 40 columns, and I need to add a record for which I only have 3 of the 40 columns (the rest are just NA). To make a reproducible example:

require(data.table)
data(iris)
setDT(iris)

# this works (and is the expected result):
rbind(iris, list(6, NA, NA, NA, "test"))

The problem is that I have 37+ empty columns (the data I want to enter goes into columns 1, 2 and 37 of the table). So I need to rep some NAs. But if I try:

rbind(iris, list(6, rep(NA, 3), "test"))

it doesn't work (the sizes differ). I could do

rbind(iris, list(c(6, rep(NA, 3), "test")))

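but that (obviously) coerces the whole first column to character. I've tried unlisting the list and reversing the list(c( sequence (it only accepts lists), but haven't found anything that works.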

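Note that this is not a duplicate of the (several) posts about rbind-ing data.tables, since I'm able to do that. What I can't do is keep the correct data classes while using rep(NA, x).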

Answer 1

You can do...

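rbind(data.table(iris), c(list(6), rep(NA, 3), list("test")))

     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica
151:          6.0          NA           NA          NA      test

Because one of its arguments is a list, c() returns a list and splits rep(NA, n) into n separate NA elements, so each column receives exactly one value. (logical(n) is not a substitute here: it returns n FALSE values, not NA, and those would show up as 0 in numeric columns.) I wrapped iris in data.table(), so rbindlist is used instead of rbind.data.frame and "test" is treated as a new factor level rather than an invalid one.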
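A quick base-R check of the splicing behaviour described above. It needs no data.table, and the variable names are illustrative only:

```r
# When at least one argument is a list, c() returns a list and splits each
# atomic vector argument into one list element per value.
spliced <- c(list(6), rep(NA, 3), list("test"))
length(spliced)  # 5: 6, NA, NA, NA, "test"

# list() instead keeps rep(NA, 3) as ONE element, so this "row" has
# only 3 entries, which is why it cannot be bound to a 5-column table.
nested <- list(6, rep(NA, 3), "test")
length(nested)   # 3

# logical(n) is not a source of NAs: it returns n FALSE values.
logical(3)       # FALSE FALSE FALSE
```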

That said, I think there are better ways, such as...

newrow = setDT(iris[NA_integer_, ])
newrow[, `:=`(Sepal.Length = 6, Species = factor("test")) ]
rbind(data.table(iris), newrow)

# or

rbind(data.table(iris), list(Sepal.Length = 6, Species = "test"), fill=TRUE)

These approaches are clearer and don't require fiddling with column counts.

I prefer the newrow approach, since it leaves a table I can inspect to check how the data were converted.
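A sketch of both approaches for the question's actual case, assuming a hypothetical 40-column numeric table whose known values belong in columns 1, 2 and 37. The table `dt`, its default V1..V40 column names and the values 6, 7 and 8 are made up for illustration:

```r
library(data.table)

# Hypothetical stand-in for the question's table: 40 numeric columns
# (named V1..V40 by as.data.table) holding two rows of random data.
dt <- as.data.table(matrix(runif(80), nrow = 2, ncol = 40))
known_cols <- names(dt)[c(1, 2, 37)]   # columns we actually have values for
known_vals <- list(6, 7, 8)            # hypothetical values

# Approach 1: fill = TRUE. Supply only the known columns, matched by name;
# every other column of the new row is filled with NA.
via_fill <- rbind(dt, setNames(known_vals, known_cols), fill = TRUE)

# Approach 2: "newrow". Take a one-row, all-NA table with the same column
# types, set the known columns by reference, then bind it.
newrow <- dt[NA_integer_]
set(newrow, i = 1L, j = known_cols, value = known_vals)
via_newrow <- rbind(dt, newrow)
```

Both variants keep the column classes of `dt`; the fill version matches by name, while the newrow version lets you inspect the row before it is bound.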

Answer 2

We can use replicate:

rbind(iris, c(6, replicate(3, NA, simplify = FALSE), "test"))
# Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#  1:          5.1         3.5          1.4         0.2    setosa
#  2:          4.9         3.0          1.4         0.2    setosa
#  3:          4.7         3.2          1.3         0.2    setosa
#  4:          4.6         3.1          1.5         0.2    setosa
#  5:          5.0         3.6          1.4         0.2    setosa
# ---                                                            
#147:          6.3         2.5          5.0         1.9 virginica
#148:          6.5         3.0          5.2         2.0 virginica
#149:          6.2         3.4          5.4         2.3 virginica
#150:          5.9         3.0          5.1         1.8 virginica
#151:          6.0          NA           NA          NA      test

Or, as @Frank commented:

rbind(iris, c(6, as.list(rep(NA, 3)), "test"))
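The two spellings build the same list of NA elements. For the question's scattered positions, a small helper along these lines could place the known values by index. `make_row` is a hypothetical name, not a data.table function, and iris stands in for the real table:

```r
library(data.table)

# replicate(..., simplify = FALSE) and as.list(rep(...)) give the same
# unnamed list of three logical NAs.
identical(replicate(3, NA, simplify = FALSE), as.list(rep(NA, 3)))  # TRUE

# Hypothetical helper (not part of data.table): a list of n NAs with the
# known values dropped in at positions idx.
make_row <- function(n, idx, vals) {
  row <- rep(list(NA), n)
  row[idx] <- vals
  row
}

iris_dt <- as.data.table(iris)
# For a 40-column table this would be make_row(40, c(1, 2, 37), list(v1, v2, v37))
out <- rbind(iris_dt, make_row(ncol(iris_dt), c(1, 5), list(6, "test")))
tail(out, 2)
```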
