Use a column-name range in data.table, like in dplyr's select

Question

I want to select multiple columns from a data.table (it has 1200 column names) by specifying a range of column names, the way you can with dplyr, for example:

library(data.table)
library(dplyr)
dt <- data.table(w = sample(100, 50),
       x = sample(100, 50),
       y = sample(100, 50),
       z = sample(100, 50))

select(dt, w:y)

Currently I am using the following workaround:

cols_to_select <- names(select(dt, w:y))
dt[ ,cols_to_select, with = FALSE]

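I think the alternative of using column numbers (e.g. dt[ , 1:3, with = FALSE]) could lead to serious bugs, because the positions silently point at different columns if columns are added or reordered. Another way to select by name is: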

dt[ , .SD, .SDcols = cols_to_select]
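If the goal is only to avoid hard-coded column numbers and the round trip through dplyr, the range can also be resolved by name in base R. The helper `col_range()` below is hypothetical (it is not part of data.table or dplyr); this is a minimal sketch that assumes both endpoint names exist and that `from` appears before `to` in the table.

```r
library(data.table)

set.seed(42)
dt <- data.table(w = sample(100, 50),
                 x = sample(100, 50),
                 y = sample(100, 50),
                 z = sample(100, 50))

# Hypothetical helper: return the column names from `from` to `to` (inclusive),
# located by name, so a reordered table cannot silently yield the wrong columns.
# Assumes `from` appears before `to` in names(dt).
col_range <- function(dt, from, to) {
  nms <- names(dt)
  i <- match(from, nms)
  j <- match(to, nms)
  if (is.na(i) || is.na(j)) {
    stop("column not found: ", if (is.na(i)) from else to)
  }
  nms[i:j]
}

cols_to_select <- col_range(dt, "w", "y")   # c("w", "x", "y")
dt[, cols_to_select, with = FALSE]          # same columns as select(dt, w:y)
dt[, .SD, .SDcols = cols_to_select]         # equivalent .SDcols form
```

The lookup happens once per call, so the cost is negligible even with 1200 columns; the main benefit is that an unknown name fails loudly instead of selecting the wrong range.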

It would be nice to have something like:

dt[ , .(w:y)]

Is there a better way? If not, why not? If this question would be better filed as an issue on data.table's GitHub, please let me know.

Answer 1

What I asked for is possible in the development version of data.table (1.9.5), as described in new feature number 17. Quote:

.SDcols and with=FALSE understand colA:colB form now. That is, DT[, lapply(.SD, sum), by=V1, .SDcols=V4:V6] and DT[, V5:V7, with=FALSE] works as intended. This is quite useful for interactive use. Closes #748.
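With a data.table version that includes this feature (it first appeared in the 1.9.5 development version), both forms from the quoted NEWS entry can be applied to the question's table. A minimal sketch; the `sum` aggregation is only illustrative:

```r
library(data.table)

set.seed(42)
dt <- data.table(w = sample(100, 50),
                 x = sample(100, 50),
                 y = sample(100, 50),
                 z = sample(100, 50))

# colA:colB range with with = FALSE: selects columns w, x and y by name
sub <- dt[, w:y, with = FALSE]

# colA:colB range in .SDcols: apply a function to columns w through y only
sums <- dt[, lapply(.SD, sum), .SDcols = w:y]
```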

Installation instructions for the development version are here.

Thanks to @AnandaMahto and @Arun for pointing this out!

Also, as @shadow pointed out, using select_vars(names(dt), w:y) from dplyr may be better than names(select(dt, w:y)): it offers more options for selecting names, is more readable, and conveys the intent more clearly.
