How can I use a character vector stored in an R object as the column names when splitting a data.table column?

I am new to data.table. I am trying to learn it and to move from data.frame to data.table.

Right now I am trying to split text into new columns, and I am following the discussion here

This is what I want to do.

Here is some sample data:

library(data.table)

# sample data.table
test <- data.table(POS = c(254, 280, 303,  22, 105, 173, 230, 235, 257, 258),
               value = c("0/1:15:3:123:12:478:-38.8484,0,-6.94934",
                         "0/0:15:15:577:0:0:0,-4.51545,-52.25",
                         "0/0:13:13:276:0:0:0,-3.91339,-25.0455",
                         "0/0:367:347:13643:0:0:0,-104.457,-1226.73",
                         "0/0:367:344:13145:5,0,1,0:168,0,41,0:0,-89.9158,-1166.99,-103.554,-1168.49,-1182.1,-100.161,-1165.11,-1178.71,-1178.41,-103.554,-1168.49,-1182.1,-1178.71,-1182.1",
                         "0/1:344:180:5411:156:4394:-294.227,0,-385.695",
                         "0/0:352:349:12289:1:12:0,-104.28,-1104.15",
                         "0/0:352:345:10691:1:12:0,-103.081,-960.583",
                         "0/0:352:351:13162:1:41:0,-101.868,-1179.6",
                         "0/0:352:349:12593:0:0:0,-105.059,-1132.45"))  

I want to split value on ":" into separate columns with specific column names. The code below (which I learned from the link above) does this perfectly.

test[, c("GT", "DP", "RO", "QR", "AO", "QA", "GL") := tstrsplit(value, ":", fixed=TRUE)]

However, is it possible to use an R object in place of the c(names) above? Something like this:

# new column names
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")

Then use namesForm as shown below:

# use the namesForm as column names
test[, namesForm := tstrsplit(value, ":", fixed=TRUE)]

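This gives me a warning and a different output: a data.table with 3 variables, where the last one is a list column of 10 elements, recycled from the 7-element list returned by tstrsplit.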

Warning message:
In `[.data.table`(test, , `:=`(namesForm, tstrsplit(value, ":",  :
Supplied 7 items to be assigned to 10 items of column 'namesForm' (recycled leaving remainder of 3 items).

So, again, my question is: is it possible to use an R object/variable in place of the explicit c()?

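You can use (namesForm) := instead of namesForm :=. With a bare symbol on the left-hand side, data.table takes the symbol itself as the name of a single new column; wrapping it in parentheses makes data.table evaluate the variable and use its contents as the column names.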
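To make the difference concrete, here is a minimal sketch on a toy table. The names dt, cols, dt1 and dt2 are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy table, only for illustration
dt   <- data.table(id = 1:2, raw = c("a:b", "c:d"))
cols <- c("left", "right")

# Bare symbol on the left: one new column literally named "cols"
dt1 <- copy(dt)[, cols := "x"]
names(dt1)
# "id" "raw" "cols"

# Parenthesised: the variable is evaluated, giving columns "left" and "right"
dt2 <- copy(dt)[, (cols) := tstrsplit(raw, ":", fixed = TRUE)]
names(dt2)
# "id" "raw" "left" "right"
```

copy() is used here only so that each variant starts from the same untouched table, since := modifies a data.table by reference.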

Example:

test2 <- copy(test)
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")

str(test[, c("GT", "DP", "RO", "QR", "AO", "QA", "GL") := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame':    10 obs. of  9 variables:
#  $ POS  : num  254 280 303 22 105 173 230 235 257 258
#  $ value: chr  "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
#  $ GT   : chr  "0/1" "0/0" "0/0" "0/0" ...
#  $ DP   : chr  "15" "15" "13" "367" ...
#  $ RO   : chr  "3" "15" "13" "347" ...
#  $ QR   : chr  "123" "577" "276" "13643" ...
#  $ AO   : chr  "12" "0" "0" "0" ...
#  $ QA   : chr  "478" "0" "0" "0" ...
#  $ GL   : chr  "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
#  - attr(*, ".internal.selfref")=<externalptr> 

str(test2[, (namesForm) := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame':    10 obs. of  9 variables:
#  $ POS  : num  254 280 303 22 105 173 230 235 257 258
#  $ value: chr  "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
#  $ GT   : chr  "0/1" "0/0" "0/0" "0/0" ...
#  $ DP   : chr  "15" "15" "13" "367" ...
#  $ RO   : chr  "3" "15" "13" "347" ...
#  $ QR   : chr  "123" "577" "276" "13643" ...
#  $ AO   : chr  "12" "0" "0" "0" ...
#  $ QA   : chr  "478" "0" "0" "0" ...
#  $ GL   : chr  "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
#  - attr(*, ".internal.selfref")=<externalptr>
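
As a further sketch that goes beyond the example above: the parentheses only need to stop the left-hand side from being a bare symbol, so c(namesForm) should behave the same way, and tstrsplit() also takes a type.convert argument if you would rather not keep every new column as character. The table name test3 and the shortened two-row data are assumptions made for illustration.

```r
library(data.table)

namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")

# Shortened copy of the sample data (first two rows only)
test3 <- data.table(POS = c(254, 280),
                    value = c("0/1:15:3:123:12:478:-38.8484,0,-6.94934",
                              "0/0:15:15:577:0:0:0,-4.51545,-52.25"))

# c(namesForm) is a call, not a bare symbol, so it is evaluated like (namesForm)
a <- copy(test3)[, c(namesForm) := tstrsplit(value, ":", fixed = TRUE)]
b <- copy(test3)[, (namesForm) := tstrsplit(value, ":", fixed = TRUE)]
identical(names(a), names(b))

# type.convert = TRUE converts the numeric-looking pieces (DP, RO, ...) to numbers;
# GT and GL stay character because they contain "/" and ","
d <- copy(test3)[, (namesForm) := tstrsplit(value, ":", fixed = TRUE, type.convert = TRUE)]
sapply(d, class)
```

The (namesForm) spelling is the one the data.table documentation recommends; c(namesForm) is shown only to illustrate why the parentheses matter.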