How can I use a character vector stored in an R object as the column names when splitting a data.table column?
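I am new to data.table; I am trying to learn it and to move from data.frame to data.table.
Right now I am trying to split text into new columns, following the discussion here.
This is what I want to do.
Here is some sample data: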
library(data.table)
# sample data.table
test <- data.table(POS = c(254, 280, 303, 22, 105, 173, 230, 235, 257, 258),
value = c("0/1:15:3:123:12:478:-38.8484,0,-6.94934",
"0/0:15:15:577:0:0:0,-4.51545,-52.25",
"0/0:13:13:276:0:0:0,-3.91339,-25.0455",
"0/0:367:347:13643:0:0:0,-104.457,-1226.73",
"0/0:367:344:13145:5,0,1,0:168,0,41,0:0,-89.9158,-1166.99,-103.554,-1168.49,-1182.1,-100.161,-1165.11,-1178.71,-1178.41,-103.554,-1168.49,-1182.1,-1178.71,-1182.1",
"0/1:344:180:5411:156:4394:-294.227,0,-385.695",
"0/0:352:349:12289:1:12:0,-104.28,-1104.15",
"0/0:352:345:10691:1:12:0,-103.081,-960.583",
"0/0:352:351:13162:1:41:0,-101.868,-1179.6",
"0/0:352:349:12593:0:0:0,-105.059,-1132.45"))
I want to split value on ":" into separate columns with specific column names. The code below (which I learned from the link above) does this perfectly.
test[, c("GT", "DP", "RO", "QR", "AO", "QA", "GL") := tstrsplit(value, ":",
fixed=TRUE)]
However, is it possible to use an R object in place of the explicit c(names) above? Something like this:
# new column names
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")
Then use namesForm like this:
# use the namesForm as column names
test[, namesForm := tstrsplit(value, ":", fixed=TRUE)]
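This gives me a warning and a different result (a data.table with 3 variables; the last one is a list column of 10 elements, recycled from the 7 vectors that tstrsplit returns):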
Warning message:
In `[.data.table`(test, , `:=`(namesForm, tstrsplit(value, ":", :
Supplied 7 items to be assigned to 10 items of column 'namesForm' (recycled leaving remainder of 3 items).
So, again, my question is: is it possible to use an R object/variable in place of an explicit c()?
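You can use (namesForm) := instead of namesForm :=. The parentheses make data.table evaluate the variable and use its contents as the column names, whereas a bare namesForm is taken as the literal name of a single new column.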
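A minimal sketch of the difference on a small made-up table; `dt`, `x` and `new_cols` are names invented for this illustration and are not part of the question's data:

```r
library(data.table)

# toy data, not the data from the question
dt <- data.table(x = c("a:1", "b:2", "c:3"))
new_cols <- c("letter", "number")

# bare symbol: creates ONE column that is literally called "new_cols"
dt_bare <- copy(dt)[, new_cols := "oops"]
names(dt_bare)

# parenthesised: the variable is evaluated, so two columns are created
dt_paren <- copy(dt)[, (new_cols) := tstrsplit(x, ":", fixed = TRUE)]
names(dt_paren)
```

Working on copies keeps `dt` itself unchanged, since `:=` modifies a data.table by reference.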
Example:
test2 <- copy(test)
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")
str(test[, c("GT", "DP", "RO", "QR", "AO", "QA", "GL") := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame': 10 obs. of 9 variables:
# $ POS : num 254 280 303 22 105 173 230 235 257 258
# $ value: chr "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
# $ GT : chr "0/1" "0/0" "0/0" "0/0" ...
# $ DP : chr "15" "15" "13" "367" ...
# $ RO : chr "3" "15" "13" "347" ...
# $ QR : chr "123" "577" "276" "13643" ...
# $ AO : chr "12" "0" "0" "0" ...
# $ QA : chr "478" "0" "0" "0" ...
# $ GL : chr "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
# - attr(*, ".internal.selfref")=<externalptr>
str(test2[, (namesForm) := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame': 10 obs. of 9 variables:
# $ POS : num 254 280 303 22 105 173 230 235 257 258
# $ value: chr "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
# $ GT : chr "0/1" "0/0" "0/0" "0/0" ...
# $ DP : chr "15" "15" "13" "367" ...
# $ RO : chr "3" "15" "13" "347" ...
# $ QR : chr "123" "577" "276" "13643" ...
# $ AO : chr "12" "0" "0" "0" ...
# $ QA : chr "478" "0" "0" "0" ...
# $ GL : chr "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
# - attr(*, ".internal.selfref")=<externalptr>
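As far as I know, the left-hand side of := is evaluated whenever it is a call rather than a bare name, so the column names can also be built on the fly. A small sketch under that assumption, with made-up data (`dt`, `raw`, `pieces` and the `part_` prefix are hypothetical names chosen only for illustration):

```r
library(data.table)

# toy data, not the data from the question
dt <- data.table(raw = c("0/1:15:3", "0/0:13:13"))

# split once, then derive the column names from the number of pieces
pieces <- tstrsplit(dt$raw, ":", fixed = TRUE)
dt[, paste0("part_", seq_along(pieces)) := pieces]
names(dt)
```

This can be handy when the number of fields is not known in advance; when the names are already stored in a vector, the (namesForm) := form is the simpler choice.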