Split string by last space in a data.table
I have a data.table with 2 columns:
   term            freq
1: a arena tour    1
2: a available why 1
3: a backup in     1
4: a bad ass       1
5: a bad chick     1
I want to split the "term" column at its last space, for example:
   termA       termB freq
1: a arena     tour  1
2: a available why   1
3: a backup    in    1
4: a bad       chick 1
I tried using substr (code below). It works on a single string but not on the data.table (it seems to use the same index for every row):
data.table (termA = substr(dt_n3$term, 1, rev(gregexpr("\ ", dt_n3$term)[[1]])[1]-1),
termB = substr(dt_n3$term, rev(gregexpr("\ ", dt_n3$term)[[1]])[1], 1000),
freq = dt_n3$freq)
Anyway, I don't think this is the best way to do it.
Can anyone help?
Thanks
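The attempt above goes wrong because gregexpr(...)[[1]] keeps only the space positions of the first row, so that single index is reused for every row (and termB also starts at the space itself, so it keeps a leading space). A minimal base-R sketch of the same substr idea, taking the last space position separately for each row, might look like this; the vector term simply holds the sample values from the question, and last_space is an illustrative name:

```r
# Sample values from the question
term <- c("a arena tour", "a available why", "a backup in", "a bad ass", "a bad chick")

# gregexpr() returns a list with one vector of space positions per string;
# keep the LAST position of each vector instead of reusing element [[1]]
last_space <- vapply(gregexpr(" ", term, fixed = TRUE),
                     function(pos) pos[length(pos)], integer(1))

# substr() is vectorised over start/stop, so each row uses its own index
termA <- substr(term, 1, last_space - 1)
termB <- substr(term, last_space + 1, nchar(term))

data.frame(termA, termB, freq = 1)
```

For a string with no space at all, gregexpr() reports -1, so termA becomes "" and termB the whole string; the answers below handle that case differently.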
Using sub, this can be done in two steps:
dt = data.table(term = c("a arena tour","a available why","a bad ass"), freq=1)
# erase last part
dt[, termA := sub(" [^ ]*$", "", term)]
# erase first part
dt[, termB := sub(".* ", "", term)]
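The two patterns work from opposite ends: " [^ ]*$" matches the last space together with the run of non-space characters after it, so deleting it leaves the head, while ".* " is greedy and therefore extends up to the last space, so deleting it leaves the tail. A base-R sketch on a plain character vector shows this; the extra value "single" is an illustrative assumption, added to show that a string without any space is returned unchanged by sub() and so ends up in both columns:

```r
term <- c("a arena tour", "a bad ass", "single")

# Delete the last space and everything after it -> head (termA)
head_part <- sub(" [^ ]*$", "", term)
# Greedy ".*" backtracks to the last space; delete through it -> tail (termB)
tail_part <- sub(".* ", "", term)

head_part  # -> c("a arena", "a bad", "single")
tail_part  # -> c("tour", "ass", "single")
```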
You can try the tstrsplit function from data.table v1.9.5:
DT[, paste0('term', LETTERS[1:2]) := tstrsplit(term, ' (?=[^ ]*$)',
perl=TRUE)][, term:=NULL][]
# freq termA termB
#1: 1 a arena tour
#2: 1 a available why
#3: 1 a backup in
#4: 1 a bad ass
#5: 1 a bad chick
Data:
DT <- data.table(term= c("a arena tour", "a available why",
"a backup in", "a bad ass", "a bad chick"), freq=1)
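The split pattern ' (?=[^ ]*$)' matches a space only when the lookahead sees nothing but non-space characters up to the end of the string, i.e. only at the last space; perl = TRUE is needed because R's default regex engine does not support lookahead. Since tstrsplit() is documented as essentially transpose(strsplit(...)), the pattern can be checked with base strsplit() alone. This sketch assumes only base R; the vapply() calls are just one way to do the transposition by hand:

```r
term <- c("a arena tour", "a available why", "a backup in", "a bad ass", "a bad chick")

# Split only at the space followed by non-spaces up to the end of the string
parts <- strsplit(term, " (?=[^ ]*$)", perl = TRUE)

# Transpose the list of (head, tail) pairs into two column vectors
termA <- vapply(parts, `[`, character(1), 1)
termB <- vapply(parts, `[`, character(1), 2)

data.frame(freq = 1, termA, termB)
```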
A slightly modified version lets you assign and delete in the same statement:
cols = c("term", paste0("term", LETTERS[1:2]))
DT[, (cols) := c(list(NULL), tstrsplit(term, ' (?=[^ ]*$)', perl=TRUE))]
Assigning NULL to term deletes that column.
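Assigning NULL to remove a column is not specific to data.table; base R data frames follow the same convention, although base assignments copy instead of modifying by reference as := does. A base-R sketch of the same assign-then-drop step, assuming a plain data.frame named df (an illustrative name) and reusing the sub() patterns from the first answer:

```r
df <- data.frame(term = c("a arena tour", "a available why", "a backup in",
                          "a bad ass", "a bad chick"),
                 freq = 1)

# Add both new columns in one assignment
df[c("termA", "termB")] <- list(sub(" [^ ]*$", "", df$term),
                               sub(".* ", "", df$term))
# Assigning NULL drops the original column
df$term <- NULL

df
```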
Using the stringi package:
library(stringi)
x <- c("ala ma kota", "this is text")
stri_locate_last_fixed(x, " ")
## start end
## [1,] 7 7
## [2,] 8 8
splitHere <- stri_locate_last_fixed(x, " ")
stri_sub(x, to= splitHere[,1]-1)
## [1] "ala ma" "this is"
stri_sub(x, from= splitHere[,1]+1)
## [1] "kota" "text"
cbind(stri_sub(x, to= splitHere[,1]-1), stri_sub(x, from=splitHere[,1]+1))
## [,1] [,2]
## [1,] "ala ma" "kota"
## [2,] "this is" "text"
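Without stringi, base regexpr() can play the role of stri_locate_last_fixed() here: the pattern " [^ ]*$" can only match at the last space, so the start of its first match is that position. A minimal sketch, assuming only base R and the same x as above (split_here is an illustrative name); note that regexpr() reports -1 for a string with no space, whereas the stringi version gives NA:

```r
x <- c("ala ma kota", "this is text")

# Start position of the last space in each string (-1 if there is none)
split_here <- as.integer(regexpr(" [^ ]*$", x))
split_here  # -> c(7L, 8L), same as the start column above

cbind(substr(x, 1, split_here - 1), substr(x, split_here + 1, nchar(x)))
```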