R: remove multiple text strings in data.table
I have a vector of words that I want to remove from a data.table DT, as shown below.
wordstoremove <- c("Simpson", "Flander", "Nahasapeemapetilon", "Spuckler", "Wiggum")
DT <- structure(list(vid = c("Simpsons", "Flanders", "Nahasapeemapetilons",
"Spucklers", "Wiggums"), wr1 = c("Homer Simpson", "Ned Flanders",
"Apu Nahasapeemapetilon", "Cletus Spuckler", "Chief Wiggum"),
wr2 = c("Bart Simpson", "Rod Flanders", "Manjula Nahasapeemapetilon",
"Brandine Spuckler", "Ralph Wiggum"), wr3 = c("Marge Simpson",
"Todd Flanders", "Sanjay Nahasapeemapetilon", NA, "Sarah Wiggum"
)), .Names = c("vid", "wr1", "wr2", "wr3"), row.names = c(NA,
-5L), class = c("data.table", "data.frame"))
DT
                   vid                    wr1                        wr2                       wr3
1:            Simpsons          Homer Simpson               Bart Simpson             Marge Simpson
2:            Flanders           Ned Flanders               Rod Flanders             Todd Flanders
3: Nahasapeemapetilons Apu Nahasapeemapetilon Manjula Nahasapeemapetilon Sanjay Nahasapeemapetilon
4:           Spucklers        Cletus Spuckler          Brandine Spuckler                        NA
5:             Wiggums           Chief Wiggum               Ralph Wiggum              Sarah Wiggum
I know I can use the solution from "R remove multiple text strings in data frame".
How can I do this with data.table so that less data is copied?
Try this:
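library(data.table)
# One alternation of all words; the optional "s" now applies to every word
# (including the last one), and \\s* also drops the space left before it.
pattern <- paste0("\\s*(", paste(wordstoremove, collapse = "|"), ")s?")
foo <- function(x) gsub(pattern, "", x)
# := updates the selected columns by reference; vid (column 1) is left alone.
cols <- names(DT)[-1]
DT[, (cols) := lapply(.SD, foo), .SDcols = cols]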
DT
#                    vid    wr1      wr2    wr3
# 1:            Simpsons  Homer     Bart  Marge
# 2:            Flanders    Ned      Rod   Todd
# 3: Nahasapeemapetilons    Apu  Manjula Sanjay
# 4:           Spucklers Cletus Brandine     NA
# 5:             Wiggums  Chief    Ralph  Sarah
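If the goal is mainly to avoid copies, another option is to loop over the columns with set(), which also assigns by reference. This is only a minimal sketch under a few assumptions: DT2 is a small hypothetical table built with data.table() instead of structure(), the word list is shortened, and the pattern also strips the whitespace in front of each removed word.

```r
library(data.table)

wordstoremove <- c("Simpson", "Flander", "Wiggum")

# DT2 is a hypothetical, smaller stand-in for the DT in the question
DT2 <- data.table(
  vid = c("Simpsons", "Flanders", "Wiggums"),
  wr1 = c("Homer Simpson", "Ned Flanders", "Chief Wiggum"),
  wr2 = c("Bart Simpson", "Rod Flanders", NA)
)

# Single alternation: optional leading whitespace, any listed word, optional "s"
pattern <- paste0("\\s*(", paste(wordstoremove, collapse = "|"), ")s?")

# set() overwrites one column at a time in place; NA values stay NA
for (col in names(DT2)[-1]) {
  set(DT2, j = col, value = gsub(pattern, "", DT2[[col]]))
}

DT2
```

Only one column's worth of cleaned strings exists as a temporary at any time, and the vid column is never touched because the loop skips the first name.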