Remove specific phrases from a string
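I am trying to use R for some basic text analysis.
I have a column that contains complex data types. I would like to maintain a separate table that I can use to remove certain phrases from the first data column.
I have tried gsubfn without success.
For example: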
dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <-c("COURT","BODY CORPORATE")
Why does
x <- gsubfn(removefields, "", dirtydata)
not work?
The desired output is
c("JOHN ","@PETER","BOB 22","RUPERT ")
Try this.
dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
# No spaces around "|": with spaces the alternatives become "COURT " and " BODY CORPORATE"
removefields <- c("COURT|BODY CORPORATE")
x <- gsub(removefields, "", dirtydata)
This generalizes to whatever you put in removefields, and strips the whitespace around the strings to be removed:
dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <- c("COURT","BODY CORPORATE")
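# Backslashes must be doubled in R strings; \\s* also matches when a phrase has no whitespace after it
removefields <- paste0("\\s*", removefields, "\\s*", collapse = "|")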
x <- gsub(removefields, "", dirtydata)
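The same pattern-building step can be written with word boundaries instead of whitespace, so that a phrase is not removed from the middle of a longer word. This is only a minimal sketch: it assumes perl = TRUE is acceptable and that the phrases contain no regex metacharacters, and the variable name pattern is introduced here purely for illustration.

```r
dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")
removefields <- c("COURT", "BODY CORPORATE")

# Join the phrases into one alternation and anchor it at word boundaries
pattern <- paste0("\\b(", paste(removefields, collapse = "|"), ")\\b")

# Remove every match, then trim the whitespace left at the ends
x <- trimws(gsub(pattern, "", dirtydata, perl = TRUE))
x
```

If the trailing space in the desired output matters, drop the trimws() call.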
We can use the tm package:
dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <-c("COURT","BODY CORPORATE")
library(tm)
removeWords(dirtydata, removefields)
> removeWords(dirtydata, removefields)
[1] "JOHN " "@PETER" "BOB 22" "RUPERT "
Please find below the edited code, using only base R functions:
dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <-c("COURT","BODY CORPORATE")
pastedFields = paste0(removefields,collapse = "|")
gsub(pastedFields,"",dirtydata)
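One caveat with the collapsed pattern is that gsub treats each phrase as a regular expression, so phrases containing characters such as "." or "(" may not behave as intended. A possible alternative, sketched here with a hypothetical helper named remove_phrases (not part of any package), is to loop over the phrases with fixed = TRUE. The second test string below is made up for illustration.

```r
# Hypothetical helper: remove each phrase literally, longest phrases first
remove_phrases <- function(x, phrases) {
  phrases <- phrases[order(nchar(phrases), decreasing = TRUE)]
  for (p in phrases) {
    # fixed = TRUE matches the phrase as plain text, not as a regex
    x <- gsub(p, "", x, fixed = TRUE)
  }
  x
}

dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")
removefields <- c("COURT", "BODY CORPORATE")
cleaned <- remove_phrases(dirtydata, removefields)
cleaned

# A phrase with regex metacharacters is still matched literally
literal <- remove_phrases("A.B. (PTY) LTD", "(PTY)")
```

Removing the longest phrases first is a design choice so that a short phrase does not break up a longer one that contains it.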