R: removing part of the word in a character string

I have a character vector:

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

I am trying to remove span AND the punctuation from every word in the vector, so that I end up with:

> something thank great to hear your

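The problem is that there is no rule as to whether span appears before or after the word I am interested in. Also, span can be attached to: i) characters only (e.g. yourspan), ii) punctuation only (e.g. ..span?), or iii) both characters and punctuation (e.g. somethingspan.).

I searched SO for answers, but I mostly found requests to remove a whole word, or to remove the elements of a string after/before a letter or punctuation mark.

Any help would be appreciated.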

Use sub to remove span. To turn the result into a sentence, use paste with collapse.

library(magrittr)

sub("^[[:punct:]]{,2}span|span[[:punct:]]{,2}$", "", words)  %>% paste(collapse=" ")

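So it only removes span at the beginning or end of each element. Because sub replaces only the first match, the trailing ? of ..span? is kept, as the output shows.

Output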

[1] "something ? thank great to hear your"

You can try everything out at https://regex101.com/.

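clean_words <- gsub(pattern = "span", replacement = "", words, perl = TRUE)
# if you want the sentence
sentence <- paste(clean_words, collapse = " ")

# to remove punctuation: this regex keeps only the letters A-Z, a-z and spaces
clean_sentence <- gsub(pattern = "[^a-zA-Z ]", replacement = "", sentence, perl = TRUE)
# "..span?" leaves an empty slot, so squeeze repeated spaces and trim the ends
clean_sentence <- trimws(gsub(" +", " ", clean_sentence))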

You can use

[[:punct:]]*span[[:punct:]]*

See the regex demo.

Details

  • [[:punct:]]* - zero or more punctuation characters
  • span - a literal substring
  • [[:punct:]]* - zero or more punctuation characters

R Demo:

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
words <- gsub("[[:punct:]]*span[[:punct:]]*", "", words) # Remove spans
words <- words[words != ""] # Discard empty elements
paste(words, collapse=" ")  # Concat the elements
## => [1] "something thank great to hear your"

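If some elements consist only of whitespace once the unwanted strings have been removed, you can replace the second step with words <- words[trimws(words) != ""] (instead of words[words != ""]).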
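
A minimal sketch of the difference between the two filters. The input vector here is hypothetical (it is not the one from the question): its second element is span surrounded by spaces, so removing span leaves a whitespace-only string behind.

```r
# Hypothetical input: the second element is "span" with a space on each side
words <- c("somethingspan.", " span ", "yourspan")

# Remove "span" together with any adjacent punctuation
cleaned <- gsub("[[:punct:]]*span[[:punct:]]*", "", words)

# cleaned[2] is now two spaces, which is not equal to ""
kept_plain <- cleaned[cleaned != ""]          # keeps the whitespace-only element
kept_trim  <- cleaned[trimws(cleaned) != ""]  # drops it

length(kept_plain)
length(kept_trim)
paste(kept_trim, collapse = " ")
```

With the plain comparison the whitespace-only element survives and would add extra spaces to the pasted sentence; the trimws version discards it.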