R: removing part of a word in a character string
I have a character vector
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
I am trying to remove `span` AND the punctuation from each word in the vector, so that I end up with
> something thank great to hear your
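The problem is that there is no rule for whether `span` appears before or after the word I am interested in. Also, `span` can be glued to: i) characters only (e.g. `yourspan`), ii) punctuation only (e.g. `..span?`), or iii) both characters and punctuation (e.g. `somethingspan.`).
I searched SO for answers, but mostly I find requests to remove the whole word, or to remove the parts of a string that come after/before a letter or punctuation mark.
Any help would be appreciated.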
Use `sub` to remove `span`. To turn the result into a sentence, use `paste` with `collapse`.
library(magrittr)
sub("^[[:punct:]]{,2}span|span[[:punct:]]{,2}$", "", words) %>% paste(collapse=" ")
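So it only removes a `span` at the start or at the end of each element, together with up to two adjacent punctuation characters.
Output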
[1] "something ? thank great to hear your"
You can try everything out at https://regex101.com/.
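The stray `?` in that output comes from `sub` replacing only the first match per element. A minimal sketch to see this in base R (the names `pattern`, `stripped` and `matched` are mine, and `{0,2}` is just the explicit spelling of `{,2}`):

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
pattern <- "^[[:punct:]]{0,2}span|span[[:punct:]]{0,2}$"

# sub() replaces only the first match in each element
stripped <- sub(pattern, "", words)

# Show what that first match was, for the elements that have one
matched <- regmatches(words, regexpr(pattern, words))

print(stripped)
print(matched)
```

For `..span?` the first alternative matches `..span` at the start of the string, so the trailing `?` is never touched.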
# Remove every occurrence of "span"
clean_words <- gsub(pattern = "span", replacement = "", words, perl = TRUE)
# If you want the sentence
sentence <- paste(clean_words, collapse = " ")
# Remove punctuation: this regex keeps only the letters A-Z, a-z and spaces
clean_sentence <- gsub(pattern = "[^a-zA-Z ]", replacement = "", sentence, perl = TRUE)
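One thing to watch with this approach: an element that consists only of `span` and punctuation (here `..span?`) becomes empty, so the pasted sentence ends up with two consecutive spaces. A possible follow-up step, not part of the answer above, is to squeeze runs of spaces afterwards (the name `squeezed` is mine):

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

# Same steps as above
clean_words <- gsub("span", "", words, perl = TRUE)
sentence <- paste(clean_words, collapse = " ")
clean_sentence <- gsub("[^a-zA-Z ]", "", sentence, perl = TRUE)

# Collapse runs of spaces into one and trim the ends
squeezed <- trimws(gsub(" +", " ", clean_sentence))

print(clean_sentence)
print(squeezed)
```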
You can use
[[:punct:]]*span[[:punct:]]*
See the regex demo.
Details
[[:punct:]]* - 0 or more punctuation characters
span - the literal substring
[[:punct:]]* - 0 or more punctuation characters
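To check what the three parts of the pattern consume together, here is a small sketch that extracts the matches instead of deleting them (the names `pattern` and `matched` are mine):

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
pattern <- "[[:punct:]]*span[[:punct:]]*"

# gregexpr() finds every match in each element; regmatches() pulls them out.
# Elements without a match contribute nothing to the result.
matched <- unlist(regmatches(words, gregexpr(pattern, words)))

print(matched)
```

Each match takes `span` plus any punctuation directly attached to it on either side, which is why `..span?` disappears completely.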
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
words <- gsub("[[:punct:]]*span[[:punct:]]*", "", words) # Remove spans
words <- words[words != ""] # Discard empty elements
paste(words, collapse=" ") # Concat the elements
## => [1] "something thank great to hear your"
If removing the unwanted strings can leave elements that contain only whitespace, you can replace the second step with `words <- words[trimws(words) != ""]` (instead of `words[words != ""]`).
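A quick sketch of the difference between the two filters. The input vector here is hypothetical (it is not from the question) and is chosen so that one element is reduced to whitespace only:

```r
# Hypothetical input: the second element has spaces around "span."
words <- c("yourspan", " span. ", "great")

cleaned <- gsub("[[:punct:]]*span[[:punct:]]*", "", words)

# Plain comparison keeps the whitespace-only element
kept_plain <- cleaned[cleaned != ""]

# trimws() strips leading/trailing whitespace before the comparison
kept_trimmed <- cleaned[trimws(cleaned) != ""]

print(kept_plain)
print(kept_trimmed)
```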