How to remove punctuation excluding negations?

Suppose I have the following sentence:


s = c("I don't want to remove punctuation for negations. Instead, I want to remove only general punctuation. For example, keep I wouldn't like it but remove Inter's fan or Man city's fan.")

I would like to get the following result:

"I don't want to remove punctuation for negations Instead I want to remove only general punctuation For example keep I wouldn't like it but remove Inter fan or Man city fan."

At the moment, if I use only the code below, it removes both the 's and the ' in the negations.


  s %>% str_replace_all("['']s\\b|[^[:alnum:][:blank:]@_]", " ")

 "I don t want to remove punctuation for negations  Instead  I want to remove only general punctuation           For example  keep I wouldn t like it but remove Inter  fan or Man city  fan "

In short, I need code that removes general punctuation, including "'s", but keeps the negations in their original form.

Can anyone help me?

Thanks!

We can do this in two steps: first remove all punctuation except "'", then remove "'s" using fixed matching.

gsub("'s", "", gsub("[^[:alnum:][:space:]']", "", s), fixed = TRUE)
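
As a minimal sketch, the two steps can be wrapped in a small helper so each stage is visible. The function name `strip_punct_keep_apostrophes` and the short test sentence are made up for illustration; only base R `gsub()` is used.

```r
# Hypothetical helper name; it wraps the two-step approach described above.
strip_punct_keep_apostrophes <- function(x) {
  # Step 1: drop everything that is not a letter, digit, whitespace, or apostrophe.
  no_punct <- gsub("[^[:alnum:][:space:]']", "", x)
  # Step 2: drop the literal two-character sequence 's (fixed = TRUE disables regex).
  gsub("'s", "", no_punct, fixed = TRUE)
}

res <- strip_punct_keep_apostrophes("I don't like Inter's fans, really.")
print(res)
# [1] "I don't like Inter fans really"
```

Note that the fixed match removes every "'s", so contractions such as "it's" would become "it" as well.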

You can use a lookahead, (?!t), to test that the [:punct:] character is not followed by a t.

gsub("[[:punct:]](?!t)\\w?", "", s, perl=TRUE)
#[1] "I don't want to remove punctuation for negations Instead I want to remove only general punctuation For example keep I wouldn't like it but remove Inter fan or Man city fan"

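If you want to be stricter, you can additionally test for an n before the punctuation with a lookbehind, (?<=n), so that only punctuation sitting between n and t is kept. (Writing it as (?<!n)[[:punct:]](?!t) would also keep any punctuation that merely follows an n, such as the period in "punctuation.".)

gsub("(?!(?<=n)[[:punct:]]t)[[:punct:]]\\w?", "", s, perl=TRUE)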

Or, to be safe, restrict it to 't (thanks to @chris-ruehlemann):

gsub("(?!'t)[[:punct:]]\\w?", "", s, perl=TRUE)
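
To see how the (?!t) and (?!'t) variants differ, here is a small sketch on a made-up string (`x` is not from the question). It also shows a side effect worth knowing about: the trailing \w? is there to swallow the s of 's, but it will consume whatever word character follows any removed punctuation.

```r
# Illustrative input, chosen to contain a hyphen directly before a "t".
x <- "Stop-then don't, ok?"

# (?!t): any punctuation followed by "t" is kept, including the hyphen.
keep_before_t <- gsub("[[:punct:]](?!t)\\w?", "", x, perl = TRUE)
print(keep_before_t)
# [1] "Stop-then don't ok"

# (?!'t): only the apostrophe of 't is kept; the hyphen is removed,
# and \w? also consumes the "t" that follows it.
keep_apostrophe_t <- gsub("(?!'t)[[:punct:]]\\w?", "", x, perl = TRUE)
print(keep_apostrophe_t)
# [1] "Stophen don't ok"
```

If text like this can occur in your data, the variants that match 's explicitly may be the safer choice.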

Or remove every punctuation character except ', and also remove 's:

gsub("[^'[:^punct:]]|'s", "", s, perl = TRUE)

The same, but using a lookahead:

gsub("(?!')[[:punct:]]|'s", "", s, perl = TRUE)
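
A quick sketch to check that the last two patterns behave the same way, using a shortened, made-up input (`x`) rather than the full sentence from the question:

```r
# Shortened illustrative input containing a possessive, a negation, and plain punctuation.
x <- "Inter's fan, I wouldn't like it."

# Negated class: matches punctuation other than ', or the literal 's.
by_class <- gsub("[^'[:^punct:]]|'s", "", x, perl = TRUE)

# Lookahead: matches punctuation that is not ', or the literal 's.
by_lookahead <- gsub("(?!')[[:punct:]]|'s", "", x, perl = TRUE)

print(by_class)
# [1] "Inter fan I wouldn't like it"
print(identical(by_class, by_lookahead))
# [1] TRUE
```

Both patterns keep every apostrophe that is not followed by s, so quoted words like 'this' would keep their quotes too.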