How to remove punctuation excluding negations?
Suppose I have the following sentence:
s = c("I don't want to remove punctuation for negations. Instead, I want to remove only general punctuation. For example, keep I wouldn't like it but remove Inter's fan or Man city's fan.")
I would like to get the following result:
"I don't want to remove punctuation for negations Instead I want to remove only general punctuation For example keep I wouldn't like it but remove Inter fan or Man city fan."
Currently, if I use only the code below, both the "'s" and the "'" in negations get removed.
s %>% str_replace_all("['']s\\b|[^[:alnum:][:blank:]@_]", " ")
"I don t want to remove punctuation for negations Instead I want to remove only general punctuation For example keep I wouldn t like it but remove Inter fan or Man city fan "
In short, I need code that removes general punctuation, including "'s", while keeping negations in their original form.
Can anyone help me?
Thanks!
We can do this in two steps: first remove all punctuation except "'", then remove "'s" using fixed matching:
gsub("'s", "", gsub("[^[:alnum:][:space:]']", "", s), fixed = TRUE)
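To make the two steps easier to follow, here is a minimal sketch that runs them separately on the sentence from the question. The variable names `step1` and `step2` are illustrative only and do not come from the answer.

```r
s <- "I don't want to remove punctuation for negations. Instead, I want to remove only general punctuation. For example, keep I wouldn't like it but remove Inter's fan or Man city's fan."

# Step 1: drop every character that is not alphanumeric, whitespace, or an apostrophe
step1 <- gsub("[^[:alnum:][:space:]']", "", s)

# Step 2: drop the literal string "'s"; fixed = TRUE turns off regex interpretation
step2 <- gsub("'s", "", step1, fixed = TRUE)

print(step1)
print(step2)
```

One thing to keep in mind: because step 2 removes every literal "'s", contractions such as "it's" would be shortened to "it" as well. The example sentence contains none, so the result is not affected here.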
You can use a negative lookahead, (?!t), to test that the [:punct:] character is not followed by a t.
gsub("[[:punct:]](?!t)\\w?", "", s, perl=TRUE)
#[1] "I don't want to remove punctuation for negations Instead I want to remove only general punctuation For example keep I wouldn't like it but remove Inter fan or Man city fan"
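If you want to be stricter, you can additionally test that there is no n before it with (?<!n). Note that the two checks are independent, so this also keeps any punctuation that directly follows an n, such as the period after "punctuation" in the example.
gsub("(?<!n)[[:punct:]](?!t)\\w?", "", s, perl=TRUE)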
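Because the lookbehind and the lookahead in the pattern above are evaluated independently, punctuation after any n is protected, not only the apostrophe in n't. The sketch below shows that effect on the example sentence and then tries a hypothetical variant (not from the original answers) that nests the lookbehind inside the lookahead, so that only an apostrophe that is both preceded by n and followed by t is kept.

```r
s <- "I don't want to remove punctuation for negations. Instead, I want to remove only general punctuation. For example, keep I wouldn't like it but remove Inter's fan or Man city's fan."

# Pattern from the answer: skip punctuation preceded by n OR followed by t
strict <- gsub("(?<!n)[[:punct:]](?!t)\\w?", "", s, perl = TRUE)

# Hypothetical variant: skip punctuation only when it is the apostrophe of "n't"
combined <- gsub("(?!(?<=n)'t)[[:punct:]]\\w?", "", s, perl = TRUE)

print(strict)    # the periods after "punctuation" and "fan" survive
print(combined)  # only the apostrophes in "don't" and "wouldn't" survive
```

Whether the surviving periods matter depends on the use case; the variant is only one possible way to tie the two conditions together.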
Or, to be safe, restrict it to 't (thanks to @chris-ruehlemann):
gsub("(?!'t)[[:punct:]]\\w?", "", s, perl=TRUE)
Or remove every punctuation character except ', and additionally remove 's:
gsub("[^'[:^punct:]]|'s", "", s, perl = TRUE)
The same, but using a lookahead:
gsub("(?!')[[:punct:]]|'s", "", s, perl = TRUE)
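As a quick sanity check, the following sketch applies the PCRE-based patterns from this thread to the example sentence and compares each result with the two-step approach. The vector names (`patterns`, `results`, `two_step`) are illustrative and not part of the original answers.

```r
s <- "I don't want to remove punctuation for negations. Instead, I want to remove only general punctuation. For example, keep I wouldn't like it but remove Inter's fan or Man city's fan."

# Patterns collected from the answers; all of them need perl = TRUE
patterns <- c(
  lookahead_t      = "[[:punct:]](?!t)\\w?",
  lookahead_apos_t = "(?!'t)[[:punct:]]\\w?",
  negated_class    = "[^'[:^punct:]]|'s",
  lookahead_apos   = "(?!')[[:punct:]]|'s"
)

# Apply each pattern to the sentence
results <- vapply(patterns, function(p) gsub(p, "", s, perl = TRUE), character(1))

# Two-step approach as the reference result
two_step <- gsub("'s", "", gsub("[^[:alnum:][:space:]']", "", s), fixed = TRUE)

print(results == two_step)
```

The patterns agree on this sentence but not in general: the variants ending in `\\w?` also consume one word character after any removed punctuation, so an input such as "a,b" would become "a" with them and "ab" with the other two.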