Conditional replacement of special characters in R

Question

I would like to take a data frame df that contains a column df$col with entries like these:

I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired

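and replace every question mark that appears between letters with an apostrophe, and remove any question mark that appears at the start of a string: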

I'm tired
You're tired
You're tired?
Are you tired?
I am tired

Answer 1

We can use sub:

df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?" "I am tired"

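This assumes that each string contains at most one ? before its final character, as in the example. Otherwise, simply replace the inner sub with gsub.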
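To illustrate the gsub variant just mentioned, here is a minimal sketch for strings that contain several question marks. The vector x is made-up sample data, not taken from the question. Note that this pattern turns every ? that is not the last character into an apostrophe, including one that is followed by a space.

```r
# Hypothetical sample data with more than one "?" per string
x <- c("I?m sure you?re tired", "?Don?t you think it?s late?")

# Inner gsub: turn every "?" that is not the last character into "'"
# (the negative lookahead (?!$) requires perl = TRUE)
step1 <- gsub("[?](?!$)", "'", x, perl = TRUE)

# Outer sub: drop the apostrophe left behind by a leading "?"
result <- sub("^'", "", step1)
result
```

The intermediate value step1 still carries a leading apostrophe for the second string, which is why the outer sub is kept unchanged.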

Data

df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?", 
"Are you tired?", "?I am tired")), .Names = "col", 
 class = "data.frame", row.names = c(NA, -5L))

Answer 2

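I would use sub for the leading question mark and gsub for the others, because a string may contain several question marks between letters but at most one at the start.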

gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?"
[5] "I am tired"

For an explanation, see https://regex101.com/r/jClVPg/1.

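Some explanation:

First capturing group (\\w):

\\w matches any word character (equivalent to [a-zA-Z0-9_])
\\? matches the character ? literally
Second capturing group (\\w):

\\w matches any word character (equivalent to [a-zA-Z0-9_])

In the replacement, \\1 and \\2 put the two captured characters back on either side of the apostrophe.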
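One limitation of the capture-group pattern is that each match consumes the characters on both sides of the ?, so in a string such as a?b?c only the first ? is replaced. A possible alternative, sketched here as an assumption and not part of the original answer, uses PCRE lookarounds restricted to ASCII letters. The name fix_question_marks is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: lookarounds inspect the neighbouring characters
# without consuming them, so adjacent matches can share a letter.
fix_question_marks <- function(x) {
  x <- sub("^\\?", "", x)  # drop a single leading "?"
  # replace "?" only when it sits directly between two ASCII letters
  gsub("(?<=[A-Za-z])\\?(?=[A-Za-z])", "'", x, perl = TRUE)
}

col <- c("I?m tired", "You?re tired", "You?re tired?",
         "Are you tired?", "?I am tired")
fixed <- fix_question_marks(col)
fixed

# Comparison on a string where two "?" share a neighbouring letter
lookaround_result <- fix_question_marks("a?b?c")
capture_result <- gsub("(\\w)\\?(\\w)", "\\1'\\2", "a?b?c")
```

On the sample column this gives the same five strings as the answers above. For a?b?c the lookaround version replaces both question marks, while the capture-group version leaves the second one in place. Unlike \\w, the character class [A-Za-z] excludes digits and underscores, which may or may not be what you want.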
