Conditional replacement of special characters in R

I would like to take a data frame df with a column df$col whose entries look like this:

I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired

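and replace every question mark that appears between letters with an apostrophe, and remove a question mark that appears at the start of a string: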

I'm tired
You're tired
You're tired?
Are you tired?
I am tired

We can use sub:

df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?" "I am tired"    

Here we assume each string contains a single ? that needs replacing, as in the example. Otherwise, just replace the inner sub with gsub.
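To make the difference between sub and gsub concrete, here is a small sketch on two made-up strings (the vector x is hypothetical and not part of the original data) that contain more than one inner question mark:

```r
# Hypothetical input with several inner question marks (not from the question)
x <- c("I?m sure you?re tired?", "?I?m tired")

# Inner sub(): only the first "?" that is not at the end of the string is replaced
first_only <- sub("^'", "", sub("[?](?!$)", "'", x, perl = TRUE))

# Inner gsub(): every "?" that is not at the end of the string is replaced
all_inner <- sub("^'", "", gsub("[?](?!$)", "'", x, perl = TRUE))

print(first_only)
print(all_inner)
```

Note that the pattern [?](?!$) only excludes a question mark at the very end of the string, so with gsub a question mark followed by a space in the middle of a string would also become an apostrophe.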

Data

df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?", 
"Are you tired?", "?I am tired")), .Names = "col", 
 class = "data.frame", row.names = c(NA, -5L))

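I would use sub for the question mark at the start and gsub for the others, because a string may contain several question marks between letters but only one at the start.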

gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?"
[5] "I am tired"   

For an explanation, see https://regex101.com/r/jClVPg/1

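Some explanation:

  • First capturing group (\\w):

    \\w matches any word character (equivalent to [a-zA-Z0-9_])

  • \\? matches the character ? literally (case sensitive)

  • Second capturing group (\\w):

    \\w matches any word character (equivalent to [a-zA-Z0-9_])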
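Because both capturing groups consume the characters around the question mark, two question marks separated by a single character (for example "a?b?c") are not both replaced in one gsub pass. As a possible variant, not taken from the answers above, lookarounds with perl = TRUE check the neighbouring characters without consuming them. The function name fix_question_marks is made up for this sketch:

```r
# Hypothetical helper: strip one leading "?", then turn "?" between
# word characters into an apostrophe using lookbehind/lookahead.
fix_question_marks <- function(x) {
  # Remove a single question mark at the start of the string
  x <- sub("^\\?", "", x)
  # (?<=\\w) and (?=\\w) only assert that a word character is adjacent,
  # so neighbouring matches cannot block each other
  gsub("(?<=\\w)\\?(?=\\w)", "'", x, perl = TRUE)
}

examples <- c("I?m tired", "You?re tired?", "?I am tired", "a?b?c")

with_lookaround <- fix_question_marks(examples)
with_groups <- gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", examples))

print(with_lookaround)
print(with_groups)
```

On the sample data from the question both versions should give the same result; they differ only on inputs like "a?b?c", where the capture-group version leaves the second question mark in place.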