Conditional replacement of special characters in R
I would like to take a data frame df with a column df$col whose entries look like this:
I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired
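and replace each question mark that appears between letters with an apostrophe, while removing a question mark that appears at the start of a string: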
I'm tired
You're tired
You're tired?
Are you tired?
I am tired
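We can use two nested calls to sub: the inner one turns the first ? that is not at the end of the string into an apostrophe, and the outer one removes an apostrophe left at the start.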
df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired" "You're tired" "You're tired?" "Are you tired?" "I am tired"
Here we assume that each string has at most one ? to replace, as in the example. Otherwise, just change the inner sub to gsub.
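As a minimal sketch of the gsub variant mentioned above, the following uses a hypothetical input vector x (not part of the original data) in which some strings contain more than one question mark between letters:

```r
# Hypothetical input: several in-word question marks per string
x <- c("I?m sure you?re tired?", "?Don?t say you?re tired")

# Inner gsub replaces every "?" that is not the last character of the string;
# outer sub then drops an apostrophe left at the very start
res <- sub("^'", "", gsub("[?](?!$)", "'", x, perl = TRUE))
res
```

Note that the pattern [?](?!$) only protects a question mark at the very end of the string. A question mark that ends a sentence in the middle of a string would also become an apostrophe, so this variant is probably only suitable when each entry holds a single sentence.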
Data
df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?",
"Are you tired?", "?I am tired")), .Names = "col",
class = "data.frame", row.names = c(NA, -5L))
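I would use sub for the leading question mark and gsub for the others, because a string may contain several question marks between letters but only one at the start.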
gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired" "You're tired" "You're tired?" "Are you tired?"
[5] "I am tired"
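For an explanation, see https://regex101.com/r/jClVPg/1.
Some explanation:
1st capturing group (\\w):
\\w matches any word character (equivalent to [a-zA-Z0-9_])
\\? matches the character ? literally (case sensitive)
2nd capturing group (\\w):
\\w matches any word character (equivalent to [a-zA-Z0-9_])
In the replacement string, \\1 and \\2 put the two captured characters back on either side of the apostrophe.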
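One consequence of the capturing groups described above is that the word character after a ? is consumed by the match, so it cannot also serve as the word character before the next ?. The sketch below illustrates this with a hypothetical input vector x and compares it with a lookaround pattern. The lookaround version is an alternative that is not taken from the answers here, and it needs perl = TRUE.

```r
# Hypothetical input: "a?b?c" has two question marks separated by one letter
x <- c("a?b?c", "You?re tired?", "?I am tired")

# Capturing-group version: "b" is consumed by the first match,
# so the second "?" in "a?b?c" is left alone
consuming <- gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", x))

# Lookaround version: the neighbouring characters are only checked, not consumed,
# so every "?" between two word characters is replaced
lookaround <- gsub("(?<=\\w)\\?(?=\\w)", "'", sub("^\\?", "", x), perl = TRUE)

consuming   # "a'b?c"  "You're tired?"  "I am tired"
lookaround  # "a'b'c"  "You're tired?"  "I am tired"
```

For the data in the question both versions give the same result. The difference only shows up when two question marks are separated by a single word character.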