Matching strings with regex exact match - special characters

Following up on the resolved thread here: matching strings regex exact match (thanks to @Onyambu for the updated code).

I need to match strings exactly, even when they contain special characters.

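Note: apologies, this is my third question on this problem. I'm nearly there, but I don't know how to handle special characters, and I'm still building my string-manipulation skills in R.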

Updated for clarity:

I have a table of match words/strings, as follows:

codes <- structure(
  list(
    column1 = structure(
      c(2L, 3L, NA),
      .Label = c("",
                 "4+", "4 +"),
      class = "factor"
    ),
    column2 = structure(
      c(1L,
        3L, 2L),
      .Label = c("old", "the money", "work"),
      class = "factor"
    ),
    column3 = structure(
      c(3L, 2L, NA),
      .Label = c("", "wonderyears",
                 "woke"),
      class = "factor"
    )
  ),
  row.names = c(NA,-3L),
  class = "data.frame"
)

I also have a dataset with a column of strings. I want to check whether each record in that column contains any of the codes:

strings<- structure(
  list(
    SurveyID = structure(
      1:4,
      .Label = c("ID_1", "ID_2",
                 "ID_3", "ID_4"),
      class = "factor"
    ),
    Open_comments = structure(
      c(2L,
        4L, 3L, 1L),
      .Label = c(
        "I need to pick up some apples",
        "The system works",
        "Flag only if there is a 4 with a plus",
        "Show me the money"
      ),
      class = "factor"
    )
  ),
  class = "data.frame",
  row.names = c(NA,-4L)
)

I am currently matching the codes against the strings with the following code:

# For each code column, flag (1/0) the comments that contain any of its codes as a whole word
strings[names(codes)] <- lapply(codes, function(x)
  +(grepl(paste0("\\b", na.omit(x), "\\b", collapse = "|"), strings$Open_comments)))

Output:

  SurveyID                         Open_comments column1 column2 column3
1     ID_1                      The system works       0       0       0
2     ID_2                     Show me the money       0       1       0
3     ID_3 Flag only if there is a 4 with a plus       1       0       0
4     ID_4         I need to pick up some apples       0       0       0

Problem: row 3 (ID_3) should be flagged only if the string contains "4+" or "4 +", but it gets flagged regardless. Is there a way to capture this exactly?

We can escape the + so that it is matched literally:

# Replace each "+" in the codes with "\+" before building the alternation
+(grepl(paste0( "(", gsub("\\+", "\\\\+", na.omit(codes$column1)), ")",
     collapse="|"), strings$Open_comments))
#[1] 0 0 0 0

If we use a string that contains 4+, it is picked up:

+(grepl(paste0( "(", gsub("\\+", "\\\\+", na.omit(codes$column1)), ")",
     collapse="|"), "Flag only if there is a 4+ with a plus"))
#[1] 1
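
Why the unescaped pattern flagged row 3 can be seen in a minimal sketch. The variable names are illustrative only and the inputs are shortened versions of the comment text. Unescaped, `+` is a quantifier, so `4+` means "one or more 4s" and a lone 4 satisfies it.

```r
# Unescaped: "+" is a quantifier, so "4+" means "one or more 4s"
unescaped <- grepl("\\b4+\\b", "there is a 4 with a plus")

# Escaped: "\\+" in R source is the regex \+, a literal plus sign
escaped_no_plus <- grepl("4\\+", "there is a 4 with a plus")
escaped_plus    <- grepl("4\\+", "there is a 4+ with a plus")

c(unescaped, escaped_no_plus, escaped_plus)
# [1]  TRUE FALSE  TRUE
```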

For multiple columns:

# Same escaping, applied to every code column with word boundaries around each code
sapply(codes, function(x)+(grepl(paste0( "\\b(",
      gsub("\\+", "\\\\+", na.omit(x)), ")\\b",
      collapse="|"), strings$Open_comments)))
#     column1 column2 column3
#[1,]       0       0       0
#[2,]       0       1       0
#[3,]       0       0       0
#[4,]       0       0       0
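
One caveat, offered as a sketch and not as part of the answer above: `\\b` needs a word character on one side, so a `\\b` placed right after a literal `+` does not match when the `+` is followed by a space or the end of the string. A possible workaround, assuming PCRE (`perl = TRUE`) is acceptable, is to use lookarounds in place of `\\b` and to quote each code with `\\Q...\\E` so that every special character is taken literally. The helper name `flag_codes` and the sample comments are hypothetical, and the quoting assumes no code contains the sequence `\E`.

```r
# Hypothetical helper: returns 1/0 per text, 1 if any code occurs as a whole token
flag_codes <- function(codes_col, text) {
  keys <- as.character(codes_col)
  keys <- keys[!is.na(keys) & nzchar(keys)]      # drop NA and empty codes
  if (!length(keys)) return(integer(length(text)))

  # \Q...\E quotes each code literally (PCRE only), so "+" needs no manual escaping
  alternatives <- paste0("\\Q", keys, "\\E", collapse = "|")

  # Lookarounds: no word character directly before or after the matched code
  pattern <- paste0("(?<!\\w)(?:", alternatives, ")(?!\\w)")
  +grepl(pattern, as.character(text), perl = TRUE)
}

flags <- flag_codes(
  c("4+", "4 +"),
  c("a 4 with a plus", "a 4+ with a plus", "a 14+ rating")
)
flags
# [1] 0 1 0
```

With the data frames from the question, the same helper could be applied per column as `sapply(codes, flag_codes, text = strings$Open_comments)`.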