Matching strings regex exact match - special characters
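This follows on from the solved thread here: matching strings regex exact match (thanks to @Onyambu for the updated code).
I need to match strings exactly, even when they contain special characters.
Note: apologies, this is my third question on this problem. I am nearly there, but I do not know how to handle special characters, and I am still improving my string-manipulation skills in R.
Updated for clarity:
I have a table of match words/strings, like this: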
codes <- structure(
list(
column1 = structure(
c(2L, 3L, NA),
.Label = c("",
"4+", "4 +"),
class = "factor"
),
column2 = structure(
c(1L,
3L, 2L),
.Label = c("old", "the money", "work"),
class = "factor"
),
column3 = structure(
c(3L, 2L, NA),
.Label = c("", "wonderyears",
"woke"),
class = "factor"
)
),
row.names = c(NA,-3L),
class = "data.frame"
)
I also have a dataset with a column of strings.
I want to check whether each record in that column contains any of the codes:
strings<- structure(
list(
SurveyID = structure(
1:4,
.Label = c("ID_1", "ID_2",
"ID_3", "ID_4"),
class = "factor"
),
Open_comments = structure(
c(2L,
4L, 3L, 1L),
.Label = c(
"I need to pick up some apples",
"The system works",
"Flag only if there is a 4 with a plus",
"Show me the money"
),
class = "factor"
)
),
class = "data.frame",
row.names = c(NA,-4L)
)
I am currently using the following code to match the codes against the strings:
strings[names(codes)] <- lapply(codes, function(x)
+(grepl(paste0("\\b", na.omit(x), "\\b", collapse = "|"), strings$Open_comments)))
Output:
  SurveyID                         Open_comments column1 column2 column3
1     ID_1                      The system works       0       0       0
2     ID_2                     Show me the money       0       1       0
3     ID_3 Flag only if there is a 4 with a plus       1       0       0
4     ID_4         I need to pick up some apples       0       0       0
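The problem is row 3 (ID_3).
I only want to flag it if the string contains "4+" or "4 +", but it gets flagged regardless.
Is there any way to capture this exactly?
We can escape the + so that it is matched literally:
+(grepl(paste0( "(", gsub("\\+", "\\\\+", na.omit(codes$column1)), ")",
collapse="|"), strings$Open_comments))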
#[1] 0 0 0 0
If we use a string that contains 4+, it is picked up:
+(grepl(paste0( "(", gsub("\\+", "\\\\+", na.omit(codes$column1)), ")",
collapse="|"), "Flag only if there is a 4+ with a plus"))
#[1] 1
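To see why the unescaped pattern flags row 3: in a regular expression, + is a quantifier, so the pattern 4+ means "one or more 4s" and matches a bare 4. Here is a minimal sketch of the difference; the variable names (comment, unescaped, escaped, literal) are made up for illustration:

```r
comment <- "Flag only if there is a 4 with a plus"

# Unescaped: "+" is a quantifier, so "4+" matches the bare "4"
unescaped <- grepl("4+", comment)

# Escaped: "\\+" is a literal plus sign, so nothing matches
escaped <- grepl("4\\+", comment)

# fixed = TRUE treats the whole pattern as a literal string
literal <- grepl("4+", comment, fixed = TRUE)

print(c(unescaped = unescaped, escaped = escaped, literal = literal))
```

fixed = TRUE is the simplest route when only one code is checked at a time, but it cannot be combined with | alternation or \\b, which is why the code above escapes the + instead.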
For multiple columns:
sapply(codes, function(x)+(grepl(paste0( "\\b(",
gsub("\\+", "\\\\+", na.omit(x)), ")\\b",
collapse="|"), strings$Open_comments)))
# column1 column2 column3
#[1,] 0 0 0
#[2,] 0 1 0
#[3,] 0 0 0
#[4,] 0 0 0
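One caveat with wrapping the codes in \\b: a word boundary needs a word character on one side, so \\b directly after a literal + will not match when the + is followed by a space or the end of the string. A possible alternative, sketched here under assumptions rather than taken from the thread, is to escape every regex metacharacter and use PCRE lookarounds (perl = TRUE) in place of \\b. The helpers escape_regex and match_codes and the demo vectors are hypothetical names introduced for this example:

```r
# Hypothetical helper: backslash-escape all regex metacharacters, not just "+"
escape_regex <- function(x) {
  gsub("([.\\\\+*?^$()\\[\\]{}|])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: 1 if any code occurs as a standalone token, else 0
match_codes <- function(code_vec, text) {
  code_vec <- as.character(code_vec)
  # Drop missing and empty codes
  code_vec <- code_vec[!is.na(code_vec) & nzchar(code_vec)]
  if (length(code_vec) == 0) {
    return(integer(length(text)))
  }
  # (?<!\w) and (?!\w): no word character immediately before or after the code
  pattern <- paste0("(?<!\\w)(?:",
                    paste(escape_regex(code_vec), collapse = "|"),
                    ")(?!\\w)")
  as.integer(grepl(pattern, as.character(text), perl = TRUE))
}

# Made-up demo data
codes_demo <- c("4+", "4 +", NA)
comments_demo <- c("Flag only if there is a 4 with a plus",
                   "Flag only if there is a 4+ with a plus",
                   "Rated 4 + overall",
                   "Call 14+ people")

result <- match_codes(codes_demo, comments_demo)
print(result)
#[1] 0 1 1 0
```

With the data from the question, this could be applied per column with something like sapply(codes, match_codes, text = strings$Open_comments).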