Unable to detect string when term is located at multiple locations

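I have a dataset in which I want to detect a term that is not preceded by certain other terms. The problem is that the term can occur several times in the same field. I have tried the two approaches below, but each one either over-detects or under-detects.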

library(stringr)
x <- "In the first round the car had no flat tyre. In the secound round there was a flat"
str_detect(x,"((?i)(?<!no )flat) & ((?i)(?<!not )flat)")

This returns FALSE, but we want it to be TRUE. Changing & to | detects the term even when it should not:

x <- "In the first round the car had no flat tyre."
str_detect(x,"((?i)(?<!no )flat) | ((?i)(?<!not )flat)")

This returns TRUE, whereas we want FALSE.

How can I make sure the term is detected correctly?

Edit 1: I want to detect the term flat when it is not preceded by any of the following terms: no, not, otherterm.

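str_detect, just like grepl, only tests whether a string contains a pattern and returns TRUE or FALSE. So if you have a string such as "no flat tyre a flat tyre" and want to know whether it contains an instance of "flat" that is not preceded by "no" or "not", the bare TRUE/FALSE from str_detect or grepl does not show you what was actually matched. If the goal is to check whether there is an instance of "flat" that is not preceded by, for example, "no", it is more informative to use str_extract, like this: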

str_extract(x,"(?i)(?<!not? )flat")
[1] "flat"

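This asserts that the substring "flat" is extracted only if it is not immediately preceded by "no" or "not" followed by a space.

The call does return "flat", which shows that the string contains the pattern you are actually looking for.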
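As a minimal sketch of how the extraction can be turned back into a yes/no answer, the snippet below runs the same pattern over both strings from the question. The variable names (pattern, first_hit, has_hit, hit_positions) are made up for illustration. It relies on str_extract returning NA when nothing matches, and on str_locate_all returning one matrix of start/end positions per input string.

```r
library(stringr)

x <- c(
  "In the first round the car had no flat tyre. In the secound round there was a flat",
  "In the first round the car had no flat tyre."
)
pattern <- "(?i)(?<!not? )flat"

# First "flat" not preceded by "no " or "not " in each string, NA if none
first_hit <- str_extract(x, pattern)

# NA means "no match", so this recovers one TRUE/FALSE per string
has_hit <- !is.na(first_hit)

# Start and end positions of every qualifying "flat" in each string
hit_positions <- str_locate_all(x, pattern)
```

Here has_hit is TRUE for the first string and FALSE for the second, and hit_positions shows where the qualifying match sits, which helps when checking why a particular row was flagged.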

Data:

x <- "In the first round the car had no flat tyre. In the secound round there was a flat"

You can combine two lookbehinds to test that flat is preceded by neither no nor not.

x <- "In the first round the car had no flat tyre. In the secound round there was a flat"
grepl("(?i)(?<!no )(?<!not )flat", x, perl=TRUE)
#[1] TRUE

x <- "In the first round the car had no flat tyre."
grepl("(?i)(?<!no )(?<!not )flat", x, perl=TRUE)
#[1] FALSE

Or use stringr::str_detect:

stringr::str_detect(x, "(?i)(?<!no )(?<!not )flat")
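The same idea extends to the longer list from the edit (no, not, otherterm) by stacking one lookbehind per blocked term. The following is only a sketch: detect_unpreceded is a hypothetical helper, not part of base R or stringr, and it assumes the term and the blocked words contain no regex metacharacters and are always separated from the term by a single space.

```r
# Hypothetical helper: TRUE when `term` occurs at least once without any of
# the `blockers` (each followed by one space) directly in front of it.
# Assumes `term` and `blockers` contain no regex metacharacters.
detect_unpreceded <- function(x, term, blockers) {
  # One fixed-length negative lookbehind per blocked word, placed back to back
  lookbehinds <- paste0("(?<!", blockers, " )", collapse = "")
  pattern <- paste0("(?i)", lookbehinds, term)
  grepl(pattern, x, perl = TRUE)
}

x <- c(
  "In the first round the car had no flat tyre. In the secound round there was a flat",
  "In the first round the car had no flat tyre.",
  "The tyre was otherterm flat"
)
result <- detect_unpreceded(x, "flat", c("no", "not", "otherterm"))
result
#[1]  TRUE FALSE FALSE
```

Each lookbehind has a fixed length, which is what perl = TRUE requires; stacking them means all of them must hold at the same position. One caveat of this sketch is that "no " also matches the end of a word such as "piano ", so a word boundary (\\b) inside each lookbehind may be worth adding if that matters for your data.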