Using negative lookbehind in R

Question

I have these files in a folder. Assume there are other, similar files in the folder as well.

 [3] "/farm/chickens_industrial_meat_location_df.csv"   
 [4] "farm/goats_grassland_meat_location_df.csv"

I am trying to extract the files that contain the string location_df, while excluding the files that contain both the strings chickens and location_df.

I thought I could do this by entering the following: list.files(pattern = "location_df(?<!(chickens))")

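My understanding is that using a negative lookaround would drop the strings that contain chickens. What am I not understanding about regular expressions here? What is the solution to my problem?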

Answer 1

One option using grepl is

str1[!grepl('chickens_.*location_df', str1) & grepl('location_df', str1)]
#[1] "farm/goats_grassland_meat_location_df.csv"

Or, in a simpler form:

str1[!grepl('chickens_', str1) & grepl('location_df', str1)]

Data

str1 <- c("/farm/chickens_industrial_meat_location_df.csv",
        "farm/goats_grassland_meat_location_df.csv" )
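A minimal sketch of why the question's lookbehind cannot work, assuming the pattern is run through PCRE with `perl = TRUE` (an option `list.files()` does not have): `(?<!(chickens))` placed after `location_df` only checks the eight characters immediately before that position, which are always `ation_df`, so it never excludes anything. A negative lookahead anchored at `^` tests the whole name instead. The `bad`/`good` names are illustrative only.

```r
str1 <- c("/farm/chickens_industrial_meat_location_df.csv",
          "farm/goats_grassland_meat_location_df.csv")

# The question's pattern, run with PCRE so the lookbehind is accepted.
# It is checked right after "location_df", where the preceding 8 characters
# are "ation_df" (never "chickens"), so both names match.
bad <- grepl("location_df(?<!(chickens))", str1, perl = TRUE)
print(bad)

# A negative lookahead anchored at the start rejects any string containing
# "chickens" anywhere, then requires "location_df" somewhere after.
good <- grepl("^(?!.*chickens).*location_df", str1, perl = TRUE)
print(str1[good])
```

The same single pattern could be applied to the character vector returned by list.files(), since grepl() (unlike list.files()) accepts perl = TRUE.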

Answer 2

> list.files(pattern = "location_df")
[1] "chickens_industrial_meat_location_df.csv" "goats_grassland_meat_location_df.csv"    

> setdiff(list.files(pattern = "location_df"), list.files(pattern = "chickens"))
[1] "goats_grassland_meat_location_df.csv"

> setdiff(list.files(pattern = "location_df"), list.files(pattern = "goats"))
[1] "chickens_industrial_meat_location_df.csv"
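To reproduce the setdiff() output above without touching real data, one option is to create empty placeholder files in a temporary directory. This is a sketch under that assumption; the `farm` subdirectory and the extra file `goats_grassland_meat_summary.csv` are hypothetical.

```r
# Hypothetical scratch directory with empty placeholder files.
d <- file.path(tempdir(), "farm")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, c("chickens_industrial_meat_location_df.csv",
                                     "goats_grassland_meat_location_df.csv",
                                     "goats_grassland_meat_summary.csv"))))

# list.files() returns base names matching the (extended) regex.
loc <- list.files(d, pattern = "location_df")
chk <- list.files(d, pattern = "chickens")

# Keep the location_df files that are not also chickens files.
res <- setdiff(loc, chk)
print(res)
```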

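According to the R help file for regular expressions (?regex), "...functions which use regular expressions (often via the use of grep) include apropos, browseEnv, help.search, list.files and ls. These will all use extended regular expressions" (ERE).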

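Reading the above suggests that list.files() and list.dirs() do not implement the lookarounds normally available in Perl-compatible regular expressions (PCRE). One small clue is that the R help file for list.files() / list.dirs() does not include a perl = TRUE option.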

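Therefore, instead of a lookaround, the code above uses setdiff() to help you query the directory. Of course, with the code above the two regex 'tokens' you are searching for can appear in either order, but you can help yourself by searching for "location_df.csv" or "location_df.csv$" (since the ".csv" extension appears at the end of the file name, and the $ zero-width assertion likewise anchors the pattern to the end of the string). You could also try anchoring "chickens" or "goats" to the start of the string with ^. Putting it all together gives the code below: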

> setdiff(list.files(pattern = "location_df.csv$"), list.files(pattern = "^chickens"))
[1] "goats_grassland_meat_location_df.csv"

> setdiff(list.files(pattern = "location_df.csv$"), list.files(pattern = "^goats"))
[1] "chickens_industrial_meat_location_df.csv"
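One detail in the patterns above: in a regular expression `.` matches any single character, so "location_df.csv$" would also accept a name like `location_dfXcsv`. Writing `\\.` in the R string gives the regex `\.`, which matches only a literal period. A small sketch with made-up file names:

```r
# Made-up names to show the difference between "." and "\\."
x <- c("goats_location_df.csv", "goats_location_dfXcsv", "goats_location_df.csv.bak")

unescaped <- grepl("location_df.csv$", x)    # "." matches any character
escaped   <- grepl("location_df\\.csv$", x)  # "\\." matches a literal period
print(unescaped)
print(escaped)
```

Because ERE also treats `\.` as a literal dot, the escaped form can be passed directly as list.files(pattern = "location_df\\.csv$").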

https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html
https://www.r-project.org/
