R regex to list files that do not begin with e.g. "AA" or "BB"

Question

Here is the reprex, which needs to be run in the working directory:

library(tidyverse)
library(openxlsx)
library(readxl)
write.xlsx(list(iris), "AA-excel-file.xlsx")
write.xlsx(list(iris), "BB-excel-file.xlsx")
write.xlsx(list(iris), "CC-excel-file.xlsx")
write.xlsx(list(iris), "DD-excel-file.xlsx")
write.xlsx(list(iris), "EE-excel-file.xlsx")

My working directory looks like this:

C:
├── my-R-working-directory/
    ├── AA-excel-file.xlsx
    ├── BB-excel-file.xlsx
    ├── CC-excel-file.xlsx
    ├── DD-excel-file.xlsx
    └── EE-excel-file.xlsx

I have designed a regex (demo here) that "selects" any file that does not begin with AA or BB:

^(?!AA|BB)\w+$

I would like to use this regex with base R list.files() to list all files that do not begin with AA or BB. Here is my attempt:

list.files("path/of/folder", pattern = "\^(?!AA|BB)\w+$.xlsx$", full.names = TRUE)
#> Error: '\w' is an unrecognized escape in character string starting ""\^(?!AA|BB)\w"
#> Error: unexpected ')' in "           full.names = TRUE)"

I think something is wrong with my pattern argument. This similar command works fine, but it does not exclude the AA and BB files:

list.files("path/of/folder", pattern = "\\.xlsx$", full.names = TRUE)

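How do I write the pattern argument correctly so that it excludes any file that begins with AA or BB? If you are able to, could you also correct my regex? It seems to work only for "letter or digit" characters. Any whitespace, dash, dot, etc. breaks the regex (see demo).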

Answer 1

You can use pattern to get all the xlsx files, then use grep with invert = TRUE to drop the files that begin with AA or BB:

library(tidyverse)
library(openxlsx)
library(readxl)

write.xlsx(list(iris), "AA-excel-file.xlsx")
write.xlsx(list(iris), "BB-excel-file.xlsx")
write.xlsx(list(iris), "CC-excel-file.xlsx")
write.xlsx(list(iris), "DD-excel-file.xlsx")
write.xlsx(list(iris), "EE-excel-file.xlsx")

# The backslash must be doubled inside an R string literal
grep("^(AA|BB).*", list.files(pattern = "\\.xlsx$"), invert = TRUE, value = TRUE)
#> [1] "CC-excel-file.xlsx" "DD-excel-file.xlsx" "EE-excel-file.xlsx"
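
If full paths are needed (full.names = TRUE), the prefix test has to run on the file name rather than on the whole path. The sketch below is only one way to do that. It assumes a scratch folder under tempdir() (the folder name "regex-demo" is made up for illustration) and uses empty placeholder files instead of real workbooks. It also shows the lookahead from the question made to work. list.files() has no perl argument, so the lookahead is applied afterwards with grepl(perl = TRUE). \w+ is replaced by .* because \w only matches letters, digits and underscores, not dashes or dots.

```r
# Hypothetical scratch folder so the sketch does not touch a real working directory
folder <- file.path(tempdir(), "regex-demo")
dir.create(folder, showWarnings = FALSE)

# Empty placeholder files named like the ones in the question
file_names <- paste0(c("AA", "BB", "CC", "DD", "EE"), "-excel-file.xlsx")
invisible(file.create(file.path(folder, file_names)))

# List every .xlsx file first, with full paths
all_files <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)

# Option 1: invert a plain prefix match, tested on the file name only
kept_invert <- all_files[!grepl("^(AA|BB)", basename(all_files))]

# Option 2: the negative lookahead from the question, which needs perl = TRUE
kept_lookahead <- all_files[grepl("^(?!AA|BB).*\\.xlsx$", basename(all_files), perl = TRUE)]

basename(kept_invert)
#> [1] "CC-excel-file.xlsx" "DD-excel-file.xlsx" "EE-excel-file.xlsx"
identical(kept_invert, kept_lookahead)
#> [1] TRUE
```

Both options return the same paths. Option 1 is probably easier to read, while Option 2 keeps everything in a single pattern.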
