Java Regex and PathMatcher

Question

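I'm writing an application in Java that shows a list of files where the first word in the file name matches a user-defined string, and then deletes or rearranges them according to some preferences. I'm currently at the stage of finding a good way to locate the files. Using this Java Tutorial, I've come up with this: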

Path source = Paths.get(sourceText.getText());
Path dest = Paths.get(destText.getText());

System.out.println("Source:" + source.toString());
System.out.println("P/N: " + partNoText.getText());

String matchString = "glob:**" + partNoText.getText() + "*";

System.out.println("Matching: " + matchString);

fileFinder = new FileFinder(matchString);

try {
    Files.walkFileTree(source, fileFinder);
} catch (IOException e1) {
    e1.printStackTrace();
}
for (Path path : fileFinder.getResult()) {
    System.out.println("Moving: " + path.getFileName());
    Path target = Paths.get(dest.toString() + "\\" + path.getFileName());

    try {
        Files.move(path, target, REPLACE_EXISTING);
    } catch (IOException e1) {
        e1.printStackTrace();
    }
}

where FileFinder extends SimpleFileVisitor and has this visitFile method:

public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
    System.out.println(file.toString());
    System.out.println(fileMatcher.matches(file));
    if (fileMatcher.matches(file)) {
        result.add(file);
        return FileVisitResult.CONTINUE;
    }
    return FileVisitResult.CONTINUE;
}

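My problem is that the glob picks up any file whose name contains the part number anywhere at all. So if I have a file named "12345 RevA Really Big Part 2: Electric Bugaloo", the string will match if the user enters "1" or "123" or "Bugaloo". Ideally, it would match only if the user entered "12345".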

I tried switching my matchString to "regex: .*" + partNoText + "\b", which works in the regex test harness I modified from this other Java Tutorial. What am I doing wrong? Does PathMatcher work differently than a regular Matcher?

P.S. Any variables containing the word "Text", like sourceText and partNoText, are JTextFields. Hopefully that is the only part of the code that is unclear from what I snipped.

Answer 1

"Does PathMatcher work differently than a regular Matcher?"
Yes. PathMatcher uses file name globbing [1], whereas Matcher uses regular expressions.

See "What Is a Glob?" in the tutorial you linked, and compare that with the documentation for java.util.regex.Pattern.
Globbing is more limited than regex matching.
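To see why the glob from the question matches so broadly, here is a minimal sketch. The class name, method name, and the sample path are made up for illustration (the colon from the example file name is dropped because it is not valid in Windows file names). The leading ** matches anything, including directory separators, so the user's input only has to appear somewhere in the path.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class BroadGlobDemo {
    // Builds the same glob as the question: "glob:**" + input + "*".
    static boolean questionGlobMatches(String userInput, Path file) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:**" + userInput + "*");
        // The full path is matched, exactly as in the visitFile method above.
        return matcher.matches(file);
    }

    public static void main(String[] args) {
        // Hypothetical path, only used to exercise the matcher.
        Path file = Paths.get("parts", "12345 RevA Really Big Part 2 Electric Bugaloo.txt");
        System.out.println(questionGlobMatches("1", file));       // true
        System.out.println(questionGlobMatches("123", file));     // true
        System.out.println(questionGlobMatches("Bugaloo", file)); // true
        System.out.println(questionGlobMatches("999", file));     // false
    }
}
```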

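If you have a strict file naming convention that is always adhered to, you can probably use globbing (I take back the last part of my earlier comment).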

Suppose your files are named
numeric part number - space - optional revision & space - description

That is, the number of digits in the part number can vary, but the space after the part number is required and always present.

So your example "12345 RevA Really Big Part 2: Electric Bugaloo" fits with partNum==12345, revision="RevA ", description="Really Big Part 2: Electric Bugaloo"

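The user enters the part number P/N: 123 as the variable userPN, and you then construct the glob as
String glob = userPN + " *"; which results in a glob equal to "123 *"
This will not match 12345, as you desire, because the space after the 3 will not match the 4.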
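A minimal sketch of that idea follows, assuming the space convention above. The class and method names are invented for illustration, and the sample file name is hypothetical. One detail that is my assumption rather than part of the answer: the matcher is applied to file.getFileName() instead of the full path, because a glob like "123 *" has no directory part and * does not cross directory boundaries.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class SpaceGlobDemo {
    // True when the file name starts with the part number followed by a space.
    static boolean startsWithPartNumber(String userPN, Path file) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + userPN + " *");
        // Match only the last path element so parent directories do not interfere.
        return matcher.matches(file.getFileName());
    }

    public static void main(String[] args) {
        // Hypothetical file, only used to exercise the matcher.
        Path file = Paths.get("parts", "12345 RevA Really Big Part 2.txt");
        System.out.println(startsWithPartNumber("123", file));     // false
        System.out.println(startsWithPartNumber("12345", file));   // true
        System.out.println(startsWithPartNumber("Bugaloo", file)); // false
    }
}
```

In the visitFile method from the question, this would amount to calling fileMatcher.matches(file.getFileName()) rather than fileMatcher.matches(file). Note that glob metacharacters typed by the user are not escaped in this sketch.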

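If the part number in the file name is not followed by a required space, but is always followed by a letter, whether it belongs to the revision or the description, you can construct the glob as
String glob = userPN + "[A-Za-z]*"; giving glob = 123[A-Za-z]* which also will not match 12345, because a letter must follow 123 and 4 is not in that character range.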

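You can make your character range more complex, for example [A-Za-z ] to also allow a space, depending on your needs, but it all comes down to your file naming convention. You need to state that convention very precisely and stick to it.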
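As a rough sketch of the letter-based variant, assuming the part number is always followed directly by a letter (or, in the second method, a letter or a space). The names and sample file names are hypothetical, and the matcher is again applied to the file name only, which is my assumption. Keep in mind that a comma inside the brackets would be treated as a literal comma, so none is used here.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class LetterGlobDemo {
    // Part number must be followed immediately by an ASCII letter.
    static boolean partNumberThenLetter(String userPN, Path file) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + userPN + "[A-Za-z]*");
        return matcher.matches(file.getFileName());
    }

    // Variant that also accepts a space right after the part number.
    static boolean partNumberThenLetterOrSpace(String userPN, Path file) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + userPN + "[A-Za-z ]*");
        return matcher.matches(file.getFileName());
    }

    public static void main(String[] args) {
        // Hypothetical file names, only used to exercise the matchers.
        Path noSpace = Paths.get("12345RevA Widget.txt");
        Path withSpace = Paths.get("12345 RevA Widget.txt");
        System.out.println(partNumberThenLetter("123", noSpace));            // false
        System.out.println(partNumberThenLetter("12345", noSpace));          // true
        System.out.println(partNumberThenLetter("12345", withSpace));        // false
        System.out.println(partNumberThenLetterOrSpace("12345", withSpace)); // true
    }
}
```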

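[1] A PathMatcher will use regular expressions if the "regex:" syntax is specified in the call to FileSystem.getPathMatcher(String). That would look something like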

FileSystem fs = FileSystems.getDefault();
PathMatcher pm = fs.getPathMatcher("regex:\\d{5}\\s.*");
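A hedged sketch of the regex route applied to the question's part number follows. The class and method names and the sample file name are hypothetical. As far as I know, the default file system implementations apply the regex to the whole path string (like Matcher.matches(), not find()), so the sketch matches against the file name only and ends the pattern with .* to cover the rest of the name. Pattern.quote is used so that the user's input is taken literally.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class RegexPathMatcherDemo {
    // True when the file name starts with the part number as a whole word.
    static boolean startsWithWholePartNumber(String userPN, Path file) {
        // No space after "regex:"; backslashes are doubled inside the Java string.
        String syntaxAndPattern = "regex:" + Pattern.quote(userPN) + "\\b.*";
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
        return matcher.matches(file.getFileName());
    }

    public static void main(String[] args) {
        // Hypothetical file, only used to exercise the matcher.
        Path file = Paths.get("parts", "12345 RevA Really Big Part 2.txt");
        System.out.println(startsWithWholePartNumber("123", file));   // false
        System.out.println(startsWithWholePartNumber("12345", file)); // true
    }
}
```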

Answer 2

I think you are taking a complicated route. Why use a PathMatcher in the first place when you are not looking for events?

It would be much easier to walk the file tree and, for each directory, iterate over a directory stream to match your glob.
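One possible reading of that suggestion, as a sketch only: collect the directories under the source folder, then open a DirectoryStream on each with a glob. The class name, method name, and the "part number followed by a space" glob are assumptions for illustration. Files.newDirectoryStream(Path, String) filters on each entry's file name, so no "glob:" prefix or leading ** is needed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryStreamFinder {
    // Finds regular files under root whose name starts with userPN plus a space.
    static List<Path> find(Path root, String userPN) throws IOException {
        List<Path> directories;
        try (Stream<Path> walk = Files.walk(root)) {
            // Files.walk includes root itself, so top-level files are covered too.
            directories = walk.filter(Files::isDirectory).collect(Collectors.toList());
        }
        List<Path> result = new ArrayList<>();
        for (Path dir : directories) {
            // The glob is matched against the file name of each directory entry.
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, userPN + " *")) {
                for (Path entry : entries) {
                    if (Files.isRegularFile(entry)) {
                        result.add(entry);
                    }
                }
            }
        }
        return result;
    }
}
```

Calling find(source, partNoText.getText()) would then take the place of the walkFileTree call and the FileFinder visitor in the question.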
