Java: String.replaceAll() vs matcher.replaceAll() in a loop

Question

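This may be a very simple question, or a duplicate (although I did try checking beforehand), but which is cheaper when used in a loop: String.replaceAll() or matcher.replaceAll()?
While I am told that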

Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher;
String thisWord;
// scanner is assumed to be a java.util.Scanner instance
while (scanner.hasNext()) {
   matcher = regexPattern.matcher(scanner.next());
   thisWord = matcher.replaceAll("");
   ...
}

is better, because you only have to compile the regex once, I would think that the benefits of

String thisWord;
// scanner is assumed to be a java.util.Scanner instance
while (scanner.hasNext()) {
   thisWord = scanner.next().replaceAll("[^a-zA-Z0-9]","");
   ...
}

far outweigh the matcher method, due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)

Can someone explain where my reasoning goes wrong? Am I misunderstanding what Pattern.matcher() does?

Answer 1

In OpenJDK, String.replaceAll is defined as follows:

    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }


So, at least with that implementation, it won't perform any better than compiling the pattern only once and using Matcher.replaceAll.

There may be other JDK implementations where String.replaceAll is implemented differently, but I'd be very surprised if there were any where it performed better than Matcher.replaceAll.
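
To make the comparison concrete, here is a minimal sketch of the two variants side by side. The class and method names (ReplaceAllComparison, stripWithString, stripWithPattern) are made up for illustration; the only library calls are String.replaceAll, Pattern.compile, Pattern.matcher and Matcher.replaceAll. Both methods should return the same string; the difference is only in where the regex gets compiled.

```java
import java.util.regex.Pattern;

class ReplaceAllComparison {
    // Compiled once and reused; Pattern instances are immutable.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Goes through String.replaceAll, which (in the OpenJDK code quoted
    // above) compiles the regex again on every call.
    static String stripWithString(String word) {
        return word.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Reuses the precompiled pattern; only a new Matcher is created per call.
    static String stripWithPattern(String word) {
        return NON_ALNUM.matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        for (String word : new String[] {"hello,", "it's", "x-ray!"}) {
            System.out.println(stripWithString(word) + " " + stripWithPattern(word));
        }
    }
}
```

If the loop body is hot, the second form is the one that avoids the repeated Pattern.compile call.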

[…] due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)

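I think you have a misunderstanding here. You really do create a new Matcher instance on every loop iteration; but that is very cheap, and nothing to worry about performance-wise.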

Incidentally, you don't actually need a separate 'matcher' variable if you don't want one; you'll get exactly the same behavior and performance if you write:

   thisWord = regexPattern.matcher(scanner.next()).replaceAll("");
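
If you want to convince yourself that each call to Pattern.matcher() really hands back a fresh object, a small check like the following should do. This is only a sketch; the class name MatcherIdentityDemo and the helper createsNewMatcherEachCall are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherIdentityDemo {
    // Returns true when two calls to matcher() yield two distinct objects.
    static boolean createsNewMatcherEachCall(Pattern pattern) {
        Matcher first = pattern.matcher("foo!");
        Matcher second = pattern.matcher("bar?");
        return first != second;
    }

    public static void main(String[] args) {
        Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
        System.out.println(createsNewMatcherEachCall(regexPattern));
    }
}
```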

Answer 2

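There is a more efficient way: if you reset the same matcher, it is not regenerated on every pass through the loop, which would otherwise copy mostly the same information about the pattern structure each time.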

Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher = regexPattern.matcher("");
String thisWord;
// scanner is assumed to be a java.util.Scanner instance
while (scanner.hasNext()) {
   matcher = matcher.reset(scanner.next());
   thisWord = matcher.replaceAll("");
   // ...
}

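Creating the matcher outside the loop incurs a one-off cost for regexPattern.matcher(""), but the calls to matcher.reset(xxx) will be faster, because they re-use that matcher instead of re-generating a new matcher instance every time. This reduces the amount of GC required.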
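
For reference, here is a self-contained sketch of the same reset-based loop that runs without a Scanner. The class name MatcherReuseDemo and the helper stripAll are invented for the example; it relies on Matcher.reset(CharSequence) returning the matcher itself, so the call can be chained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuseDemo {
    // Strips non-alphanumeric characters from every word, reusing one Matcher.
    static List<String> stripAll(List<String> words) {
        Matcher matcher = Pattern.compile("[^a-zA-Z0-9]").matcher("");
        List<String> cleaned = new ArrayList<>();
        for (String word : words) {
            // reset(...) points the existing matcher at the new input
            // instead of allocating another Matcher.
            cleaned.add(matcher.reset(word).replaceAll(""));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(stripAll(Arrays.asList("hello,", "it's", "x-ray!")));
    }
}
```

One caveat with this approach: Matcher instances are not safe for use by multiple threads at once, so a reused matcher should stay local to one thread.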

