Differences between \W, \\W, [^a-zA-Z0-9_] in regular expression

Question

I am trying to find all characters that are not letters (upper/lowercase), digits, or underscores, and remove them.

stringA.replaceAll("[^a-zA-Z0-9_]","")   // works perfectly fine

However, the following code could not even compile in Java:

stringA.replaceAll("\W","");
// or also
stringA.replaceAll("[\W]","");

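If I use only "\\W" rather than "\W", the above code turns out to be correct.
So, what is the difference between \W and \\W, and when should I use brackets like [^a-zA-Z0-9_]?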

Answer 1

However, the following code could not even compile in Java

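Java does not know that the string is headed for the regex engine. Anything in double quotes is a string literal to the Java compiler, so it tries to interpret \W as a Java escape sequence, and no such escape sequence exists. This triggers a compile-time error.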

If I use only \\W rather than \W, the above code turns out to be correct.

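This is because \\ is a valid escape sequence meaning "a single backslash". When you put two backslashes in a string literal, the Java compiler removes one of them, so the regex engine sees \W, not \\W.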
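To make the two layers of escaping visible, here is a minimal sketch. The class and method names (EscapeDemo, patternText, stripNonWord) are made up for illustration and are not from the question.

```java
class EscapeDemo {
    // Hypothetical helper: the text the regex engine actually receives.
    // Two backslashes in source become one backslash in the string.
    public static String patternText() {
        return "\\W";
    }

    // Hypothetical helper: removes every non-word character from the input.
    public static String stripNonWord(String input) {
        return input.replaceAll(patternText(), "");
    }

    public static void main(String[] args) {
        // String s = "\W";  // would not compile: \W is not a Java escape sequence
        System.out.println(patternText().length());   // 2 (a backslash and a W)
        System.out.println(patternText());            // \W
        System.out.println(stripNonWord("a-b c_1!")); // abc_1
    }
}
```

The string holds only two characters, which is exactly the \W that the regex engine expects.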

So, what is the difference between \W, \\W, and when to use brackets like [^a-zA-Z0-9_]

The third one is a longer version of the second one; the first one does not compile.
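A quick way to check that the second and third forms behave the same is to run both on the same input. This is only a sketch: the class name EquivalenceDemo and its methods are hypothetical, and it assumes the default regex flags (without Pattern.UNICODE_CHARACTER_CLASS, \W treats only ASCII letters, digits, and underscore as word characters).

```java
class EquivalenceDemo {
    // Hypothetical helper: strips non-word characters with the shorthand class.
    public static String withShorthand(String input) {
        return input.replaceAll("\\W", "");
    }

    // Hypothetical helper: strips the same characters with the explicit class.
    public static String withExplicitClass(String input) {
        return input.replaceAll("[^a-zA-Z0-9_]", "");
    }

    public static void main(String[] args) {
        // \u00e9 and \u00f6 are non-ASCII letters; both patterns remove them
        // under the default flags.
        String sample = "h\u00e9llo w\u00f6rld_42!";
        System.out.println(withShorthand(sample));     // hllowrld_42
        System.out.println(withExplicitClass(sample)); // hllowrld_42
    }
}
```

Since the results match, the choice between the two is mostly about readability: the bracket form spells out the allowed characters and is easy to adjust, while \\W is shorter.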
