ReplaceAll method in String API

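I have a situation where I must strip certain characters from a string (non-ASCII, control, and other non-printable characters), as shown below: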

private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

stringValue.replaceAll(NON_ASCII_CHARACTERS, "").replaceAll(ASCII_CONTROL_CHARACTERS, "")
        .replaceAll(NON_PRINTABLE_CHARACTERS, "");
            

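Can we refactor the code above so that a single replaceAll call covers all of these conditions?

Please advise if there is a way to do this.

You can use the regex OR operator |: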

private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

public static String process(String stringValue) {
    return stringValue.replaceAll(NON_ASCII_CHARACTERS + "|"+ ASCII_CONTROL_CHARACTERS +"|"+ NON_PRINTABLE_CHARACTERS, "");
}

public static void main(String[] args) {
    String val = process("A9339a0zzz]3");
    System.out.println(val);
}
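As a variation on the alternation above, here is a minimal sketch that compiles the combined pattern once and reuses it, since String.replaceAll compiles its regex on every call. The class name SingleReplaceAllSketch, the constant UNWANTED, and the sample input are assumptions made up for illustration.

```java
import java.util.regex.Pattern;

public class SingleReplaceAllSketch {
    private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
    private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
    private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

    // Hypothetical constant: the three patterns joined with the regex OR operator,
    // compiled once so repeated calls do not recompile the expression.
    private static final Pattern UNWANTED = Pattern.compile(
            NON_ASCII_CHARACTERS + "|" + ASCII_CONTROL_CHARACTERS + "|" + NON_PRINTABLE_CHARACTERS);

    static String process(String stringValue) {
        return UNWANTED.matcher(stringValue).replaceAll("");
    }

    public static void main(String[] args) {
        // Illustrative input: U+00E9 is non-ASCII, U+0007 (BEL) is an ASCII control character.
        String cleaned = process("A93\u00E9\u0007z]3");
        System.out.println(cleaned); // prints "A93z]3"
    }
}
```

Whether the precompiled Pattern is worth the extra constant depends on how often process is called; for a one-off call the plain replaceAll version is just as good.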

Code points

You might consider an approach other than regex. You can work with each character's code point integer and query the Character class for its category.

String input = … ;
String output = 
    input
    .codePoints()  // Returns an `IntStream` of code point `int` values.
    .filter( codePoint -> ! Character.isISOControl( codePoint ) )  // Filter for the characters you want to keep. Those code points flunking the `Predicate` test will be omitted. 
    .filter( codePoint -> codePoint < 127 )  // Within US-ASCII range. Code point 127 is US-ASCII but is DEL, so we filter that out here. 
    .collect( StringBuilder :: new , StringBuilder :: appendCodePoint , StringBuilder :: append )  // Convert the `int` code point integers back into characters. 
    .toString() ;  // Make a `String` from the contents of the `StringBuilder`. 

The Character class has many of the classifications defined by the Unicode Consortium. You can use them to narrow the stream of code points down to those representing the characters you want.
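As one possible way to use those classifications, the sketch below filters on Character.getType instead of Character.isISOControl. The class name CodePointFilterSketch and the helper isWanted are hypothetical, and keeping carriage return, line feed, and tab is an assumption about the asker's intent, taken from the [^\r\n\t] exclusion in the question.

```java
public class CodePointFilterSketch {

    // Hypothetical helper: decides whether a code point should be kept.
    static boolean isWanted(int codePoint) {
        // Assumption: CR, LF and TAB are kept, mirroring the exclusion in the question's pattern.
        if (codePoint == '\r' || codePoint == '\n' || codePoint == '\t') {
            return true;
        }
        // Drop anything outside the US-ASCII range.
        if (codePoint > 0x7F) {
            return false;
        }
        // Drop code points whose Unicode general category is one of the "Other" categories.
        switch (Character.getType(codePoint)) {
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.SURROGATE:
            case Character.PRIVATE_USE:
            case Character.UNASSIGNED:
                return false;
            default:
                return true;
        }
    }

    static String clean(String input) {
        return input.codePoints()
                .filter(CodePointFilterSketch::isWanted)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // Illustrative input: U+00E9 and U+0007 are dropped, the tab is kept.
        System.out.println(clean("A\u00E9\u0007B\tC"));
    }
}
```

Putting the rule in a named predicate keeps the stream pipeline short and makes it easy to adjust which categories are dropped.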

According to the Pattern javadocs, it should also be possible to combine the three character class patterns into a single character class:

private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

becomes

private static final String COMBINED =
    "[[^\\x00-\\x7F][\\p{Cntrl}&&[^\r\n\t]]\\p{C}]";

or

private static final String COMBINED =
    "[" + NON_ASCII_CHARACTERS + ASCII_CONTROL_CHARACTERS
        + NON_PRINTABLE_CHARACTERS + "]";

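Note that && (intersection) has lower precedence than the implicit union operator, so all of the [] meta-characters above are necessary.

Decide for yourself which version you find clearer. It is a matter of opinion.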
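If you want to convince yourself that the single character class behaves like the three chained calls, a quick check along these lines may help. It is only a sketch: the class name CombinedClassCheck, the method names, and the sample strings are made up for illustration, and a few samples are no proof of equivalence.

```java
public class CombinedClassCheck {
    private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
    private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
    private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

    // The three classes nested inside one outer class, i.e. their union.
    private static final String COMBINED =
            "[" + NON_ASCII_CHARACTERS + ASCII_CONTROL_CHARACTERS + NON_PRINTABLE_CHARACTERS + "]";

    // The original approach: three replaceAll calls in sequence.
    static String chained(String s) {
        return s.replaceAll(NON_ASCII_CHARACTERS, "")
                .replaceAll(ASCII_CONTROL_CHARACTERS, "")
                .replaceAll(NON_PRINTABLE_CHARACTERS, "");
    }

    // The refactored approach: one replaceAll call with the combined class.
    static String combined(String s) {
        return s.replaceAll(COMBINED, "");
    }

    public static void main(String[] args) {
        // Illustrative samples: non-ASCII plus BEL, an embedded tab, and plain ASCII.
        String[] samples = { "A\u00E9\u0007B", "tab\there", "plain ASCII ]3" };
        for (String sample : samples) {
            boolean same = chained(sample).equals(combined(sample));
            System.out.println(same);
        }
    }
}
```

Note that the tab in the second sample is removed by both variants, because \p{C} also covers control characters. If tabs and line breaks are meant to survive, the \p{C} part would need the same exclusion.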