Java String replace all using regex with lookahead

Question

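I am trying to derive a normalized URI from each incoming HTTP request so that I can print it in the logs. This will let us compute statistics and other data keyed on the normalized URI.

To normalize, I am trying to run a regex string replacement on the requestURI that replaces every numeric and alphanumeric segment with x, except for the version (e.g. v1):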

String str = "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
str.replaceAll("/([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)","/x");

This results in

/x/profile/x/x/x/x/x/HELLO/test/random/x

The result I want is (with v1 left unreplaced)

/v1/profile/x/x/x/x/x/HELLO/test/random/x

I tried using a negative lookahead to skip it

String.replaceAll("/(?!v1)([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)","/x");

but it did not help. Any pointers are appreciated.

Thanks

Answer 1

Use

/(?:(?!v[1-4])[a-zA-Z]*[0-9_-]+[a-zA-Z]*|[0-9]+)

See the regex proof.

Explanation

--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      v                        'v'
--------------------------------------------------------------------------------
      [1-4]                    any character of: '1' to '4'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                             (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    [0-9_-]+                 any character of: '0' to '9', '_', '-'
                             (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                             (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of grouping
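
A minimal sketch of how this pattern could be applied in Java, next to the lookahead attempt from the question. The class name LookaheadDemo and the helper normalize are illustrative names, not part of the original post. In the question's pattern, "|" has the lowest precedence, so it splits the whole expression and the second alternative is a bare [0-9]+ without the leading "/". The lookahead blocks the first alternative at "/v1", but the bare [0-9]+ still matches the "1" on its own. Wrapping both alternatives in (?: ... ) after the "/" avoids that.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the class and method names are illustrative only.
class LookaheadDemo {
    // Pattern from this answer: both alternatives sit inside the group after "/".
    static final Pattern FIXED =
        Pattern.compile("/(?:(?!v[1-4])[a-zA-Z]*[0-9_-]+[a-zA-Z]*|[0-9]+)");

    // Lookahead attempt from the question: "|" splits the whole pattern,
    // so the second alternative is [0-9]+ with no leading "/".
    static final Pattern ATTEMPT =
        Pattern.compile("/(?!v1)([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)");

    // Replace every match of the given pattern with "/x".
    static String normalize(Pattern p, String uri) {
        return p.matcher(uri).replaceAll("/x");
    }

    public static void main(String[] args) {
        String str = "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
        System.out.println(normalize(FIXED, str));
        // prints /v1/profile/x/x/x/x/x/HELLO/test/random/x
        System.out.println(normalize(ATTEMPT, str));
        // prints /v/x/profile/x/x/x/x/x/HELLO/test/random/x
    }
}
```

Note that the lookahead only protects v1 through v4; a segment such as v5 would still be replaced unless the character class is widened.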

Answer 2

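Here is how I would approach it, with explanations added.

Create a list of the path elements by splitting on /, starting from the second character so the leading slash is skipped.
Initialize a string builder with the first element.
Then simply iterate over the sublist starting from the second element. Use String.matches to decide whether to replace the element with x.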

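// Requires java.util.Arrays and java.util.List.
List<String> pathElements = Arrays.asList(str.substring(1).split("/"));
// Keep the first element (the version) as is.
StringBuilder sb = new StringBuilder("/" + pathElements.get(0));
for (String pe : pathElements.subList(1, pathElements.size())) {
    // Replace any element that contains a digit, underscore, or hyphen.
    sb.append("/").append(pe.matches(".*[\\d_-].*") ? "x" : pe);
}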

System.out.println(sb);

Prints

/v1/profile/x/x/x/x/x/HELLO/test/random/x

Answer 3

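Rather than one big regex, which will be hard for people (possibly including yourself) to understand and maintain in the future, I would use a few lines that make your logic more obvious: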

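// Requires java.util.Arrays and java.util.List.
List<String> parts = Arrays.asList(path.split("/"));
// Leave version segments (v1, v2, ...) alone; replace any segment containing a digit, hyphen, or underscore.
parts.replaceAll(
    p -> !p.matches("v\\d+") && p.matches(".*[-_\\d].*") ? "x" : p);
path = String.join("/", parts);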
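
Since this would run on every incoming request, one possible refinement is to compile the two patterns once, because String.matches compiles its regex on each call. The following is a sketch under that assumption; the class name UriNormalizer and the constant names are hypothetical, and find() on a single character class stands in for the .*[...].* form used above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative, not from the original answer.
class UriNormalizer {
    // A whole segment that is a version marker such as v1 or v12.
    private static final Pattern VERSION = Pattern.compile("v\\d+");
    // Any digit, hyphen, or underscore anywhere in a segment.
    private static final Pattern HAS_ID_CHAR = Pattern.compile("[-_\\d]");

    static String normalize(String path) {
        // Arrays.asList returns a fixed-size list, which still supports replaceAll.
        List<String> parts = Arrays.asList(path.split("/"));
        parts.replaceAll(p ->
            !VERSION.matcher(p).matches() && HAS_ID_CHAR.matcher(p).find() ? "x" : p);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        String path = "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
        System.out.println(normalize(path));
        // prints /v1/profile/x/x/x/x/x/HELLO/test/random/x
    }
}
```

Unlike a lookahead limited to specific versions, the VERSION check here leaves any v followed by digits untouched, wherever it appears in the path.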

java

regex

regex-negation