Regex: Remove postfix string in any word after occurrence of any of a list of strings in a paragraph

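I have a bigger string and a list of strings. I want to change the bigger string so that, for every occurrence of one of the listed strings in it, the suffix that follows is removed, up to the next space.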

Bigger String

WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD')  AS TIME_ID_CATEGORYe93bc60a0041,tab_0_0.request_id AS PAGE_IMPRESSIONf6beefc4b44e4b  FROM full_contents_2

List

TIME_ID_CATEGORY
PAGE_IMPRESSION
...

I need to remove suffixes such as e93bc60a0041 and f6beefc4b44e4b that follow TIME_ID_CATEGORY and PAGE_IMPRESSION.

I expect the following result. I need a regex-based, efficient solution in Java to achieve this.

WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD')  AS TIME_ID_CATEGORY,tab_0_0.request_id AS PAGE_IMPRESSION  FROM full_contents_2

I'm guessing a simple expression might do:

[a-f0-9]{14}

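If every suffix were a 14-character substring like this, replacing the matches with an empty string would likely work here. Note, though, that in the sample only f6beefc4b44e4b is 14 characters long; e93bc60a0041 has 12, so a fixed {14} would leave it in place.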


If you wish to explore, simplify, or modify the expression, regex101.com explains each part of it in its top-right panel, and you can also see there how it matches against some sample inputs.
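
As a rough sketch of the hex-suffix idea in Java: because the sample suffixes differ in length, this version uses `[a-f0-9]+` instead of a fixed `{14}`, and a lookbehind so that only hex runs directly after one of the known names are removed. The class name `HexSuffixStripper` and the method `strip` are made up for illustration, and the whole approach assumes the suffixes consist only of lowercase hex digits.

```java
import java.util.regex.Pattern;

class HexSuffixStripper {
    // Assumption: a suffix is a run of lowercase hex digits directly after a known name.
    // The lookbehind checks for the name without making it part of the match.
    private static final Pattern HEX_SUFFIX =
            Pattern.compile("(?<=TIME_ID_CATEGORY|PAGE_IMPRESSION)[a-f0-9]+");

    static String strip(String sql) {
        // Only the hex run is matched, so replacing with "" keeps the name itself.
        return HEX_SUFFIX.matcher(sql).replaceAll("");
    }

    public static void main(String[] args) {
        String sql = "WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD')  "
                + "AS TIME_ID_CATEGORYe93bc60a0041,tab_0_0.request_id "
                + "AS PAGE_IMPRESSIONf6beefc4b44e4b  FROM full_contents_2";
        System.out.println(strip(sql));
    }
}
```

If a suffix can contain characters outside `[a-f0-9]`, this sketch would strip only its leading hex part, so the character class would need widening.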


How about something like this? Essentially, it matches TIME_ID_CATEGORY or PAGE_IMPRESSION as group 1, and whatever follows (i.e. the suffix) as group 2.

(TIME_ID_CATEGORY|PAGE_IMPRESSION)(\w+)

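Then simply replace the contents of group 2 with an empty string. Put differently, replace each match with just group 1, which also drops the suffix (see the code snippet below).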

Sample code snippet:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveSuffix {
    public static void main(String[] args) {
        String line = "WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD')  AS TIME_ID_CATEGORYe93bc60a0041,tab_0_0.request_id AS PAGE_IMPRESSIONf6beefc154b44e4b  FROM full_contents_2";
        // Group 1 is the known name, group 2 the suffix; the backslash must be doubled in a Java string.
        Pattern p = Pattern.compile("(TIME_ID_CATEGORY|PAGE_IMPRESSION)(\\w+)");
        Matcher m = p.matcher(line);
        // Replacing with "" would delete the name too; "$1" keeps group 1 and drops the suffix.
        String output = m.replaceAll("$1");
        System.out.println(output);
        //WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD')  AS TIME_ID_CATEGORY,tab_0_0.request_id AS PAGE_IMPRESSION  FROM full_contents_2
    }
}
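
Since the question mentions a list of strings rather than two fixed names, the alternation could also be built at runtime. The following is only a sketch of that idea: `SuffixRemover`, `buildPattern`, and `removeSuffixes` are hypothetical names, and it assumes that no listed name is a prefix of another and that a suffix consists of word characters (`\w`).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SuffixRemover {
    // Builds "(name1|name2|...)\w+" from the given names.
    static Pattern buildPattern(List<String> names) {
        String alternation = names.stream()
                // Quote each name so any regex metacharacters in it are matched literally.
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(" + alternation + ")\\w+");
    }

    static String removeSuffixes(String text, List<String> names) {
        // An empty list would yield "()\w+", which matches every word, so bail out early.
        if (names.isEmpty()) {
            return text;
        }
        // "$1" keeps the matched name and drops the word characters after it.
        return buildPattern(names).matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("TIME_ID_CATEGORY", "PAGE_IMPRESSION");
        String sql = "WITH dataTab0 AS (SELECT TO_CHAR(to_date(tab_0_0.times),'YYYYMMDD')  "
                + "AS TIME_ID_CATEGORYe93bc60a0041,tab_0_0.request_id "
                + "AS PAGE_IMPRESSIONf6beefc4b44e4b  FROM full_contents_2";
        System.out.println(removeSuffixes(sql, names));
    }
}
```

If the same list is applied to many strings, compiling the pattern once with `buildPattern` and reusing it avoids recompiling on every call.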