Remove wikitext hyperlinks via regex

There are two different types of wikitext hyperlinks:

[[stack]]
[[heap (memory region)|heap]]

I want to remove the hyperlinks but keep the text:

stack
heap

Currently, I run two passes, using two different regular expressions:

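import java.util.regex.Pattern;

public class LinkRemover
{
    // [[target|text]]: capture the text after the pipe
    private static final Pattern
    renamingLinks = Pattern.compile("\\[\\[[^\\]]+?\\|(.+?)\\]\\]");

    // [[text]]: capture the text between the brackets
    private static final Pattern
    simpleLinks = Pattern.compile("\\[\\[(.+?)\\]\\]");

    public static String removeLinks(String input)
    {
        // Replace each link with its captured text ($1)
        String temp = renamingLinks.matcher(input).replaceAll("$1");
        return simpleLinks.matcher(temp).replaceAll("$1");
    }
}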

Is there a way to "fuse" the two regular expressions into one that achieves the same result?

If you want to check the correctness of your proposed solution, here is a simple test class:

// Imports assume JUnit 4
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class LinkRemoverTest
{
    @Test
    public void test()
    {
        String input = "A sheep's [[wool]] is the most widely used animal fiber, and is usually harvested by [[Sheep shearing|shearing]].";
        String expected = "A sheep's wool is the most widely used animal fiber, and is usually harvested by shearing.";
        String output = LinkRemover.removeLinks(input);
        assertEquals(expected, output);
    }
}

You can make the part up to the pipe optional:

\[\[(?:[^\]|]*\|)?([^\]]+)\]\]

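To make sure the match always stays between the square brackets, use character classes: unlike the dot, [^\]] cannot run past a closing bracket.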


Pattern details:

\[\[          # literal opening square brackets
(?:           # open a non-capturing group
    [^\]|]*   # zero or more characters that are not a ] or a |
    \|        # literal |
)?            # make the group optional
([^\]]+)      # capture everything up to the closing square bracket
\]\]          # literal closing square brackets
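
Dropped into a class shaped like the LinkRemover from the question, the single pattern might be used as in the sketch below. The class name FusedLinkRemover and the field name links are made up for illustration. Note that each backslash has to be doubled inside a Java string literal, and that the replacement is "$1" so the captured text is kept.

```java
import java.util.regex.Pattern;

// Hypothetical single-pass variant of LinkRemover; names are illustrative only.
class FusedLinkRemover
{
    // Optional "target|" prefix, then the visible text captured in group 1.
    private static final Pattern
    links = Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    public static String removeLinks(String input)
    {
        // "$1" replaces each whole link with the text captured in group 1.
        return links.matcher(input).replaceAll("$1");
    }
}
```

With this version, FusedLinkRemover.removeLinks("[[stack]] and [[heap (memory region)|heap]]") should return "stack and heap", and the sentence from the test class in the question should come out as the expected string.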