Remove wikitext hyperlinks via regex
There are two different kinds of wikitext hyperlinks:
[[stack]]
[[heap (memory region)|heap]]
I want to remove the hyperlinks but keep the text:
stack
heap
Currently, I run two passes with two different regexes:
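import java.util.regex.Pattern;

public class LinkRemover
{
    // [[target|label]]: group 1 captures the label after the pipe
    private static final Pattern
        renamingLinks = Pattern.compile("\\[\\[[^\\]]+?\\|(.+?)\\]\\]");
    // [[label]]: group 1 captures everything between the brackets
    private static final Pattern
        simpleLinks = Pattern.compile("\\[\\[(.+?)\\]\\]");

    public static String removeLinks(String input)
    {
        // Pass 1: replace renaming links with their label
        String temp = renamingLinks.matcher(input).replaceAll("$1");
        // Pass 2: replace the remaining simple links with their text
        return simpleLinks.matcher(temp).replaceAll("$1");
    }
}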
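Is there a way to "fuse" the two regexes into a single one that achieves the same result?
If you want to check your proposed solution for correctness, here is a simple test class: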
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class LinkRemoverTest
{
    @Test
    public void test()
    {
        String input = "A sheep's [[wool]] is the most widely used animal fiber, and is usually harvested by [[Sheep shearing|shearing]].";
        String expected = "A sheep's wool is the most widely used animal fiber, and is usually harvested by shearing.";
        String output = LinkRemover.removeLinks(input);
        assertEquals(expected, output);
    }
}
You can make the part up to the pipe optional:
\[\[(?:[^\]|]*\|)?([^\]]+)\]\]
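To make sure the match always stays between one pair of square brackets, use negated character classes ([^\]|] and [^\]]) instead of a dot, so no part of the match can run past a closing bracket.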
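To illustrate why the character classes matter, here is a small sketch that runs the pattern above next to a hypothetical variant that uses lazy `.*?` and `.+?` instead. The class name `CharClassDemo`, the `WITH_DOTS` variant and the `strip` helper are made up for this comparison. Because `.` also matches `]`, the dot variant can start at the first link and keep going until it finds a pipe inside the second link, swallowing the text in between.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares the answer's pattern with a "dot" variant.
public class CharClassDemo
{
    // The answer's pattern: [^\]|]* and [^\]]+ can never run past a ']'.
    static final Pattern WITH_CLASSES =
        Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    // Hypothetical variant for comparison only: '.' also matches ']',
    // so a match starting at one link can extend into the next link.
    static final Pattern WITH_DOTS =
        Pattern.compile("\\[\\[(?:.*?\\|)?(.+?)\\]\\]");

    // Replace every match with the text captured by group 1.
    static String strip(Pattern p, String s)
    {
        return p.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args)
    {
        String input = "[[a]] x [[b|c]]";
        System.out.println(strip(WITH_CLASSES, input)); // a x c
        System.out.println(strip(WITH_DOTS, input));    // c
    }
}
```

With the dot variant, the optional group first tries to match, and `.*?` keeps extending across `]] x [[b` until it reaches the `|`, so the whole input collapses into `c`.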
Pattern details:
\[\[         # literal opening square brackets
(?: # open a non-capturing group
[^\]|]* # zero or more characters that are not a ] or a |
\| # literal |
)? # make the group optional
([^\]]+) # capture all until the closing square bracket
\]\]
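Plugged into the question's class, the fused pattern replaces both passes with one. A minimal sketch, assuming the same `LinkRemover.removeLinks(String)` signature as the question so the test class above can be reused unchanged; the field name `LINK` is my own choice. In the Java string literal every regex backslash has to be doubled.

```java
import java.util.regex.Pattern;

public class LinkRemover
{
    // Matches [[target|label]] and [[label]]; group 1 is the visible text.
    // The optional non-capturing group consumes "target|" when a pipe is present.
    private static final Pattern LINK =
        Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    public static String removeLinks(String input)
    {
        // "$1" replaces each whole link with the text captured by group 1.
        return LINK.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args)
    {
        System.out.println(removeLinks("[[stack]] and [[heap (memory region)|heap]]"));
        // stack and heap
    }
}
```

Since the captured text is copied literally into the result, labels containing characters such as `$` are not a problem; only the replacement string "$1" itself is interpreted.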