How to find parentheses in a string independently of the locale?
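I need to find the first complete pair of parentheses in a Java string and, if it is non-nested, return its content.
The current problem is that parentheses may be represented by different characters in different locales/languages.
My first idea was, of course, to use a regular expression.
However, with something like "\((.*)\)", it seems rather difficult (at least for me) to make sure that the match currently under consideration contains no nested parentheses, and there seems to be no class of parenthesis-like characters available in Java's matcher.
So I tried to solve the problem more imperatively, but ran into the fact that the data I need to process is in different languages, and the parenthesis characters differ depending on the locale. Western: (), Chinese (Locale "zh"): （）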
package main;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
public class FindParentheses {
    static public Set<String> searchNames(final String string) throws IOException {
        final Set<String> foundName = new HashSet<>();
        final BufferedReader stringReader = new BufferedReader(new StringReader(string));
        for (String line = stringReader.readLine(); line != null; line = stringReader.readLine()) {
            final int indexOfFirstOpeningBrace = line.indexOf('(');
            if (indexOfFirstOpeningBrace > -1) {
                final String afterFirstOpeningParenthesis = line.substring(indexOfFirstOpeningBrace + 1);
                final int indexOfNextOpeningParenthesis = afterFirstOpeningParenthesis.indexOf('(');
                final int indexOfNextClosingParenthesis = afterFirstOpeningParenthesis.indexOf(')');
                /*
                 * If the following condition is fulfilled, there is a simple braced expression
                 * after the found product's short name. Otherwise, there may be an additional
                 * nested pair of braces, or the closing brace may be missing, in which cases the
                 * expression is rejected as a product's long name.
                 */
                if (indexOfNextClosingParenthesis > 0
                        && (indexOfNextClosingParenthesis < indexOfNextOpeningParenthesis
                                || indexOfNextOpeningParenthesis < 0)) {
                    final String content = afterFirstOpeningParenthesis.substring(0, indexOfNextClosingParenthesis);
                    foundName.add(content);
                }
            }
        }
        return foundName;
    }
    public static void main(final String args[]) throws IOException {
        for (final String foundName : searchNames(
                "Something meaningful: shortName1 (LongName 1).\n" +
                "Localization issue here: shortName2 （保险丝2）. This one should be found, too.\n" +
                "Easy again: shortName3 (LongName 3).\n" +
                "Yet more random text...")) {
            System.out.println(foundName);
        }
    }
}
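The second one, with the Chinese parentheses, is not found, but it should be.
Of course, I could match these characters as an additional special case, but since my project uses 23 languages, including Korean and Japanese, I would prefer a solution that finds any pair of parentheses.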
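I'm guessing that you might want to design an expression, maybe similar to:
[(（]\s*([^)）]*)\s*[)）]
where the parentheses you want would go into these character classes:
[(（]
Test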
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class re {
    public static void main(String[] args) {
        final String regex = "[(（]\\s*([^)）]*)\\s*[)）]";
        final String string = "Something meaningful: shortName1 (LongName 1) Localization issue here: shortName2 （保险丝2）. This one should be found, too. Easy again: shortName3 (LongName 3). Yet more random text... Something meaningful: shortName1 (LongName 1) Localization issue here: shortName2 （保险丝2）. This one should be found, too. Easy again: shortName3 (LongName 3). Yet more random text...";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
Output
Full match: (LongName 1)
Group 1: LongName 1
Full match: （保险丝2）
Group 1: 保险丝2
Full match: (LongName 3)
Group 1: LongName 3
Full match: (LongName 1)
Group 1: LongName 1
Full match: （保险丝2）
Group 1: 保险丝2
Full match: (LongName 3)
Group 1: LongName 3
Another option would be:
(?<=[(（])[^)）]*(?=[)）])
which outputs:
Full match: LongName 1
Full match: 保险丝2
Full match: LongName 3
Full match: LongName 1
Full match: 保险丝2
Full match: LongName 3
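The answer shows only the output of this lookaround variant. A minimal Java sketch that would produce it is below; the class and method names are illustrative, "Fuse 2" stands in for 保险丝2, and the fullwidth parentheses are written as \uFF08/\uFF09 escapes so the source file stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // The lookbehind and lookahead only assert the delimiters without consuming them,
    // so group(0) is already the content. The fullwidth parentheses U+FF08/U+FF09
    // are written as Java escapes to keep this file ASCII-only.
    static final Pattern CONTENT =
            Pattern.compile("(?<=[(\uFF08])[^)\uFF09]*(?=[)\uFF09])");

    static List<String> contents(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = CONTENT.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "shortName1 (LongName 1). shortName2 \uFF08Fuse 2\uFF09. shortName3 (LongName 3).";
        for (String content : contents(text)) {
            System.out.println("Full match: " + content);
        }
    }
}
```

Java only allows lookbehinds of bounded length, which a single-character class satisfies.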
Demo
The expression is explained in the top right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link you can watch how it would match against some sample inputs, if you like.
Reference
List of all unicode's open/close brackets?
You can use the \p{Ps} (Punctuation, Open) and \p{Pe} (Punctuation, Close) Unicode category classes:
String par_paired_punct = "\\p{Ps}([^\\p{Ps}\\p{Pe}]*)\\p{Pe}";
They match a bit more than just parentheses, but you can exclude the unwanted characters "manually".
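To see concretely what "a bit more than parentheses" means, here is a minimal sketch that runs par_paired_punct over a sample string (class and method names are illustrative). Square brackets are matched too, a nested pair yields only its innermost content, and the opening and closing characters are not required to correspond, so "(x]" also matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairedPunctuationDemo {
    // Any Ps (open punctuation) char, then chars that are neither Ps nor Pe,
    // then any Pe (close punctuation) char. Group 1 holds the content.
    static final Pattern PAR_PAIRED_PUNCT =
            Pattern.compile("\\p{Ps}([^\\p{Ps}\\p{Pe}]*)\\p{Pe}");

    static List<String> contents(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PAR_PAIRED_PUNCT.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // ASCII parentheses, fullwidth parentheses (U+FF08/U+FF09),
        // square brackets and a nested pair.
        String text = "a (one) b \uFF08two\uFF09 c [three] d (outer (inner) rest)";
        System.out.println(contents(text)); // [one, two, three, inner]
    }
}
```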
In the Punctuation, Open class, the following characters are not opening parentheses or brackets:
U+0F3A TIBETAN MARK GUG RTAGS GYON ༺
U+0F3C TIBETAN MARK ANG KHANG GYON ༼
U+169B OGHAM FEATHER MARK ᚛
U+201A SINGLE LOW-9 QUOTATION MARK ‚
U+201E DOUBLE LOW-9 QUOTATION MARK „
U+27C5 LEFT S-SHAPED BAG DELIMITER ⟅
U+29D8 LEFT WIGGLY FENCE ⧘
U+29DA LEFT DOUBLE WIGGLY FENCE ⧚
U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK ⹂
U+301D REVERSED DOUBLE PRIME QUOTATION MARK 〝
U+FD3F ORNATE RIGHT PARENTHESIS ﴿
In the Punctuation, Close class, the following are not paired bracket characters:
U+0F3B TIBETAN MARK GUG RTAGS GYAS ༻
U+0F3D TIBETAN MARK ANG KHANG GYAS ༽
U+169C OGHAM REVERSED FEATHER MARK ᚜
U+27C6 RIGHT S-SHAPED BAG DELIMITER ⟆
U+29D9 RIGHT WIGGLY FENCE ⧙
U+29DB RIGHT DOUBLE WIGGLY FENCE ⧛
U+301E DOUBLE PRIME QUOTATION MARK 〞
U+301F LOW DOUBLE PRIME QUOTATION MARK 〟
U+FD3E ORNATE LEFT PARENTHESIS ﴾
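Which characters fall into Ps and Pe depends on the Unicode version of the runtime. A minimal sketch for reviewing such exclusion lists against the JDK actually in use: it enumerates every code point whose Character.getType is START_PUNCTUATION or END_PUNCTUATION (the general categories Ps and Pe that \p{Ps} and \p{Pe} test) and prints its code point and Unicode name. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class PunctuationCategoryLister {
    // Collects every code point whose general category equals the given one,
    // according to the Unicode tables bundled with the running JDK.
    static List<Integer> codePointsOfType(int type) {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.getType(cp) == type) {
                result.add(cp);
            }
        }
        return result;
    }

    // Formats a code point like "U+0028 LEFT PARENTHESIS".
    static String describe(int cp) {
        return String.format("U+%04X %s", cp, Character.getName(cp));
    }

    public static void main(String[] args) {
        System.out.println("Punctuation, Open (Ps):");
        for (int cp : codePointsOfType(Character.START_PUNCTUATION)) {
            System.out.println(describe(cp));
        }
        System.out.println("Punctuation, Close (Pe):");
        for (int cp : codePointsOfType(Character.END_PUNCTUATION)) {
            System.out.println(describe(cp));
        }
    }
}
```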
The regex would look like this:
String par_rx = "[\\p{Ps}&&[^\\u0F3A\\u0F3C\\u169B\\u201A\\u201E\\u27C5\\u29D8\\u29DA\\u2E42\\u301D\\uFD3F]]" +
        "((?:[^\\p{Ps}\\p{Pe}]|[\\u0F3A\\u0F3C\\u169B\\u201A\\u201E\\u27C5\\u29D8\\u29DA\\u2E42\\u301D\\uFD3F\\u0F3B\\u0F3D\\u169C\\u27C6\\u29D9\\u29DB\\u301E\\u301F\\uFD3E])*)" +
        "[\\p{Pe}&&[^\\u0F3B\\u0F3D\\u169C\\u27C6\\u29D9\\u29DB\\u301E\\u301F\\uFD3E]]";
Emma's answer links to Brian Campbell's list of all Unicode brackets. I used it to enumerate all relevant characters, as Wiktor Stribiżew suggested; in my case, all kinds of parentheses are of interest.
Furthermore, I prefer to make sure that only matching pairs of parentheses are considered, which led me to this ugly regex in Java:
public static final String ANY_PARENTHESES = "\\([^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+\\)|⁽[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+⁾|₍[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+₎|❨[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+❩|❪[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+❫|⟮[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+⟯|⦅[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+⦆|⸨[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+⸩|﴾[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+﴿|︵[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+︶|﹙[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+﹚|（[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+）|｟[^\\(⁽₍❨❪⟮⦅⸨﴾︵﹙（｟\\)⁾₎❩❫⟯⦆⸩﴿︶﹚）｠]+｠";
I actually built it with the following code:
public static final char LEFT_PARENTHESIS = '\u0028', // (
        SUPERSCRIPT_LEFT_PARENTHESIS = '\u207D', // ⁽
        SUBSCRIPT_LEFT_PARENTHESIS = '\u208D', // ₍
        MEDIUM_LEFT_PARENTHESIS_ORNAMENT = '\u2768', // ❨
        MEDIUM_FLATTENED_LEFT_PARENTHESIS_ORNAMENT = '\u276A', // ❪
        MATHEMATICAL_LEFT_FLATTENED_PARENTHESIS = '\u27EE', // ⟮
        LEFT_WHITE_PARENTHESIS = '\u2985', // ⦅
        LEFT_DOUBLE_PARENTHESIS = '\u2E28', // ⸨
        ORNATE_LEFT_PARENTHESIS = '\uFD3E', // ﴾
        PRESENTATION_FORM_FOR_VERTICAL_LEFT_PARENTHESIS = '\uFE35', // ︵
        SMALL_LEFT_PARENTHESIS = '\uFE59', // ﹙
        FULLWIDTH_LEFT_PARENTHESIS = '\uFF08', // （
        FULLWIDTH_LEFT_WHITE_PARENTHESIS = '\uFF5F'; // ｟
public static final char RIGHT_PARENTHESIS = '\u0029', // )
        SUPERSCRIPT_RIGHT_PARENTHESIS = '\u207E', // ⁾
        SUBSCRIPT_RIGHT_PARENTHESIS = '\u208E', // ₎
        MEDIUM_RIGHT_PARENTHESIS_ORNAMENT = '\u2769', // ❩
        MEDIUM_FLATTENED_RIGHT_PARENTHESIS_ORNAMENT = '\u276B', // ❫
        MATHEMATICAL_RIGHT_FLATTENED_PARENTHESIS = '\u27EF', // ⟯
        RIGHT_WHITE_PARENTHESIS = '\u2986', // ⦆
        RIGHT_DOUBLE_PARENTHESIS = '\u2E29', // ⸩
        ORNATE_RIGHT_PARENTHESIS = '\uFD3F', // ﴿
        PRESENTATION_FORM_FOR_VERTICAL_RIGHT_PARENTHESIS = '\uFE36', // ︶
        SMALL_RIGHT_PARENTHESIS = '\uFE5A', // ﹚
        FULLWIDTH_RIGHT_PARENTHESIS = '\uFF09', // ）
        FULLWIDTH_RIGHT_WHITE_PARENTHESIS = '\uFF60'; // ｠
public static final String NO_PARENTHESES = "[^\\" + LEFT_PARENTHESIS + SUPERSCRIPT_LEFT_PARENTHESIS
        + SUBSCRIPT_LEFT_PARENTHESIS + MEDIUM_LEFT_PARENTHESIS_ORNAMENT + MEDIUM_FLATTENED_LEFT_PARENTHESIS_ORNAMENT
        + MATHEMATICAL_LEFT_FLATTENED_PARENTHESIS + LEFT_WHITE_PARENTHESIS + LEFT_DOUBLE_PARENTHESIS
        + ORNATE_LEFT_PARENTHESIS + PRESENTATION_FORM_FOR_VERTICAL_LEFT_PARENTHESIS + SMALL_LEFT_PARENTHESIS
        + FULLWIDTH_LEFT_PARENTHESIS + FULLWIDTH_LEFT_WHITE_PARENTHESIS + "\\" + RIGHT_PARENTHESIS
        + SUPERSCRIPT_RIGHT_PARENTHESIS + SUBSCRIPT_RIGHT_PARENTHESIS + MEDIUM_RIGHT_PARENTHESIS_ORNAMENT
        + MEDIUM_FLATTENED_RIGHT_PARENTHESIS_ORNAMENT + MATHEMATICAL_RIGHT_FLATTENED_PARENTHESIS
        + RIGHT_WHITE_PARENTHESIS + RIGHT_DOUBLE_PARENTHESIS + ORNATE_RIGHT_PARENTHESIS
        + PRESENTATION_FORM_FOR_VERTICAL_RIGHT_PARENTHESIS + SMALL_RIGHT_PARENTHESIS + FULLWIDTH_RIGHT_PARENTHESIS
        + FULLWIDTH_RIGHT_WHITE_PARENTHESIS + "]+";
public static final String PARENTHESES = "\\" + LEFT_PARENTHESIS + NO_PARENTHESES + "\\" + RIGHT_PARENTHESIS;
public static final String SUPERSCRIPT_PARENTHESES =
        "" + SUPERSCRIPT_LEFT_PARENTHESIS + NO_PARENTHESES + SUPERSCRIPT_RIGHT_PARENTHESIS;
public static final String SUBSCRIPT_PARENTHESES =
        "" + SUBSCRIPT_LEFT_PARENTHESIS + NO_PARENTHESES + SUBSCRIPT_RIGHT_PARENTHESIS;
public static final String MEDIUM_PARENTHESES_ORNAMENT =
        "" + MEDIUM_LEFT_PARENTHESIS_ORNAMENT + NO_PARENTHESES + MEDIUM_RIGHT_PARENTHESIS_ORNAMENT;
public static final String MEDIUM_FLATTENED_PARENTHESES_ORNAMENT =
        "" + MEDIUM_FLATTENED_LEFT_PARENTHESIS_ORNAMENT + NO_PARENTHESES + MEDIUM_FLATTENED_RIGHT_PARENTHESIS_ORNAMENT;
public static final String MATHEMATICAL_FLATTENED_PARENTHESES =
        "" + MATHEMATICAL_LEFT_FLATTENED_PARENTHESIS + NO_PARENTHESES + MATHEMATICAL_RIGHT_FLATTENED_PARENTHESIS;
public static final String WHITE_PARENTHESES =
        "" + LEFT_WHITE_PARENTHESIS + NO_PARENTHESES + RIGHT_WHITE_PARENTHESIS;
public static final String DOUBLE_PARENTHESES =
        "" + LEFT_DOUBLE_PARENTHESIS + NO_PARENTHESES + RIGHT_DOUBLE_PARENTHESIS;
public static final String ORNATE_PARENTHESES =
        "" + ORNATE_LEFT_PARENTHESIS + NO_PARENTHESES + ORNATE_RIGHT_PARENTHESIS;
public static final String PRESENTATION_FORM_FOR_VERTICAL_PARENTHESES =
        "" + PRESENTATION_FORM_FOR_VERTICAL_LEFT_PARENTHESIS + NO_PARENTHESES
        + PRESENTATION_FORM_FOR_VERTICAL_RIGHT_PARENTHESIS;
public static final String SMALL_PARENTHESES =
        "" + SMALL_LEFT_PARENTHESIS + NO_PARENTHESES + SMALL_RIGHT_PARENTHESIS;
public static final String FULLWIDTH_PARENTHESES =
        "" + FULLWIDTH_LEFT_PARENTHESIS + NO_PARENTHESES + FULLWIDTH_RIGHT_PARENTHESIS;
public static final String FULLWIDTH_WHITE_PARENTHESES =
        "" + FULLWIDTH_LEFT_WHITE_PARENTHESIS + NO_PARENTHESES + FULLWIDTH_RIGHT_WHITE_PARENTHESIS;
public static final char XOR = '|';
public static final String ANY_PARENTHESES = PARENTHESES
        + XOR + SUPERSCRIPT_PARENTHESES
        + XOR + SUBSCRIPT_PARENTHESES
        + XOR + MEDIUM_PARENTHESES_ORNAMENT
        + XOR + MEDIUM_FLATTENED_PARENTHESES_ORNAMENT
        + XOR + MATHEMATICAL_FLATTENED_PARENTHESES
        + XOR + WHITE_PARENTHESES
        + XOR + DOUBLE_PARENTHESES
        + XOR + ORNATE_PARENTHESES
        + XOR + PRESENTATION_FORM_FOR_VERTICAL_PARENTHESES
        + XOR + SMALL_PARENTHESES
        + XOR + FULLWIDTH_PARENTHESES
        + XOR + FULLWIDTH_WHITE_PARENTHESES;
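Since the 13 alternatives differ only in their delimiters, the same expression could also be generated in a loop over a table of pairs. A minimal sketch, assuming a hypothetical PAIRS table that holds the same code points as the constants above; every delimiter is emitted as a regex escape of the form \uhhhh, so there is no need to remember which characters require a backslash. Like the hand-built version, it matches only pairs whose closing character corresponds to the opening one.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenthesisPatternBuilder {
    // Hypothetical pair table: same code points as the *_PARENTHESIS constants above.
    static final char[][] PAIRS = {
        {'(', ')'},
        {'\u207D', '\u207E'}, // superscript
        {'\u208D', '\u208E'}, // subscript
        {'\u2768', '\u2769'}, // medium ornament
        {'\u276A', '\u276B'}, // medium flattened ornament
        {'\u27EE', '\u27EF'}, // mathematical flattened
        {'\u2985', '\u2986'}, // white
        {'\u2E28', '\u2E29'}, // double
        {'\uFD3E', '\uFD3F'}, // ornate
        {'\uFE35', '\uFE36'}, // vertical presentation form
        {'\uFE59', '\uFE5A'}, // small
        {'\uFF08', '\uFF09'}, // fullwidth
        {'\uFF5F', '\uFF60'}, // fullwidth white
    };

    static final Pattern ANY_PARENTHESES = build();

    // Writes c as a regex escape (backslash, 'u', four hex digits), which the regex
    // engine reads as a literal character, also inside a character class.
    static String escape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    static Pattern build() {
        // Equivalent of NO_PARENTHESES: one or more chars that are none of the delimiters.
        StringBuilder noParentheses = new StringBuilder("[^");
        for (char[] pair : PAIRS) {
            noParentheses.append(escape(pair[0])).append(escape(pair[1]));
        }
        noParentheses.append("]+");
        // One alternative per pair: opening char, content, matching closing char.
        StringJoiner alternatives = new StringJoiner("|");
        for (char[] pair : PAIRS) {
            alternatives.add(escape(pair[0]) + noParentheses + escape(pair[1]));
        }
        return Pattern.compile(alternatives.toString());
    }

    // Content of the first matching pair, or null if there is none.
    static String firstContent(String text) {
        Matcher matcher = ANY_PARENTHESES.matcher(text);
        // Every delimiter is a single char, so drop one char at each end of the match.
        return matcher.find() ? text.substring(matcher.start() + 1, matcher.end() - 1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstContent("shortName2 \uFF08Fuse 2\uFF09.")); // Fuse 2
    }
}
```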
Note, however, that it does not reject nested parentheses: in "(a (b) c)" it still matches the inner "(b)".
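To actually reject nested pairs, as the question asks, the check from the question's searchNames could be generalized to all pairs instead of a regex: take the first opening parenthesis of any kind, then require that the next parenthesis character of any kind is its matching closing one. A minimal sketch, assuming a hypothetical PAIRS table with the same code points as the constants above (firstNonNested is an illustrative name). Like the question's code, it rejects the line rather than searching for a later pair, and it also rejects empty content.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class FirstNonNestedPair {
    // Hypothetical pair table: same code points as the *_PARENTHESIS constants above.
    private static final char[][] PAIRS = {
        {'(', ')'}, {'\u207D', '\u207E'}, {'\u208D', '\u208E'}, {'\u2768', '\u2769'},
        {'\u276A', '\u276B'}, {'\u27EE', '\u27EF'}, {'\u2985', '\u2986'}, {'\u2E28', '\u2E29'},
        {'\uFD3E', '\uFD3F'}, {'\uFE35', '\uFE36'}, {'\uFE59', '\uFE5A'}, {'\uFF08', '\uFF09'},
        {'\uFF5F', '\uFF60'},
    };
    private static final Map<Character, Character> OPEN_TO_CLOSE = new HashMap<>();
    private static final Set<Character> CLOSERS = new HashSet<>();

    static {
        for (char[] pair : PAIRS) {
            OPEN_TO_CLOSE.put(pair[0], pair[1]);
            CLOSERS.add(pair[1]);
        }
    }

    // Returns the content of the first pair if it is well-formed and non-nested;
    // nested, mismatched, unclosed or empty pairs yield Optional.empty().
    static Optional<String> firstNonNested(String line) {
        int open = -1;
        for (int i = 0; i < line.length() && open < 0; i++) {
            if (OPEN_TO_CLOSE.containsKey(line.charAt(i))) {
                open = i;
            }
        }
        if (open < 0) {
            return Optional.empty(); // no opening parenthesis at all
        }
        char expectedClose = OPEN_TO_CLOSE.get(line.charAt(open));
        for (int i = open + 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == expectedClose) {
                return i > open + 1 ? Optional.of(line.substring(open + 1, i)) : Optional.empty();
            }
            if (OPEN_TO_CLOSE.containsKey(c) || CLOSERS.contains(c)) {
                return Optional.empty(); // nested or mismatched pair: reject
            }
        }
        return Optional.empty(); // closing parenthesis missing
    }

    public static void main(String[] args) {
        System.out.println(firstNonNested("shortName1 (LongName 1).")); // Optional[LongName 1]
        System.out.println(firstNonNested("x (a (b) c)"));              // Optional.empty
    }
}
```

Applied per line, as in the question's searchNames loop, this would also accept the line with the fullwidth parentheses.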