Keep REGEX delimiter with greedy token
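Good day,
I'm writing an equation calculator in Java and using REGEX to identify values, including scientific notation, with a pattern I found in one of the posts (and adapted slightly):
[\d.]+(?:E-?\d+)?
The problem I'm having is that I want to keep the delimiting values. How can I do that? I've played with it on regex101.com, but when I use lookahead and lookbehind it complains about the greedy token.
I found several other REGEXes on Stack Overflow, but couldn't find one that keeps the delimiters.
Thanks in advance!
Probably not the fastest thing in the world, but you could do something like this: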
/**
 * Splits a string while keeping the supplied split delimiter(s) in the
 * resulting array elements.<br><br>
 * <p>
 * This method builds a Regular Expression (RegEx) out of lookahead and/or
 * lookbehind assertions and passes it to String.split(), so the delimiters
 * themselves are not consumed by the split.<br><br>
 *
 * @param inputString       (String) The string to split.<br>
 *
 * @param delimiterPosition (Integer) An integer value of either 0, 1, or 2.
 *                          The specific value determines how the detected
 *                          delimiters are placed within the array:<pre>
 *
 *     0  Delimiter as separate element:
 *        a;b;c;d = [a, ;, b, ;, c, ;, d]
 *        Core regex is: .split("((?<=;)|(?=;))")
 *        Lookbehind and Lookahead used.
 *
 *     1  Delimiter at end of each element except last:
 *        a;b;c;d = [a;, b;, c;, d]
 *        Core regex is: .split("(?<=;)")
 *        Lookbehind used only.
 *
 *     2  Delimiter at beginning of each element except first:
 *        a;b;c;d = [a, ;b, ;c, ;d]
 *        Core regex is: .split("(?=;)")
 *        Lookahead used only.</pre><br>
 *
 * If no delimiters are supplied then each character of the input string
 * becomes a separate element of the returned array.<br><br>
 *
 * If any supplied delimiters or delimiter characters happen to be RegEx
 * meta characters such as ( ) [ ] { } \ ^ $ | ? * + . < > - = ! then
 * those characters must be escaped with a double backslash in Java source
 * (ie: "\\+"), otherwise an exception may be thrown or the split may not
 * behave as expected.<br>
 *
 * @param delimiters        (1D String Array or one to multiple comma
 *                          delimited String Entries) Any number of string
 *                          delimiters can be supplied as long as they are
 *                          separated with a comma (,).<br>
 *
 * @return (String[]) The split array, with the delimiters kept according
 *         to delimiterPosition.
 *
 * @throws IllegalArgumentException if delimiterPosition is not 0, 1, or 2.
 */
public static String[] SplitAndKeepDelimiters(String inputString, int delimiterPosition, String... delimiters) {
    if (delimiters.length < 1) {
        return inputString.split("");
    }
    if (delimiterPosition < 0 || delimiterPosition > 2) {
        throw new IllegalArgumentException("delimiterPosition must be 0, 1, or 2");
    }
    // Build the regex: one lookaround alternative per delimiter, joined with '|'.
    StringBuilder regEx = new StringBuilder();
    for (String delimiter : delimiters) {
        if (regEx.length() > 0) {
            regEx.append('|');
        }
        switch (delimiterPosition) {
            case 0: // split both before and after the delimiter
                regEx.append("((?<=").append(delimiter).append(")|(?=").append(delimiter).append("))");
                break;
            case 1: // split only after the delimiter (lookbehind)
                regEx.append("(?<=").append(delimiter).append(")");
                break;
            case 2: // split only before the delimiter (lookahead)
                regEx.append("(?=").append(delimiter).append(")");
                break;
        }
    }
    return inputString.split(regEx.toString());
}
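To see what the three `delimiterPosition` modes produce, here is a minimal standalone sketch that applies the three core regexes from the Javadoc directly to `a;b;c;d`, without going through the helper method. The class name `SplitModesDemo` and its constants are only for illustration.

```java
import java.util.Arrays;

public class SplitModesDemo {
    // The three core regexes from the Javadoc, for a single ';' delimiter.
    static final String SEPARATE = "((?<=;)|(?=;))"; // mode 0: lookbehind OR lookahead
    static final String AT_END = "(?<=;)";           // mode 1: lookbehind only
    static final String AT_START = "(?=;)";          // mode 2: lookahead only

    public static void main(String[] args) {
        String input = "a;b;c;d";
        // Zero-width lookarounds match positions, so split() cuts the string
        // there without consuming the ';' characters.
        System.out.println(Arrays.toString(input.split(SEPARATE))); // [a, ;, b, ;, c, ;, d]
        System.out.println(Arrays.toString(input.split(AT_END)));   // [a;, b;, c;, d]
        System.out.println(Arrays.toString(input.split(AT_START))); // [a, ;b, ;c, ;d]
    }
}
```

Mode 1 keeps the delimiter on the left piece because the cut happens just after it; mode 2 keeps it on the right piece because the cut happens just before it.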
The method above lets you split on multiple delimiters.
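If you would rather not escape regex metacharacters by hand, one option (not part of the method above) is to wrap each delimiter in `Pattern.quote`, which turns it into a literal `\Q...\E` sequence. A minimal sketch of mode 0 only, using a hypothetical helper named `splitKeepingOperators`:

```java
import java.util.Arrays;
import java.util.StringJoiner;
import java.util.regex.Pattern;

public class QuotedDelimiterSplit {

    // Hypothetical helper: splits the input so that each delimiter becomes a
    // separate element (mode 0 above). Delimiters are treated as literal text
    // via Pattern.quote, so "+", "*", "(" etc. need no manual escaping.
    static String[] splitKeepingOperators(String input, String... delimiters) {
        if (delimiters.length == 0) {
            return input.split("");
        }
        StringJoiner alternatives = new StringJoiner("|");
        for (String d : delimiters) {
            String q = Pattern.quote(d);
            // Cut after the delimiter (lookbehind) or before it (lookahead).
            alternatives.add("(?<=" + q + ")|(?=" + q + ")");
        }
        return input.split(alternatives.toString());
    }

    public static void main(String[] args) {
        String[] parts = splitKeepingOperators("2*(3+4)", "*", "+", "(", ")");
        System.out.println(Arrays.toString(parts)); // [2, *, (, 3, +, 4, )]
    }
}
```

Note that adding "-" as a delimiter would also split the exponent in a value like 123.345E-6 (into 123.345E, -, 6), which is one reason the alternation-based matching approach below may suit a calculator tokenizer better.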
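Instead of using split, you could also use an alternation to get the matches: match either the number pattern, or a run of characters at none of which the number pattern starts.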
[\d.]+(?:E-?\d+)?|(?:(?![\d.]+(?:E-?\d+)?).)+
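The pattern matches:
[\d.]+(?:E-?\d+)?
Your scientific notation pattern
|
Or
(?:
Non-capturing group
(?![\d.]+(?:E-?\d+)?).
Negative lookahead: match a single character only if the scientific notation pattern does not start right at that position
)+
Close the non-capturing group and repeat it 1+ times to match at least one character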
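For example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Backslashes must be doubled inside a Java string literal.
        String regex = "[\\d.]+(?:E-?\\d+)?|(?:(?![\\d.]+(?:E-?\\d+)?).)+";
        String string = "cos(2123.324E3)*ln(e^x)+123.345E-6*sin(sin(sin(x)))";
        Matcher matcher = Pattern.compile(regex).matcher(string);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
Output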
cos(
2123.324E3
)*ln(e^x)+
123.345E-6
*sin(sin(sin(x)))
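For a calculator you will probably also want to know which pieces are numbers. One way (an extension, not part of the answer above) is to put the number alternative in a named group, here called `num` purely for illustration, and check whether `group("num")` is null; `Double.parseDouble` accepts the `E` exponent form. The sketch also adds `Pattern.DOTALL`, which is my assumption: without it `.` does not match line terminators, so a newline in the input would be skipped by `find()` instead of kept. `EquationTokenizer` and `Token` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EquationTokenizer {

    // Hypothetical token type: the matched text and whether it is a number.
    static final class Token {
        final String text;
        final boolean isNumber;

        Token(String text, boolean isNumber) {
            this.text = text;
            this.isNumber = isNumber;
        }

        @Override
        public String toString() {
            return (isNumber ? "NUM " : "TXT ") + text;
        }
    }

    // Same alternation as the answer, with the number part in a named group.
    // DOTALL makes '.' also match line terminators, so no character is skipped.
    static final Pattern TOKEN = Pattern.compile(
            "(?<num>[\\d.]+(?:E-?\\d+)?)|(?:(?![\\d.]+(?:E-?\\d+)?).)+",
            Pattern.DOTALL);

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group("num") is null when the second alternative matched.
            tokens.add(new Token(m.group(), m.group("num") != null));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String expr = "cos(2123.324E3)*ln(e^x)+123.345E-6*sin(sin(sin(x)))";
        StringBuilder rebuilt = new StringBuilder();
        for (Token t : tokenize(expr)) {
            rebuilt.append(t.text);
            // Caution: a lone "." also matches [\d.]+ and would make parseDouble throw.
            System.out.println(t.isNumber ? t + " = " + Double.parseDouble(t.text) : t.toString());
        }
        // Every character ends up in exactly one token, so nothing is lost.
        System.out.println(rebuilt.toString().equals(expr)); // true
    }
}
```

Because `[\d.]+` also accepts inputs such as "." or "1.2.3", a real calculator would likely want to catch `NumberFormatException` or tighten the number pattern.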