Use Java Regex to find multiple matching words in a sentence
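I have a sentence and a set of words, say: Mayweather, undefeated, and so on.
I want to:
- Check whether the sentence contains any of those words. (It should look only for the matching words and basically ignore full stops, commas, and line breaks.)
- If it does, display a few words before and after each matching word, perhaps using String.format().
Here is my code. It seems to work, but it is not quite what I want: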
String sentence = "Floyd Mayweather Jr is an American professional boxer " +
"currently undefeated as a professional and is a five-division world champion, " +
"having won ten world titles and the lineal championship in four different weight classes.";
String newText = "";
Pattern p = Pattern.compile("(Mayweather) .* (undefeated)");
Matcher m = p.matcher(sentence);
if (m.find()) {
String group1 = m.group(1);
String group2 = m.group(2);
newText = String.format("%s ... %s" , group1, group2);
System.out.println(newText);
}
The current output is:
Mayweather ... undefeated
What I want is something like this:
Floyd Mayweather Jr is an American ... currently undefeated as a professional ...
Could you tell me how to do this, or point me in the right direction? I am stuck.
Thanks in advance, everyone.
You can try something like the following.
Note: this is only a prototype, so do not copy and paste it directly.
String str="Floyd Mayweather Jr is an American professional boxer currently undefeated as a professional and is a five-division world champion, having won ten world titles and the lineal championship in four different weight classes.";
int firstIndex=str.indexOf("American");
int secondIndex=str.indexOf("boxer");
String group1=str.substring(0,firstIndex+"American".length()); // gives you 1st group
String group2=str.substring(secondIndex);
String newText = String.format("%s ... %s" , group1, group2);
System.out.println(newText);
Output:
Floyd Mayweather Jr is an American ... boxer currently undefeated as a
professional and is a five-division world champion, having won ten
world titles and the lineal championship in four different weight
classes.
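If you really want to solve this with regex, you need to make your capture groups match everything you want to output. At the moment they match only your search terms: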
(Mayweather) .* (undefeated)
// "Mayweather", "undefeated"
You could try something like this (using just one group!), but it matches your entire example:
(.*Mayweather.*undefeated.*)
// -whole text-
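This can be changed to match two parts again, each with up to 12 characters of context before and after the keyword (note that there are no spaces around the "match all" in the middle, and that it is made non-greedy!):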
(.{0,12}Mayweather.{0,12}).*?(.{0,12}undefeated.{0,12})
// "Floyd Mayweather Jr is an Am", "r currently undefeated as a profes"
It can be refined further to stop at word boundaries (the results need to be trimmed):
(\b.{0,12}Mayweather.{0,12}\b).*?(\b.{0,12}undefeated.{0,12}\b)
// "Floyd Mayweather Jr is an ", " currently undefeated as a "
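For reference, here is a minimal sketch of how the word-boundary pattern above might be used from Java. The class name `BoundedContext` and the method `excerpt` are made up for this illustration; the only real change to the pattern is that each backslash is doubled inside the Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedContext {
    // Same pattern as above; "\b" becomes "\\b" inside a Java string literal.
    static final Pattern P = Pattern.compile(
        "(\\b.{0,12}Mayweather.{0,12}\\b).*?(\\b.{0,12}undefeated.{0,12}\\b)");

    // Hypothetical helper: returns "<context 1> ... <context 2> ...",
    // or an empty string when the pattern does not match.
    static String excerpt(String sentence) {
        Matcher m = P.matcher(sentence);
        if (!m.find()) {
            return "";
        }
        // The groups carry leading/trailing spaces, so trim them before formatting.
        return String.format("%s ... %s ...", m.group(1).trim(), m.group(2).trim());
    }

    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer "
            + "currently undefeated as a professional and is a five-division world champion, "
            + "having won ten world titles and the lineal championship in four different weight classes.";
        // prints: Floyd Mayweather Jr is an ... currently undefeated as a ...
        System.out.println(excerpt(sentence));
    }
}
```

Note that `.` does not match line breaks by default, so a sentence spanning several lines would need `Pattern.DOTALL` or prior whitespace normalization.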
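Changing this to output a fixed number of words is left as an exercise for the bored reader.
Edit: fixed the greediness of ".*" in the last two versions (added "?").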
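One possible take on the fixed-number-of-words exercise is sketched below. It is not the answer author's solution: the class `WordContext`, the method `snippets`, and the choice to treat any run of non-whitespace as a "word" are all assumptions made for this example. It searches for any keyword from a list and keeps up to `words` whitespace-separated tokens on each side, so punctuation attached to a keyword (as in "undefeated,") does not prevent a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordContext {
    // Hypothetical helper: returns one snippet per keyword hit, each with up to
    // `words` tokens of context before and after the keyword.
    static List<String> snippets(String text, List<String> keywords, int words) {
        // Quote each keyword so regex metacharacters in it are taken literally.
        String alternation = keywords.stream()
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
        String regex =
            "(?<!\\S)"                            // start at the beginning of a token
            + "(?:\\S+\\s+){0," + words + "}"     // up to N tokens before the keyword
            + "\\S*\\b(?:" + alternation + ")\\b\\S*" // keyword, plus attached punctuation
            + "(?:\\s+\\S+){0," + words + "}";    // up to N tokens after the keyword
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer "
            + "currently undefeated as a professional and is a five-division world champion, "
            + "having won ten world titles and the lineal championship in four different weight classes.";
        List<String> hits = snippets(sentence, Arrays.asList("Mayweather", "undefeated"), 2);
        // prints: Floyd Mayweather Jr is ... boxer currently undefeated as a ...
        System.out.println(String.join(" ... ", hits) + " ...");
    }
}
```

A limitation of this sketch: matches do not overlap, so when two keywords sit closer together than the context width, the second snippet gets fewer leading words because they were already consumed by the first.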
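The problem with your code lies in how you use the groups.
A regex capture group returns only the piece of the string that its parentheses matched.
group(0), also written group(), is the entire matched text (here, everything from "Mayweather" through "undefeated"), not the whole input string.
group(1) is the text of your first capture group: "Mayweather".
group(2) is the text of your second capture group: "undefeated".
You can use the start(int group) and end(int group) methods to find the indices of each match, and then build the new string with some basic string operations.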
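To make the group numbering concrete, here is a small sketch that lists every group of a match together with its start and end offsets. The class `GroupOffsets` and the method `describe` are names invented for this illustration; `group`, `start`, `end`, and `groupCount` are the standard `java.util.regex.Matcher` methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsets {
    // Hypothetical helper: one line per group, in the form
    // group(i) = "text" [start, end)
    static String describe(String sentence) {
        Matcher m = Pattern.compile("(Mayweather) .* (undefeated)").matcher(sentence);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        // groupCount() excludes group 0, so iterate from 0 through groupCount().
        for (int g = 0; g <= m.groupCount(); g++) {
            sb.append(String.format("group(%d) = \"%s\" [%d, %d)\n",
                g, m.group(g), m.start(g), m.end(g)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints:
        // group(0) = "Mayweather is undefeated" [6, 30)
        // group(1) = "Mayweather" [6, 16)
        // group(2) = "undefeated" [20, 30)
        System.out.print(describe("Floyd Mayweather is undefeated today"));
    }
}
```

The end offset is exclusive, which is why it can be passed straight to `String.substring`.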
If you want to stay with regular expressions, a solution looks like this:
String sentence = ("Floyd Mayweather Jr is an American professional boxer " +
"currently undefeated as a professional and is a five-division world champion, " +
"having won ten world titles and the lineal championship in four different weight classes.");
/** Creates a StringBuilder, which can be altered,
* unlike a string, which is immutable. */
StringBuilder sb = new StringBuilder(sentence.length());
Pattern p = Pattern.compile("(Mayweather) .* (undefeated)");
Matcher m = p.matcher(sentence);
if (m.find()) {
int g1Start = m.start(1);
int g1End = m.end(1);
int g2Start = m.start(2);
int g2End = m.end(2);
sb.append(sentence.substring(0, g1Start));
sb.append("...");
sb.append(sentence.substring(g1End, g2Start));
sb.append("...");
sb.append(sentence.substring(g2End)); // to the end of the string; length() - 1 would drop the last character
I am not sure whether you need a line break at the end, but if you do:
sb.append("\r\n");
The rest is simple:
newText = sb.toString();
textView.setText(newText);
}
Hope this helps :)