Determine if a string has inner word boundaries
I use the following to determine whether a word appears in a text, enforcing word boundaries:
// In a Java string literal the backslash must be doubled: "\\b" is the regex \b
if (Pattern.matches(".*\\b" + key + "\\b.*", text)) {
    // matched
}
This matches book in text-book, but not in facebook.
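For illustration, here is a minimal sketch of the same whole-word check written with find() instead of wrapping the pattern in ".*". The class name WholeWordDemo and the helper containsWord are hypothetical, and Pattern.quote is an addition not present in the question; it is there on the assumption that key should be treated as literal text rather than as a regex.

```java
import java.util.regex.Pattern;

final class WholeWordDemo {
    // Hypothetical helper: true if key occurs in text delimited by word boundaries.
    static boolean containsWord(String text, String key) {
        // Pattern.quote makes key literal; find() searches anywhere in text,
        // so no leading or trailing ".*" is needed.
        return Pattern.compile("\\b" + Pattern.quote(key) + "\\b")
                      .matcher(text)
                      .find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("text-book", "book")); // true
        System.out.println(containsWord("facebook", "book"));  // false
    }
}
```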
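Now I want to do the opposite: determine whether the input text has a word boundary inside it. For example, mutually-collaborative (true, there is a word boundary inside) and mutuallycollaborative (false, because there is no word boundary inside).
If the boundary is a punctuation character, this works: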
if (Pattern.matches("\\p{Punct}", text)) { // check punctuation
    // has punctuation
}
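One caveat about the check above: Pattern.matches succeeds only if the whole input matches the pattern, so as written it is true only when text is a single punctuation character. A rough sketch of a "contains punctuation" test using find() might look like this (the class and method names are made up for the example):

```java
import java.util.regex.Pattern;

final class PunctuationDemo {
    // \p{Punct} is the POSIX punctuation class (ASCII punctuation by default).
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    // Hypothetical helper: true if text contains at least one punctuation character.
    static boolean containsPunctuation(String text) {
        return PUNCT.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPunctuation("mutually-collaborative")); // true
        System.out.println(containsPunctuation("mutuallycollaborative"));  // false
        // '_' is punctuation but also a word character, so it is not a word boundary.
        System.out.println(containsPunctuation("snake_case"));             // true
    }
}
```

The last case shows why punctuation alone is not a reliable stand-in for word boundaries.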
I want to check for word boundaries in general, such as '-' and so on.
Any ideas?
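You want to check whether a given string contains a word boundary inside the string. Note that \b also matches at the start and at the end of a string when the first or last character is a word character. So you need to exclude those alternatives. Just use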
"(?U)(?:\\W\\w|\\w\\W)"
This way, you make sure the string contains a word character directly next to a non-word character.
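String s = "mutuallyexclusive";
// (?U) enables Unicode-aware \w and \W; backslashes are doubled for the Java string literal
Pattern pattern = Pattern.compile("(?U)(?:\\W\\w|\\w\\W)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    // group() is the pair of characters around the boundary
    System.out.println(matcher.group() + " word boundary found!");
} else {
    System.out.println("Word boundary NOT found in " + s);
}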
Just for reference, here is what a word boundary can match:
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
So, with \w\W|\W\w, we exclude the first two cases.
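To see the pattern against the inputs from the question, here is a small self-contained sketch. The class name InnerBoundaryDemo and the helper hasInnerWordBoundary are made up for the example; the regex itself is the one given above.

```java
import java.util.regex.Pattern;

final class InnerBoundaryDemo {
    // A word character next to a non-word character, in either order.
    // (?U) turns on Unicode-aware \w and \W (UNICODE_CHARACTER_CLASS).
    private static final Pattern INNER_BOUNDARY =
            Pattern.compile("(?U)(?:\\W\\w|\\w\\W)");

    // Hypothetical helper: true if s has a word boundary between two of its characters.
    static boolean hasInnerWordBoundary(String s) {
        return INNER_BOUNDARY.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasInnerWordBoundary("mutually-collaborative")); // true
        System.out.println(hasInnerWordBoundary("mutuallycollaborative"));  // false
        // With (?U), the accented letter counts as a word character.
        System.out.println(hasInnerWordBoundary("na\u00EFve"));             // false
    }
}
```

Because the pattern needs two adjacent characters, the boundaries at the very start and end of the string can never match on their own, which is the point of the exclusion described above.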