Determine if a string has inner word boundaries

I use the following regex to determine whether a word appears in a text, enforcing word boundaries:

 // In a Java string literal the regex \b must be written as \\b
 if (Pattern.matches(".*\\b" + key + "\\b.*", text)) {
     // matched
 }

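This matches book in text-book, but not in facebook.

Now I want to do the opposite: determine whether the input text has a word boundary inside it.

For example, mutually-collaborative (true, there is a word boundary inside) and mutuallycollaborative (false, because there is no word boundary inside).

If the boundary is a punctuation mark, this works: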

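// find() looks for punctuation anywhere; Pattern.matches would require the whole text to be one punctuation character
if (Pattern.compile("\\p{Punct}").matcher(text).find()) { // check for punctuation
        // has punctuation
}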

I would like to check for word boundaries in general, such as '-' and so on.

Any ideas?

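You want to check whether a given string contains a word boundary inside the string. Note that \b also matches at the start and at the end of a string when the first or last character is a word character, so you need to exclude those alternatives. Just use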

"(?U)(?:\\W\\w|\\w\\W)"

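This way, you make sure the string contains a word character directly next to a non-word character. The (?U) flag makes \w and \W Unicode-aware.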
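To illustrate why the (?U) flag matters, here is a small sketch comparing the pattern with and without it on a word that contains a non-ASCII letter. The class and constant names are made up for this illustration; only the regex itself comes from the answer.

```java
import java.util.regex.Pattern;

class UnicodeFlagDemo {
    // Without (?U), \w only covers [a-zA-Z_0-9]
    static final Pattern ASCII_ONLY = Pattern.compile("(?:\\W\\w|\\w\\W)");
    // With (?U), \w and \W follow Unicode character properties
    static final Pattern UNICODE_AWARE = Pattern.compile("(?U)(?:\\W\\w|\\w\\W)");

    public static void main(String[] args) {
        // "naive" with a diaeresis on the i, written as an escape to avoid source-encoding issues
        String word = "na\u00EFve";
        System.out.println(ASCII_ONLY.matcher(word).find());    // true: the accented letter counts as \W
        System.out.println(UNICODE_AWARE.matcher(word).find()); // false: the accented letter counts as \w
    }
}
```

Without the flag, an accented letter would be reported as an inner word boundary, which is probably not what you want for natural-language text.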

IDEONE demo:

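// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String s = "mutuallyexclusive";
// Backslashes are doubled because this is a Java string literal
Pattern pattern = Pattern.compile("(?U)(?:\\W\\w|\\w\\W)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group() + " word boundary found!");
} else {
    System.out.println("Word boundary NOT found in " + s);
}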
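If you need this check in several places, the same pattern can be wrapped in a small helper. This is only a sketch: the class name and the method name hasInnerWordBoundary are hypothetical, and the inputs are the examples from the question.

```java
import java.util.regex.Pattern;

class InnerBoundaryCheck {
    // Same pattern as above, compiled once and reused
    private static final Pattern INNER_BOUNDARY =
            Pattern.compile("(?U)(?:\\W\\w|\\w\\W)");

    // Hypothetical helper: true when a word character sits directly next to a non-word character
    static boolean hasInnerWordBoundary(String text) {
        return INNER_BOUNDARY.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasInnerWordBoundary("mutually-collaborative")); // true
        System.out.println(hasInnerWordBoundary("mutuallycollaborative"));  // false
        System.out.println(hasInnerWordBoundary("text book"));              // true
    }
}
```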

For reference, here is what a word boundary can match:

There are three different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

So, with \w\W|\W\w, we exclude the first two cases.
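To see these three kinds of positions, you can list every index where \b matches. The sketch below uses a hypothetical helper, boundaryPositions, and the two words from the question. Index 0 and the string length are the first two cases; anything in between is an inner boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositions {
    // Hypothetical helper: collects every index at which \b matches in the input
    static List<Integer> boundaryPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(text);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(boundaryPositions("text-book")); // [0, 4, 5, 9]
        System.out.println(boundaryPositions("facebook"));  // [0, 8]
    }
}
```

For text-book, positions 4 and 5 sit on either side of the hyphen, which is exactly what \w\W and \W\w pick up. For facebook, only the start and end positions are reported, so there is no inner boundary.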