How to extract words of a fixed length from a paragraph?

Question

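I am trying to extract words from a paragraph/string. I have searched in many places but did not find any relevant material. I want to extract the words of length 4 from

"I want to have alot of moneys when I am older probably e1X2"

I am trying to extract them using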

    List<String> words = new ArrayList<String>();
    String s = "I want to have alot of moneys when I am older probably.";
    Pattern p = Pattern.compile("[a-zA-Z']{4,}");
    Matcher m = p.matcher(s);
    while (m.find()) {
        words.add(m.group());
    }

    System.out.println(words);

The output I am getting right now is

[want, have, alot, moneys, when, older, probably]

But the output should be

[want, have, alot, when]

Answer 1

Do you want to use a regular expression?

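You do not need group parentheses "()" for this: m.group() with no argument returns the whole match (group 0). Parentheses are only required if you want to read part of a match with m.group(1). The quantifier {4,} is what lets longer words through, because it means "four or more".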

Experiment with the regular expression here: regex101. Then put it into your Java program.

You can also split the string on whitespace and then filter the resulting array, keeping only the elements of the required length.

Answer 2

A simpler way to get the result:

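    // Requires java.util.ArrayList and java.util.List.
    List<String> words = new ArrayList<String>();
    String s = "I want to have alot of moneys when I am older probably";
    // Split on runs of whitespace so repeated spaces do not produce empty tokens.
    String[] str = s.split("\\s+");
    for (int i = 0; i < str.length; i++) {
        // Keep only the tokens that are exactly four characters long.
        if (str[i].length() == 4) {
            words.add(str[i]);
        }
    }
    System.out.print(words);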

Answer 3

Try:

    public static void main(String[] args) {

        List<String> words = new ArrayList<String>();
        String s = "I want to have alot of moneys when I am older probably.";
        // Backslashes must be doubled inside a Java string literal.
        Pattern p = Pattern.compile("\\b\\w{4}\\b");
        Matcher m = p.matcher(s);
        while (m.find()) {
            words.add(m.group());
        }

        System.out.println(words);
    }

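Output: [want, have, alot, when]

Explanation:

\b matches a word boundary, and \w{4} matches exactly four word characters, so only whole four-character words are matched.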
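
One thing to watch for: the sentence in the question ends with "e1X2", and \w also covers digits and the underscore, so \b\w{4}\b would pick that token up as well. The following is a minimal sketch comparing the two patterns, assuming tokens containing digits should be excluded. The class name FourLetterWords and the helper findAll are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FourLetterWords {

    // Returns every match of the given regex in the text, in order of appearance.
    public static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group()); // group() with no argument is the whole match
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "I want to have alot of moneys when I am older probably e1X2";

        // \w accepts letters, digits and underscore, so "e1X2" is included.
        System.out.println(findAll("\\b\\w{4}\\b", s));
        // prints [want, have, alot, when, e1X2]

        // Restricting the class to letters leaves "e1X2" out.
        System.out.println(findAll("\\b[a-zA-Z]{4}\\b", s));
        // prints [want, have, alot, when]
    }
}
```

Which pattern is appropriate depends on whether something like "e1X2" counts as a word in your data.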

Answer 4

You need lookbehind and lookahead in your regular expression.

Your original:

    Pattern p = Pattern.compile("[a-zA-Z']{4,}");

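With lookbehind and lookahead:

    Pattern p = Pattern.compile("(?<=\\s)[a-zA-Z']{4}(?=\\s)");

The quantifier also has to be {4} rather than {4,}, which means "four or more". Because the lookbehind and lookahead each require a whitespace character, words at the very start or end of the string will not match. Add a space to both ends of the input string and it should work.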
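
If you would rather not pad the input, negative lookarounds are one possible alternative: instead of requiring whitespace on each side, they only require that the neighbouring character is not part of a word, which also holds at the start and end of the string. This is a sketch under the assumption that a word is a run of letters and apostrophes, as in the question's character class. The class name LookaroundWords and the method extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundWords {

    // Exactly four letters/apostrophes, not preceded and not followed by another one.
    private static final Pattern FOUR =
            Pattern.compile("(?<![a-zA-Z'])[a-zA-Z']{4}(?![a-zA-Z'])");

    // Collects all four-character words, including those at the string edges.
    public static List<String> extract(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = FOUR.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // The first and last words sit at the string boundaries; no padding is added.
        System.out.println(extract("want to have alot of moneys when"));
        // prints [want, have, alot, when]
    }
}
```

Note that with this definition a digit counts as a separator, so "abcd1" would still yield "abcd". Adjust the character class if that is not what you want.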

Answer 5

A solution using the Stream API

    /* Required imports:
     * import java.util.Arrays;
     * import java.util.List;
     * import java.util.stream.Collectors;
     */
    // Split on runs of non-word characters; backslashes are doubled in a Java string.
    List<String> words = Arrays.stream(text.split("\\W+"))
                               .filter(word -> word.length() == 4)
                               .collect(Collectors.toList());

The text is split into individual words.
Only words of length 4 pass the filter.
All four-letter words are collected into a list.
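
For completeness, the three steps above could be wrapped into a runnable class along these lines. This is only a sketch: the class name StreamWords and the method wordsOfLength are invented here, and splitting on "\\W+" (runs of non-word characters) is an assumption about what should count as a separator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamWords {

    // Splits the text into words and keeps those with exactly the given length.
    public static List<String> wordsOfLength(String text, int length) {
        return Arrays.stream(text.split("\\W+"))          // 1. split into words
                     .filter(word -> word.length() == length) // 2. keep matching lengths
                     .collect(Collectors.toList());        // 3. collect into a list
    }

    public static void main(String[] args) {
        String text = "I want to have alot of moneys when I am older probably.";
        System.out.println(wordsOfLength(text, 4));
        // prints [want, have, alot, when]
    }
}
```

Because the filter only checks the length, a token such as "e1X2" would also pass. Add a further filter if only alphabetic words are wanted.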

Tags: java, android, text-extraction, substring, stringtokenizer