How to save all tokens?

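I have a text. I split it into sentences and words. Next I need to extract its punctuation tokens (`,`, `.`, `?`, `!`, ...), and this is where I am stuck. Can you tell me which regular expression to use?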

Here is my code that splits the text into sentences and words.

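import java.util.Arrays;

String s = ReadFromFile();
// Split into sentences at '.', '!' or '?' followed by optional whitespace.
String[] sentences = s.split("[.!?]\\s*");
String[][] words = new String[sentences.length][];
for (int i = 0; i < sentences.length; ++i)
{
    // Split each sentence into words on runs of punctuation and whitespace.
    words[i] = sentences[i].split("[\\p{Punct}\\s]+");
}
System.out.println(Arrays.deepToString(words));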

So I now have separate arrays of sentences and words, but I do not know how to get the tokens.

Input data

Arithmetic operators are used in mathematical expressions in the same way that they are used in algebra. The following table lists the arithmetic operators: Assume integer variable A holds 10 and variable B holds 20, then:

Expected result

. : , :

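The simplest solution is not to use split, which requires you to describe what you do not want in the result, but to use Matcher#find and describe what you do want to find.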

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Arithmetic operators are used in mathematical expressions in the same way that they are used in algebra. The following table lists the arithmetic operators: Assume integer variable A holds 10 and variable B holds 20, then:";

// In a Java string literal the regex backslash must be doubled.
Pattern p = Pattern.compile("\\p{Punct}");
       // or Pattern.compile("[.]{3}|\\p{Punct}"); if you want to find "..."
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}

Output:

.
:
,
:

Instead of printing m.group(), you can store each match in a collection such as a List.
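A minimal sketch of storing the matches rather than printing them could look like the following. The class name PunctuationTokens and the helper collectTokens are hypothetical names chosen for illustration, not part of the original post, and the pattern uses the "[.]{3}|\\p{Punct}" variant so that "..." is kept as a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationTokens {
    // "[.]{3}" is listed first so that "..." is matched as one token
    // before the single-character \p{Punct} alternative is tried.
    private static final Pattern TOKEN = Pattern.compile("[.]{3}|\\p{Punct}");

    // Hypothetical helper: returns every punctuation token in order of appearance.
    public static List<String> collectTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group()); // store the match instead of printing it
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Arithmetic operators are used in mathematical expressions "
                 + "in the same way that they are used in algebra. The following "
                 + "table lists the arithmetic operators: Assume integer variable "
                 + "A holds 10 and variable B holds 20, then:";
        System.out.println(collectTokens(s)); // prints [., :, ,, :]
    }
}
```

If the tokens are needed per sentence rather than for the whole text, the same helper could presumably be called on each sentence, but note that splitting on "[.!?]\\s*" first discards the sentence-ending punctuation, so it is safer to collect tokens from the unsplit text.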