How to save all tokens?
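I have a text. I split it into sentences and words. Next I need to split it into tokens (",", ".", "?", "!", "..."), and this is where I'm stuck. Can you tell me which regular expression to use?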
Here is my code that splits the text into sentences and words.
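String s = ReadFromFile();
// Split into sentences at '.', '!' or '?', dropping any whitespace that follows.
String[] sentences = s.split("[.!?]\\s*");
String[][] words = new String[sentences.length][];
for (int i = 0; i < sentences.length; ++i) {
    // Split each sentence into words at runs of punctuation or whitespace.
    words[i] = sentences[i].split("[\\p{Punct}\\s]+");
}
System.out.println(Arrays.deepToString(words));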
So I have separate arrays of sentences and words, but I have a problem with the tokens.
Input data
Arithmetic operators are used in mathematical expressions in the same way that they are used in algebra. The following table lists the arithmetic operators:
Assume integer variable A holds 10 and variable B holds 20, then:
Expected result
. : , :
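The simplest solution is not to use split, which makes you describe what you do not want in the result, but to use Matcher#find and describe what you do want to find.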
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Arithmetic operators are used in mathematical expressions in the same way that they are used in algebra. The following table lists the arithmetic operators: Assume integer variable A holds 10 and variable B holds 20, then:";
// \p{Punct} matches a single ASCII punctuation character.
Pattern p = Pattern.compile("\\p{Punct}");
// or Pattern.compile("[.]{3}|\\p{Punct}"); if you want to find "..." as one token
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
Output:
.
:
,
:
Instead of printing m.group(), you can store each match in a collection such as a List.
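Storing the matches could look roughly like the sketch below. The class name TokenCollector and the method findTokens are made up for illustration, and the pattern is the "..." variant mentioned above; if you don't care about ellipses, plain \p{Punct} should work the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
class TokenCollector {
    // "[.]{3}" comes first so that "..." is kept as one token
    // instead of being matched as three separate dots.
    private static final Pattern TOKEN = Pattern.compile("[.]{3}|\\p{Punct}");

    // Returns every punctuation token in the order it appears in the text.
    static List<String> findTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Arithmetic operators are used in mathematical expressions "
                + "in the same way that they are used in algebra. "
                + "The following table lists the arithmetic operators: "
                + "Assume integer variable A holds 10 and variable B holds 20, then:";
        // Prints the list of tokens found in the sample input.
        System.out.println(findTokens(s));
    }
}
```

If you need the tokens per sentence rather than for the whole text, you can call the same method on each element of your sentences array, but note that your split on "[.!?]\\s*" has already thrown away the sentence-ending punctuation by then.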