Splitting a text into words using BufferedReader
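I am working on a problem. I have to use a BufferedReader to add words to a TreeSet (and print the size of the set), but my solution does not get past the speed test limit. The text contains only letters and spaces, and lines may be empty. I need a faster solution than this one: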
    BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
    Set<String> text = new TreeSet<String>();
    String words[], line;
    while ((line = read.readLine()) != null) {
        words = line.split("\\s+");
        for (int i = 0; i < words.length && words[0].length() > 0; i++) {
            text.add(words[i]);
        }
    }
    System.out.println(text.size());
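Is there any other way to split the line so that the program spends less time on it?
Given the assumptions you provided, I would simply add everything to the set and remove the unwanted values at the end. This should save some of the time spent checking the loop condition (although in practice not much).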
    BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
    Set<String> text = new TreeSet<String>();
    String words[], line;
    while ((line = read.readLine()) != null) {
        words = line.split("\\s+");
        for (String value : words) {
            text.add(value);
        }
    }
    // Empty lines and lines with leading spaces put "" into the set.
    text.remove("");
    // No text.remove(null): split never yields null, and a TreeSet with
    // natural ordering throws NullPointerException for a null argument.
    System.out.println(text.size());
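To see why the empty string has to be removed at the end, it helps to check what `split` returns in the edge cases the question mentions (empty lines, leading spaces). This is only a small sketch; the class name `SplitEdgeCases` and the helper `tokens` are made up for illustration and are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

class SplitEdgeCases {
    // Hypothetical helper: splits a line on runs of whitespace,
    // the same way the loop above does.
    static List<String> tokens(String line) {
        return Arrays.asList(line.split("\\s+"));
    }

    public static void main(String[] args) {
        // Two words separated by two spaces: no empty tokens.
        System.out.println(tokens("a  b").size());   // 2
        // An empty line yields exactly one token, the empty string.
        System.out.println(tokens("").size());       // 1
        // Leading spaces yield an empty first token before "a" and "b".
        System.out.println(tokens("  a b").size());  // 3
    }
}
```

So the only unwanted value that can reach the set here is `""`, which a single `text.remove("")` after the loop takes care of.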
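In the line
    words = line.split("\\s+");
you split by a regular expression, which is much slower than splitting on a single character (about 5 times slower on my machine). See "Java split String performances".
If the words are separated by exactly one space, the fix is simple:
    words = line.split(" ");
Just replace that line and your code will run faster.
If words can be separated by several spaces, add this line after the loop:
    text.remove("");
and still replace the regex split with the single-character split.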
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Set;
import java.util.TreeSet;

public class Test {
    public static void main(String[] args) throws IOException {
        // string contains 1, 2 and two spaces between 1 and 2. text size should be 2
        String txt = "1  2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1";
        InputStream inpstr = new ByteArrayInputStream(txt.getBytes());
        BufferedReader read = new BufferedReader(new InputStreamReader(inpstr));
        Set<String> text = new TreeSet<>();
        String[] words;
        String line;
        long startTime = System.nanoTime();
        while ((line = read.readLine()) != null) {
            // words = line.split("\\s+"); // runs about 5 times slower
            words = line.split(" ");
            for (int i = 0; i < words.length; i++) {
                text.add(words[i]);
            }
        }
        text.remove(""); // add only if words can be separated with multiple spaces
        long endTime = System.nanoTime();
        // prints the elapsed time in nanoseconds, then the number of distinct words
        System.out.println((endTime - startTime) + " " + text.size());
    }
}
You can also replace the for loop with
    text.addAll(Arrays.asList(words));
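One further option, not taken from the answers above, is `java.util.StringTokenizer`. It does no regex work and never returns empty tokens, so neither the `remove("")` step nor the length check is needed. The sketch below assumes the input is as described in the question (letters and spaces only); the class name `TokenizerWordCount` and the helper `distinctWords` are hypothetical, and whether this is actually faster than `split(" ")` on a given judge would have to be measured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

class TokenizerWordCount {
    // Hypothetical helper: collects the distinct words without using a regex.
    static Set<String> distinctWords(BufferedReader read) {
        Set<String> text = new TreeSet<>();
        try {
            String line;
            while ((line = read.readLine()) != null) {
                // Default delimiters are space, tab, newline, carriage return
                // and form feed; runs of delimiters produce no empty tokens.
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    text.add(st.nextToken());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text;
    }

    public static void main(String[] args) {
        // Sample input with a double space, an empty line, and leading/trailing spaces.
        BufferedReader read = new BufferedReader(new StringReader("a  b\n\n b c "));
        System.out.println(distinctWords(read).size()); // 3 (a, b, c)
    }
}
```

For the real task, `new InputStreamReader(System.in)` would replace the `StringReader` used here for the demonstration.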
You can of course also stream the BufferedReader into a TreeSet:
    Collection<String> c = read.lines()
            .flatMap(line -> Stream.of(line.split("\\s+")).filter(word -> word.length() > 0))
            .collect(Collectors.toCollection(TreeSet::new));
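For completeness, here is a self-contained variant of the stream approach with the imports it needs. One change is my own assumption rather than part of the answer: the pattern is compiled once with `Pattern.compile` and applied through `Pattern.splitAsStream`, instead of calling `line.split("\\s+")` for every line. The names `StreamWordCount`, `WHITESPACE`, and `distinctWords` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamWordCount {
    // Compiled once and reused for every line (an assumption of this sketch).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: streams the lines, splits each one on whitespace,
    // drops empty tokens, and collects the rest into a sorted set.
    static Set<String> distinctWords(BufferedReader read) {
        return read.lines()
                .flatMap(WHITESPACE::splitAsStream)
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // Sample input with a double space, an empty line, and leading/trailing spaces.
        BufferedReader read = new BufferedReader(new StringReader("a  b\n\n b c "));
        System.out.println(distinctWords(read).size()); // 3 (a, b, c)
    }
}
```

The filter is still required, because an empty line or a line with leading spaces produces an empty token just as it does with `String.split`.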