Read big file in Java: too slow and GC overhead limit exceeded

Question

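I have a large file (about 3GB) that I read into an ArrayList. When I run the code below, after a few minutes it becomes very slow and CPU usage is high. A few minutes later, the Eclipse console shows the error java.lang.OutOfMemoryError: GC overhead limit exceeded.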

OS: Windows 2008 R2,
4 cores,
32GB memory
java version "1.7.0_60"

eclipse.ini

-startup
plugins/org.eclipse.equinox.launcher_1.3.0.v20130327-1440.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.200.v20140116-2212
-product
org.eclipse.epp.package.standard.product
--launcher.defaultAction
openFile
#--launcher.XXMaxPermSize
#256M
-showsplash
org.eclipse.platform
#--launcher.XXMaxPermSize
#256m
--launcher.defaultAction
openFile
--launcher.appendVmargs
-vmargs
-Dosgi.requiredJavaVersion=1.6
-Xms10G
-Xmx10G
-XX:+UseParallelGC
-XX:ParallelGCThreads=24
-XX:MaxGCPauseMillis=1000
-XX:+UseAdaptiveSizePolicy

Java code:

BufferedInputStream bis = new BufferedInputStream(new FileInputStream(new File("/words/wordlist.dat")));
InputStreamReader isr = new InputStreamReader(bis, "utf-8");
BufferedReader in = new BufferedReader(isr, 1024*1024*512);

String strTemp = null;
long ind = 0;

while ((strTemp = in.readLine()) != null)
{
    matcher.reset(strTemp);

    if (strTemp.contains("$"))
    {
        al.add(strTemp);
        strTemp = null;
    }
    ind = ind + 1;
    if (ind % 100000 == 0)
    {
        System.out.println(ind + "    100,000 +");
    }
}
in.close();

My sample input:

neural network
java
oracle
solaris
quick sort
apple
green fluorescent protein
acm
trs

Answer 1

writing a program in java to get statistics on how many times the keyword were found in the search word log list

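I suggest you do just that. Create a map that counts the number of occurrences of your keywords, or simply count the occurrences of all words.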
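Since the question mentions Java 1.7, here is a minimal sketch of the map-based counting that avoids Java 8 features. It reads one line at a time and only updates counters, so nothing like the 3GB ArrayList is kept on the heap. The class name KeywordCounter, the method countKeywords, and the keyword list in main are hypothetical names chosen for illustration, and "a line contains the keyword" is an assumed definition of a match.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the name and the keyword list are illustrative only.
final class KeywordCounter {

    // Counts, for each keyword, how many lines of the file contain it.
    static Map<String, Long> countKeywords(Path file, List<String> keywords) throws IOException {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        for (String keyword : keywords) {
            counts.put(keyword, 0L);
        }
        // Only the current line is held in memory; the default reader buffer is enough.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String keyword : keywords) {
                    if (line.contains(keyword)) {
                        counts.put(keyword, counts.get(keyword) + 1);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed keywords, taken from the sample input in the question.
        List<String> keywords = Arrays.asList("java", "oracle", "quick sort");
        Map<String, Long> counts = countKeywords(Paths.get(args[0]), keywords);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Memory use here grows with the number of keywords rather than with the size of the file, which is why a very large heap or a 512M-char reader buffer should not be needed for this kind of statistic.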

Using Java 8 streams, you can do this in one or two lines without loading the entire file into memory at once.

try (Stream<String> s = Files.lines(Paths.get("filename"))) {
    Map<String, Long> count = s.flatMap(line -> Stream.of(line.trim().split(" +")))
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
}
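For anyone who wants to run the snippet above as-is, here is a self-contained sketch with the imports it needs. The class name WordCountStreams and the method countWords are hypothetical. One small addition that is not in the snippet: empty tokens are filtered out, because a blank line would otherwise be counted under the empty string as a key.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper class around the stream pipeline shown above.
final class WordCountStreams {

    // Returns a map from each space-separated word to its number of occurrences.
    static Map<String, Long> countWords(Path file) throws IOException {
        // Files.lines reads lazily (UTF-8 by default), so the file is streamed, not loaded.
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .flatMap(line -> Stream.of(line.trim().split(" +")))
                    .filter(word -> !word.isEmpty()) // skip the "" token produced by blank lines
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        }
    }

    public static void main(String[] args) throws IOException {
        countWords(Paths.get(args[0]))
                .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Note that the resulting map still holds one entry per distinct word, so memory use depends on the vocabulary size rather than on the file size. Because the pipeline splits on spaces, a multi-word entry such as "neural network" is counted as two separate words.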
