
Reading a big file in Java: too slow, and GC overhead limit exceeded

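I have a large file (about 3 GB) that I read into an ArrayList. When I run the code below, it becomes very slow after a few minutes and CPU usage is high. A few minutes later, the Eclipse console shows the error java.lang.OutOfMemoryError: GC overhead limit exceeded.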

eclipse.ini

-startup
plugins/org.eclipse.equinox.launcher_1.3.0.v20130327-1440.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.200.v20140116-2212
-product
org.eclipse.epp.package.standard.product
--launcher.defaultAction
openFile
#--launcher.XXMaxPermSize
#256M
-showsplash
org.eclipse.platform
#--launcher.XXMaxPermSize
#256m
--launcher.defaultAction
openFile
--launcher.appendVmargs
-vmargs
-Dosgi.requiredJavaVersion=1.6
-Xms10G
-Xmx10G
-XX:+UseParallelGC
-XX:ParallelGCThreads=24
-XX:MaxGCPauseMillis=1000
-XX:+UseAdaptiveSizePolicy
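The -Xms10G/-Xmx10G lines under -vmargs size the heap of the Eclipse IDE itself. A program launched from Eclipse runs in a separate JVM, and that JVM takes its heap settings from the run configuration's VM arguments. To see which limit the program actually gets, you can print Runtime.getRuntime().maxMemory() from inside it. The class name HeapCheck is only illustrative.

```java
public class HeapCheck {
    // Maximum heap the JVM running this code will try to use, in MiB.
    // This roughly reflects -Xmx when it is set. If the JVM reports no limit,
    // maxMemory() returns Long.MAX_VALUE.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If this prints much less than about 10240 MiB when the program is run from Eclipse, the eclipse.ini settings are not reaching the program.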

Java code:

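import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadWordList {
    public static void main(String[] args) throws IOException {
        List<String> al = new ArrayList<>(); // every line containing "$" is kept in memory
        // a 1024*1024*512-char buffer occupies about 1 GB of heap on its own
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new BufferedInputStream(new FileInputStream(new File("/words/wordlist.dat"))),
                StandardCharsets.UTF_8), 1024 * 1024 * 512)) {
            String strTemp;
            long ind = 0;
            while ((strTemp = in.readLine()) != null) {
                // the original also called matcher.reset(strTemp) on a Matcher declared elsewhere (not shown)
                if (strTemp.contains("$")) {
                    al.add(strTemp);
                }
                ind++;
                if (ind % 100000 == 0) {
                    System.out.println(ind + "    100,000 +");
                }
            }
        }
    }
}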

My use case:

neural network
java
oracle
solaris
quick sort
apple
green fluorescent protein
acm
trs

I am writing a program in Java to get statistics on how many times each keyword is found in the search-word log list.

I suggest doing it this way: build a map that counts how many times each keyword occurs, or one that counts the occurrences of every word.
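Here is a minimal sketch of the keyword-count idea. It makes two assumptions about matching: the keywords are the ones in the use case list, and a keyword counts each time it appears in a log line as a whole word or phrase, ignoring case. The names KeywordCounter and countKeywords are hypothetical. The file is read one line at a time and only the counters are kept, so memory grows with the number of keywords, not with the 3 GB file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordCounter {
    // Counts whole-word, case-insensitive occurrences of each keyword, line by line.
    // Only the counts are stored; lines are discarded after they are scanned.
    static Map<String, Long> countKeywords(BufferedReader in, List<String> keywords) throws IOException {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String k : keywords) {
            // \b...\b (an assumed matching rule) keeps "java" from matching inside "javascript"
            patterns.put(k, Pattern.compile("\\b" + Pattern.quote(k) + "\\b", Pattern.CASE_INSENSITIVE));
            counts.put(k, 0L);
        }
        String line;
        while ((line = in.readLine()) != null) {
            for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                Matcher m = e.getValue().matcher(line);
                long n = 0;
                while (m.find()) {
                    n++;
                }
                counts.merge(e.getKey(), n, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<String> keywords = Arrays.asList("neural network", "java", "oracle", "solaris",
                "quick sort", "apple", "green fluorescent protein", "acm", "trs");
        // "/words/wordlist.dat" is the path used in the question
        try (BufferedReader in = Files.newBufferedReader(Paths.get("/words/wordlist.dat"), StandardCharsets.UTF_8)) {
            countKeywords(in, keywords).forEach((k, n) -> System.out.println(k + "\t" + n));
        }
    }
}
```

Splitting lines on spaces would break multi-word keywords such as "quick sort" and "green fluorescent protein". Matching a compiled pattern per keyword keeps each phrase intact.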

With Java 8 streams you can do this in one or two lines, without loading the whole file into memory at once.

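// needs java.nio.file.Files, java.nio.file.Paths, java.util.Map, java.util.stream.*; Files.lines throws IOException
try (Stream<String> s = Files.lines(Paths.get("filename"))) {
    // split each line on runs of spaces and count how often each word occurs
    Map<String, Long> count = s.flatMap(line -> Stream.of(line.trim().split(" +")))
            .filter(w -> !w.isEmpty()) // a blank line would otherwise be counted as the word ""
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    System.out.println(count);
}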
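The same stream pipeline can be wrapped in a method that is easy to test. The version below also prints the ten most frequent words as a quick summary statistic. It is a sketch: the class name WordCount is hypothetical, and it assumes the log path is passed as the first command-line argument. It drops the empty token that a blank line produces. The resulting map still holds every distinct word, so memory grows with the vocabulary, not with the file size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCount {
    // Splits each line on runs of spaces and counts every word; empty tokens are skipped.
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines.flatMap(line -> Stream.of(line.trim().split(" +")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        // Files.lines reads lazily as UTF-8 and must be closed, hence try-with-resources
        try (Stream<String> s = Files.lines(Paths.get(args[0]))) {
            Map<String, Long> count = countWords(s);
            // print the ten most frequent words, highest count first
            count.entrySet().stream()
                    .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                    .limit(10)
                    .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
        }
    }
}
```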