What is making my code use so much memory?

Question

I have solved a simple problem on CodeEval in several different ways; you can see it here (it is only a few lines).

I have made 3 working versions (one of them in Scala), and I don't understand the performance difference of my last Java version, which I expected to be the best memory-wise.

I also compared them with code I found on Github. Here are the performance statistics returned by CodeEval:

Version 1 is the version found on Github. Version 2 is my Scala solution:

import java.util.regex.Pattern
import scala.collection.mutable.TreeSet

object Main extends App {
    val p = Pattern.compile("\\d+")
    scala.io.Source.fromFile(args(0)).getLines
        .filter(!_.isEmpty)
        .map(line => {
            val dists = new TreeSet[Int]
            val m     = p.matcher(line)
            while (m.find) dists += m.group.toInt

            val list  = dists.toList
            list.zip(0 +: list).map { case (x,y) => x - y }.mkString(",")
        })
        .foreach(println)
}

Version 3 is my Java solution, which I expected to be the best:

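import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws IOException {
        Pattern        p    = Pattern.compile("\\d+");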
        File           file = new File(args[0]);
        BufferedReader br   = new BufferedReader(new FileReader(file));

        String line;
        while ((line = br.readLine()) != null) {
            Set<Integer> dists = new TreeSet<Integer>();
            Matcher      m     = p.matcher(line); 
            while (m.find()) dists.add(Integer.parseInt(m.group()));

            Iterator<Integer> it   = dists.iterator();
            int               prev = 0;
            StringBuilder     sb   = new StringBuilder();
            while (it.hasNext()) {
                int curr = it.next();
                sb.append(curr - prev);
                sb.append(it.hasNext() ? "," : "");
                prev = curr;
            }
            System.out.println(sb);
        }
        br.close();
    }
}

Version 4 is the same as version 3, except that I don't use a StringBuilder to print the output; instead I print it the way version 1 does.

Here is my interpretation of these results:

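Version 1 is too slow because of the large number of System.out.print calls. Also, using split on very large lines (which is the case in the tests that were run) uses a lot of memory.
Version 2 also seems slow, but that is mainly because of the "overhead" of running Scala code on CodeEval; even very efficient code runs slowly there.
Version 2 uses unnecessary memory to build a list from the set, which also takes some time but should not matter much. Writing more efficient Scala would probably mean writing it like Java, so I prefer elegance over performance.
I don't think version 3 should use that much memory. Using a StringBuilder should have the same memory impact as calling mkString in version 2.

Version 4 shows that the calls to System.out.println are what slow the program down.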
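The interpretation above blames the number of print calls. A minimal sketch of keeping that number small in Java, assuming the per-call cost of System.out (each PrintStream call takes a lock and encodes its argument) is what matters: wrap stdout once in a buffered PrintWriter and flush at the end. `BufferedOutput` and `writeRows` are hypothetical names, and the rows in `main` are placeholder data.

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.List;

// Hypothetical helper: all output goes through one buffered writer,
// so only a few large writes reach System.out.
final class BufferedOutput {

    // Prints each row as comma-separated values, one row per line,
    // then flushes once at the end.
    static void writeRows(List<int[]> rows, PrintWriter out) {
        for (int[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) out.print(',');
                out.print(row[i]);
            }
            out.println();
        }
        out.flush();
    }

    public static void main(String[] args) {
        // Not auto-flushing: output accumulates in the BufferedWriter.
        PrintWriter out = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(System.out)));
        writeRows(List.of(new int[] {1245, 58, 2587}, new int[] {7}), out);
    }
}
```

Without the final `flush()` (or `close()`), buffered output could be lost when the program exits.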

Does anyone see an explanation for these results?

Answer 1

Your Scala solution is slow not because of "overhead on CodeEval", but because you are building an immutable TreeSet, adding elements to it one by one. Replacing it with

val regex = """\d+""".r // in the beginning, instead of your Pattern.compile
...
.map { line => 
    val dists = regex.findAllIn(line).map(_.toInt).toIndexedSeq.sorted
...

should cut the execution time by about 30-40%.

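The same approach (build a list, then sort it) would probably help with memory usage in "version 3" too (Java collections are real memory hogs). It is also a good idea to give the list an initial size; otherwise it grows by 50% every time it runs out of capacity, which wastes both memory and time. 600 sounds like a good number, since that is the upper bound on the number of cities given in the problem description.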
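As a Java counterpart to the Scala change above, here is a minimal sketch of "build a list, then sort it" applied to version 3. It assumes, as versions 2 and 3 do, that every run of digits on a line is a distance; the class name `ListThenSort` and the helper `gaps` are hypothetical. One behavioral difference: unlike the TreeSet, the list keeps duplicate distances, so two cities at the same distance produce a gap of 0.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of version 3: a pre-sized ArrayList sorted once,
// instead of inserting every distance into a TreeSet.
final class ListThenSort {
    private static final Pattern NUMBER = Pattern.compile("\\d+");
    // Upper bound on cities per line, as quoted in the answer.
    private static final int MAX_CITIES = 600;

    static String gaps(String line) {
        // Initial capacity avoids the repeated ~50% growth steps.
        List<Integer> dists = new ArrayList<>(MAX_CITIES);
        Matcher m = NUMBER.matcher(line);
        while (m.find()) dists.add(Integer.parseInt(m.group()));
        Collections.sort(dists);

        StringBuilder sb = new StringBuilder();
        int prev = 0;
        for (int i = 0; i < dists.size(); i++) {
            int curr = dists.get(i);
            if (i > 0) sb.append(',');
            sb.append(curr - prev); // gap to the previous city (the first gap is from 0)
            prev = curr;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.isEmpty()) System.out.println(gaps(line));
            }
        }
    }
}
```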

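Now, since we know the upper bound, an even faster and more concise approach is to drop the list and the boxed Integers entirely and just use int dists[] = new int[600];.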
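A sketch of the primitive-array version, under the same assumptions (every run of digits on a line is a distance, and there are at most 600 of them, the bound quoted in the answer). `PrimitiveArray` and `gaps` are hypothetical names. `Arrays.sort(int[], int, int)` sorts only the filled part of the array, and no Integer objects are created for the distances.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant: a plain int[] instead of a collection of boxed Integers.
final class PrimitiveArray {
    private static final Pattern NUMBER = Pattern.compile("\\d+");
    // Bound from the answer; a line with more numbers than this
    // would throw ArrayIndexOutOfBoundsException.
    private static final int MAX_CITIES = 600;

    static String gaps(String line) {
        int[] dists = new int[MAX_CITIES];
        int n = 0; // number of slots actually filled
        Matcher m = NUMBER.matcher(line);
        while (m.find()) dists[n++] = Integer.parseInt(m.group());
        Arrays.sort(dists, 0, n); // sort only the filled prefix

        StringBuilder sb = new StringBuilder();
        int prev = 0;
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(',');
            sb.append(dists[i] - prev);
            prev = dists[i];
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.isEmpty()) System.out.println(gaps(line));
            }
        }
    }
}
```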

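If you want to get really fancy, you can also use the "route length" range mentioned in the description. For example, instead of putting the integers into an array and sorting it (or keeping a tree set), create an array of 20,000 bits (or even 20K bytes, for speed) and, as you read the input, set the entries for the values you see... That will be faster and more memory-efficient than any of your solutions.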
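A sketch of the bit-array idea using java.util.BitSet, assuming distances are non-negative and bounded by roughly 20,000 (the figure the answer uses; the real bound should be checked against the problem statement). Walking the set bits in ascending order with `nextSetBit` replaces the sort. Like the TreeSet in versions 2 and 3, and unlike a sorted array, this collapses duplicate distances. `BitSetGaps` and `gaps` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.BitSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant: mark each distance seen in a bit array, then read
// the marked positions back in ascending order; no sorting step.
final class BitSetGaps {
    private static final Pattern NUMBER = Pattern.compile("\\d+");
    // Assumed upper bound on a distance (from the answer's "20,000 bits").
    // BitSet grows automatically if a larger value does appear.
    private static final int MAX_DIST = 20_000;

    static String gaps(String line) {
        BitSet seen = new BitSet(MAX_DIST + 1);
        Matcher m = NUMBER.matcher(line);
        while (m.find()) seen.set(Integer.parseInt(m.group()));

        StringBuilder sb = new StringBuilder();
        int prev = 0;
        // nextSetBit returns -1 once no set bits remain.
        for (int d = seen.nextSetBit(0); d >= 0; d = seen.nextSetBit(d + 1)) {
            if (sb.length() > 0) sb.append(',');
            sb.append(d - prev);
            prev = d;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.isEmpty()) System.out.println(gaps(line));
            }
        }
    }
}
```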

Answer 2

I tried solving this problem and found that you don't need the city names, only the distances in a sorted array.

It has a better running time of 738 ms and a memory usage of 4513792.

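Although this may not help improve your code, it seems to be a better way to solve the problem. Any suggestions for further improving the code are welcome.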

import java.io.*;
import java.util.*;

public class Main {
    public static void main (String[] args) throws IOException {
        File file = new File(args[0]);
        try (BufferedReader buffer = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = buffer.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue; // a blank line would make parseInt("") throw

                String out = new Main().getDistances(line);

                System.out.println(out);
            }
        }
    }

    public String getDistances(String s){

        //split the string
        String[] arr = s.split(";");

        //create an array to hold the distances as integers
        int[] distances = new int[arr.length];

        for(int i=0; i<arr.length; i++){
            //find the index of , - get the characters after that - convert to integer - add to distances array
            distances[i] = Integer.parseInt(arr[i].substring(arr[i].lastIndexOf(",")+1));
        }

        //sort the array
        Arrays.sort(distances);

        String output = "";
        output += distances[0]; //append the distance to the closest city to the string

        for(int i=0; i<arr.length-1; i++){

            //get distance between current element(city) and next
            int distance_between = distances[i+1] - distances[i];

            //append the distance to the string
            output += "," + distance_between;
        }

        return output;
    }
}
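Since the answer invites improvements: each `output += ...` in the loop builds a new String that copies everything produced so far, so the cost grows roughly quadratically with the line length. A minimal sketch of the same `getDistances` steps using a StringBuilder instead; it assumes the same `Name,distance;` input format the answer parses, the class name `SplitAndSort` is hypothetical, and the line in `main` is only an illustrative input.

```java
import java.util.Arrays;

// Hypothetical rewrite of the answer's getDistances: same parsing and sorting,
// but the output is built with one StringBuilder.
final class SplitAndSort {

    static String getDistances(String s) {
        String[] arr = s.split(";");
        int[] distances = new int[arr.length];
        for (int i = 0; i < arr.length; i++) {
            // the distance is whatever follows the last ',' in each entry
            distances[i] = Integer.parseInt(arr[i].substring(arr[i].lastIndexOf(',') + 1));
        }
        if (distances.length == 0) return ""; // e.g. a line made only of ';'
        Arrays.sort(distances);

        StringBuilder sb = new StringBuilder();
        sb.append(distances[0]); // distance to the closest city
        for (int i = 1; i < distances.length; i++) {
            sb.append(',').append(distances[i] - distances[i - 1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 1245,58,2587,1563,136
        System.out.println(getDistances("Rkbs,5453; Wdqiz,1245; Rwds,3890; Ujma,5589; Tbzmo,1303;"));
    }
}
```

As in the answer's code, blank lines must be filtered out by the caller, because `Integer.parseInt("")` throws.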

Answer 3

I ran some tests.

Every language has a baseline. I code in Java and JavaScript. For JavaScript, here are my test results:

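Revision 1: the default empty JS boilerplate, printing a message to stdout
Revision 2: the same, plus reading the file
Revision 3: just sending a message to stdout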

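As you can see, no matter what, there is at least about 200 ms of running time and about 5 MB of memory usage. This baseline also depends on the server load! There was a time when CodeEval was so heavily overloaded that nothing could run within the maximum time (10 seconds).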

Now look at this, for a completely different challenge from the one above:

Rev4: my solution
Rev5: the same code submitted again. It earned 8000 more ranking points. :D

Conclusion: I wouldn't worry too much about the CPU and memory figures or the ranking. They are clearly unreliable.
