为什么插入排序 运行 在时间上因不同的实现而有如此大的不同

Why does insertion-sort run so much differently time-wise with different implementation

我正在对基于数组的不同长度实现的插入排序算法进行计时。

我知道插入排序的平均情况为 O(n^2),所以我意识到当您尝试对大型数组进行排序时会花费一些时间,但为什么底部的两个实现之间会有近 6500 毫秒的差异对于包含 100,000 个条目的数组?

这是我的数组的设置,其中填充了 1-1 百万的随机整数

int[] oneHundredThousand = new int[100000];
Random r = new Random();//RANDOMIZER

for(int i = 0; i < oneHundredThousand.length; i++)
        oneHundredThousand[i] = r.nextInt(1000000) + 1; //1 - 1000000

这是我运行测试的2种方法,使用了插入排序

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {         
        int cur = intArray[k];                
        int j = k;                          
        while (j > 0 && intArray[j-1] > cur) {  
            intArray[j] = intArray[j-1];              
            j--;                              
        }
        intArray[j] = cur;                      
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

public static long insertionSort2(int[] input){
    long startTime = System.currentTimeMillis();
    int temp;
    for (int i = 1; i < input.length; i++) {
        for(int j = i ; j > 0 ; j--){
            if(input[j] < input[j-1]){
                temp = input[j];
                input[j] = input[j-1];
                input[j-1] = temp;
            }
        }
    }
    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

现在像这样在 main 中调用这些方法(复制数组以便通过每个 'sort' 保留原始顺序),我得到了注释结果,为什么会有如此大的不同?我不觉得同一个算法的不同实现应该效率低得多。

int[] copy100_1 = Arrays.copyOf(oneHundredThousand, oneHundredThousand.length);
int[] copy100_2 = Arrays.copyOf(oneHundredThousand, oneHundredThousand.length);

//816ms 
System.out.print(insertionSort1(copy100_1));
//7400ms
System.out.print(insertionSort2(copy100_2));

分析插入排序,发现最好的情况。执行时间为 O(n²)。让我们以您的第一个实施为例来了解原因。

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {         
        int cur = intArray[k];                
        int j = k;

        while (j > 0 && intArray[j-1] > cur) { // This loop can break early due 
                                               // to intArray[j-1] > cur
            intArray[j] = intArray[j-1];              
            j--;                              
        }
        intArray[j] = cur;                      
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

这意味着对于(部分)排序的数组,(这部分)的执行时间减少到 O(n)

您的第二个实现缺少这样的提前中断条件。但是你可以很简单地添加它:

public static void insertionSort2(int[] input) {
    int temp;
    for (int i = 1; i < input.length; i++) {
        for (int j = i; j > 0; j--) {
            if (input[j] < input[j - 1]) {
                temp = input[j];
                input[j] = input[j - 1];
                input[j - 1] = temp;
            } else {
                break; // this is the "early break" missing.
            }
        }
    }
}

我以为模板,重新实现了所有三个版本。

package benchmark;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Test {

    @State(Scope.Benchmark)
    public static class Input {

        public static final Random rng = new Random();
        final int[] array;
        final int[] expected;

        public Input() {
            final Random r = new Random();
            this.array = new int[200_000];
            for (int i = 0; i < this.array.length; i++) {
                this.array[i] = i;
            }
            this.expected = Arrays.copyOf(this.array, this.array.length);

            // Fisher-Yates shuffle
            for (int i = this.array.length - 1; i > 0; --i) {
                int swap = Input.rng.nextInt(i);
                int tmp = this.array[swap];
                this.array[swap] = this.array[i];
                this.array[i] = tmp;
            }
        }
    }

    @Benchmark
    public void benchSort1(final Input in) {
        insertionSort1(in.array);
    }

    @Benchmark
    public void benchSort2(final Input in) {
        insertionSort2(in.array);
    }

    @Benchmark
    public void benchSort3(final Input in) {
        insertionSort3(in.array);
    }

    public static void insertionSort1(int[] intArray) {
        int n = intArray.length;
        for (int k = 1; k < n; k++) {
            int cur = intArray[k];
            int j = k;
            while (j > 0 && intArray[j - 1] > cur) {
                intArray[j] = intArray[j - 1];
                j--;
            }
            intArray[j] = cur;
        }
    }

    public static void insertionSort2(int[] input) {
        int temp;
        for (int i = 1; i < input.length; i++) {
            for (int j = i; j > 0; j--) {
                if (input[j] < input[j - 1]) {
                    temp = input[j];
                    input[j] = input[j - 1];
                    input[j - 1] = temp;
                }
            }
        }
    }

    public static void insertionSort3(int[] input) {
        int temp;
        for (int i = 1; i < input.length; i++) {
            for (int j = i; j > 0; j--) {
                if (input[j] < input[j - 1]) {
                    temp = input[j];
                    input[j] = input[j - 1];
                    input[j - 1] = temp;
                } else {
                    break;
                }
            }
        }
    }

    public static void main(String[] arg) throws Exception {
        Options option = new OptionsBuilder().include(Test.class.getSimpleName()).build();
        new Runner(option).run();
    }
}

这些是最终结果,benchSort1benchSort2 是您的原始版本,benchSort3 "corrected" 双 for 版本:

Benchmark        Mode  Cnt      Score   Error  Units
Test.benchSort1  avgt    2      0.307          ms/op
Test.benchSort2  avgt    2  15145.611          ms/op
Test.benchSort3  avgt    2      0.210          ms/op

如您所见,两个版本现在在时间上非常接近。

为了完成上述答案,我想告诉您,为了正确比较两种算法的性能,您应该考虑使用真正的基准测试框架,例如 JMH

那么,这是为什么呢?使用简单的 main,你的程序太依赖于你的计算机状态:如果在你的第二次排序操作中,你的机器开始交换,或者另一个程序消耗大量处理能力,它会降低你的 java 应用程序性能,所以你对第二种算法的测量。另外,您无法控制 JIT 优化。

JMH 尝试通过多次重复您的测试来提供更安全的措施,启动阶段在测量之前利用 JIT 等。

这里是 class 对算法进行基准测试的示例:

// Those annotations control benchmark configuration. More info on JMH doc
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class SortingBenchmark {

/**
 * We create objects in charge of preparing data required by our benchmarks.
 * It is created before benchmark starts, so array initialization phase does
 * not pollute measures.
 */
@State(Scope.Benchmark)
public static class Input {

    final int[] array;

    public Input() {
        final Random r = new Random();
        array = new int[100000];
        for (int i = 0; i < array.length; i++) {
            array[i] = r.nextInt(1000000) + 1;
        }
    }
}

/**
 * Test first sorting method
 * @param in
 */
@Benchmark
public void benchSort1(final Input in) {
    insertionSort1(in.array);
}

/**
 * Test second sorting method
 * @param in
 */
@Benchmark
public void benchSort2(final Input in) {
    insertionSort2(in.array);
}

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {
        int cur = intArray[k];
        int j = k;
        while (j > 0 && intArray[j - 1] > cur) {
            intArray[j] = intArray[j - 1];
            j--;
        }
        intArray[j] = cur;
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

public static long insertionSort2(int[] input) {
    long startTime = System.currentTimeMillis();
    int temp;
    for (int i = 1; i < input.length; i++) {
        for (int j = i; j > 0; j--) {
            if (input[j] < input[j - 1]) {
                temp = input[j];
                input[j] = input[j - 1];
                input[j - 1] = temp;
            }
        }
    }
    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

/**
 * That's JMH boilerplate to launch the benchmark.
 * @param arg
 * @throws Exception
 */
public static void main(String[] arg) throws Exception {
    Options option = new OptionsBuilder().include(SortingBenchmark.class.getSimpleName()).build();
    new Runner(option).run();
}

这是结果(JMH 的最终结论,以避免污染线程):

# Run complete. Total time: 00:01:51

Benchmark                    Mode  Cnt     Score    Error  Units
SortingBenchmark.benchSort1  avgt    5     0,190 ±  0,016  ms/op
SortingBenchmark.benchSort2  avgt    5  1957,455 ± 73,014  ms/op

希望对你以后的测试有所帮助(注:这里有一个帮助我学习JMH的小教程:JMH tuto)。