Why is the Euclidean distance function slower in C than in Java?

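I implemented and benchmarked the following function in C and Java using the code below. The C version takes about 1.688852 seconds, while the Java version takes only 0.355038 seconds. Even if I remove the sqrt call, inline the code manually, or change the function signature to accept 6 double coordinates (to avoid access through pointers), the C timing does not improve much.

I compile the C program with cc -O2 main.c -lm. For Java, I run the application from IntelliJ IDEA with the default JVM options (Java 8, OpenJDK).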

c:

#include <math.h>
#include <stdio.h>
#include <time.h>

typedef struct point3d
{
  double x;
  double y;
  double z;
} point3d_t;

double distance(point3d_t *from, point3d_t *to);

int main(int argc, char const *argv[])
{
  point3d_t from = {.x = 2.3, .y = 3.45, .z = 4.56};
  point3d_t to = {.x = 5.678, .y = 3.45, .z = -9.0781};

  double time = 0.0;
  int count = 10000000;

  for (size_t i = 0; i < count; i++)
  {
    clock_t tic = clock();
    double d = distance(&from, &to);
    clock_t toc = clock();
    time += ((double) (toc - tic) / CLOCKS_PER_SEC);
  }

  printf("Elapsed: %f seconds\n", time);

  return 0;
}

double distance(point3d_t *from, point3d_t *to)
{
  double dx = to->x - from->x;
  double dy = to->y - from->y;
  double dz = to->z - from->z;

  double d2 = (dx * dx) + (dy * dy) + (dz + dz);
  return sqrt(d2);
}

java:

public class App 
{
    static Random rnd = new Random();

    public static void main( String[] args )
    {
        var sw = new StopWatch();
        var time = 0.0;
        var count = 10000000;

        for (int i = 0; i < count; i++) {
            var from = Vector3D.of(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());
            var to = Vector3D.of(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());

            sw.start();
            var dist = distance(from, to);
            sw.stop();
            time += sw.getTime(TimeUnit.NANOSECONDS);
            sw.reset();
        }

        System.out.printf("Time: %f seconds\n", time / 1e09);
    }

    public static double distance(Vector3D from, Vector3D to) {
        var dx = to.getX() - from.getX();
        var dy = to.getY() - from.getY();
        var dz = to.getZ() - from.getZ();

        return Math.sqrt((dx * dx) + (dy * dy) + (dz * dz));
    }
}

My objective is to understand why the C program is slower than the Java program, and to make it run faster.

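Edit: I use random values in the Java program to try to ensure that the JVM does not do anything clever, such as caching the result and skipping the computation entirely.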

Edit: I updated both snippets to use clock_gettime() in C, to time the whole loop rather than each individual call, and to not discard the result of the call:

#include <math.h>
#include <stdio.h>
#include <time.h>

typedef struct point3d
{
  double x;
  double y;
  double z;
} point3d_t;

double distance(point3d_t *from, point3d_t *to);

int main(int argc, char const *argv[])
{
  point3d_t from = {.x = 2.3, .y = 3.45, .z = 4.56};
  point3d_t to = {.x = 5.678, .y = 3.45, .z = -9.0781};

  struct timespec fs;
  struct timespec ts;

  long time = 0;
  int count = 10000000;
  double dist = 0;

  clock_gettime(CLOCK_REALTIME, &fs);

  for (size_t i = 0; i < count; i++)
  {
    dist = distance(&from, &to);
  }

  clock_gettime(CLOCK_REALTIME, &ts);
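  /* Include tv_sec so the difference stays valid across a second boundary. */
  time = (ts.tv_sec - fs.tv_sec) * 1000000000L + (ts.tv_nsec - fs.tv_nsec);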

  if (dist == 0.001)
  {
    printf("hello\n");
  }

  printf("Elapsed: %f sec\n", (double) time / 1e9);

  return 0;
}

double distance(point3d_t *from, point3d_t *to)
{
  double dx = to->x - from->x;
  double dy = to->y - from->y;
  double dz = to->z - from->z;

  double d2 = (dx * dx) + (dy * dy) + (dz + dz);
  return sqrt(d2);
}

java:

public class App 
{
    static Random rnd = new Random();

    public static void main( String[] args )
    {
        var from = Vector3D.of(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());
        var to = Vector3D.of(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());

        var time = 0.0;
        var count = 10000000;
        double dist = 0.0;

        var start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            dist = distance(from, to);
        }

        var end = System.nanoTime();
        time = end - start;

        if (dist == rnd.nextDouble()) {
            System.out.printf("hello! %f\n", dist);
        }

        dist = dist + 1;
        System.out.printf("Time: %f sec\n", (double) time / 1e9);
        System.out.printf("Yohoo! %f\n", dist);
    }

    public static double distance(Vector3D from, Vector3D to) {
        var dx= to.getX() - from.getX();
        var dy = to.getY() - from.getY();
        var dz = to.getZ() - from.getZ();

        return Math.sqrt((dx * dx) + (dy * dy) + (dz * dz));
    }
}

I compiled the C code with gcc -Wall -std=gnu99 -O2 main.c -lm. The results are now 0.06323 seconds for the C code and 0.006325 seconds for Java.

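Edit: As Jérôme Richard and Peter Cordes pointed out, my benchmark was flawed, not to mention that I was taking the square root of a negative number in the C version. Once I compiled the C program with -fno-math-errno, it clocked in at 0 seconds. I compiled the C program with gcc -O2 -std=gnu99 main.c -lm. The C program now effectively takes zero seconds (271 ns), while Java takes 0.007217 seconds. Everything is in order :)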

The final code is as follows:

c:

#include <math.h>
#include <stdio.h>
#include <time.h>

typedef struct point3d
{
  double x;
  double y;
  double z;
} point3d_t;

double distance(point3d_t *from, point3d_t *to);

int main(int argc, char const *argv[])
{
  point3d_t from = {.x = 2.3, .y = 3.45, .z = 4.56};
  point3d_t to = {.x = 5.678, .y = 3.45, .z = -9.0781};

  struct timespec fs;
  struct timespec ts;

  long time = 0;
  int count = 10000000;
  double dist = 0;

  clock_gettime(CLOCK_REALTIME, &fs);

  for (size_t i = 0; i < count; i++)
  {
    dist = distance(&from, &to);
  }

  clock_gettime(CLOCK_REALTIME, &ts);
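  /* Include tv_sec so the difference stays valid across a second boundary. */
  time = (ts.tv_sec - fs.tv_sec) * 1000000000L + (ts.tv_nsec - fs.tv_nsec);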

  printf("hello %f \n", dist);
  printf("Elapsed: %f ns\n", (double) time);
  printf("Elapsed: %f sec\n", (double) time / 1e9);

  return 0;
}

double distance(point3d_t *from, point3d_t *to)
{
  double dx = (to->x) - (from->x);
  double dy = (to->y) - (from->y);
  double dz = (to->z) - (from->z);

  double d2 = (dx * dx) + (dy * dy) + (dz * dz);
  return sqrt(d2);
}

java:

public class App 
{
    static Random rnd = new Random();

    public static void main( String[] args )
    {
        var from = Vector3D.of(2.3, 3.45, 4.56);
        var to = Vector3D.of(5.678, 3.45, -9.0781);

        var time = 0.0;
        var count = 10000000;
        double dist = 0.0;

        var start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            dist = distance(from, to);
        }

        var end = System.nanoTime();
        time = end - start;

        System.out.printf("Yohoo! %f\n", dist);
        System.out.printf("Time: %f sec\n", (double) time / 1e9);
    }

    public static double distance(Vector3D from, Vector3D to) {
        var dx = to.getX() - from.getX();
        var dy = to.getY() - from.getY();
        var dz = to.getZ() - from.getZ();

        var d2 =  (dx * dx) + (dy * dy) + (dz * dz);
        return Math.sqrt(d2);
    }
}

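First, the method used to measure time is very imprecise. The current approach introduces a huge bias that may be larger than the quantity being measured. Indeed, clock is not very precise on many platforms (about 1 ms on my machine, and generally no better than 1 µs on almost all platforms). Moreover, the imprecision is greatly amplified by the 10,000,000 iterations. If you want to measure the loop precisely, you need to move the clock calls outside the loop (and, if possible, use a more accurate measurement function).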

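Still, the main problem is that the result of the function is never used, which the Java JIT can see and partially optimize away. GCC cannot do the same because of the standard behavior of the math functions (errno gives the call a side effect that does not exist in the Java code). You can disable the use of errno with the command-line flag -fno-math-errno. With that flag, GCC can fully optimize the function (i.e. remove the function call), and the resulting time is much smaller. However, the benchmark is then flawed, and that is probably not what you want to measure. If you want to write a correct benchmark, you need to read the computed value. For example, you can compute a checksum, at least to check that the results are correct/equivalent.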