如何优化我的代码,计算小于 200 万的所有总和

How to Optimise my code that computes the sum of all from less than 2 million

我从 Project Euler 开始尝试这个问题,我需要计算所有素数的总和,直到两百万。

这是我想出的解决方案 -

#include <stdio.h>

int main() {
    long sum = 5;  // Already counting 2 and 3 in my sum.
    int i = 5;     // Checking from 5 
    int count =  0;
    while (i <= 2000000) {
        count = 0;
        for (int j = 3; j <= i / 2; j += 2) {
            // Checking if i (starting from 5) is divisible from 3 
            if (i % j == 0) {   // to i/2 and only checking for odd values of j
                count = 1;
            }
        }
        if (count == 0) {
            sum += i;
        }
        i += 2;
    }
    printf("%ld ", sum);
}

运行 大约需要 480 秒,我想知道是否有更好的解决方案或提示来改进我的程序。

________________________________________________________
Executed in  480.95 secs    fish           external
   usr time  478.54 secs    0.23 millis  478.54 secs
   sys time    1.28 secs    6.78 millis    1.28 secs

通过两个小的修改,您的代码变得更快:

#include <stdio.h>
#include <math.h>

int main() {
  long long sum = 5;    // we need long long, long might not be enough
                        // depending on your platform
  int i = 5;
  int count = 0;
  while (i <= 2000000) {
    count = 0;

    int limit = sqrt(i);                  // determine upper limit once and for all
    for (int j = 3; j <= limit; j += 2) { // use upper limit sqrt(i) instead if i/2
        if (i % j == 0) {
          count = 1;
          break;                          // break out from loop as soon 
                                          // as number is not prime
        }
    }
    if (count == 0) {
      sum += i;
    }
    i += 2;
  }
  printf("%lld ", sum);                   // we need %lld for long long
}

所有的解释都在评论里

但肯定有更好甚至更快的方法来做到这一点。

我在我 10 岁的 MacPro 上 运行 这个 2000 万 第一次素数花了大约 30 秒。

这个程序近乎即时(即使在调试...)200万的总和,2000万只需要一秒钟(Windows10, 10 years-old i7 @ 3.4 GHz,MSVC 2019)。

注意:没有时间设置我的 C 编译器,这就是 malloc.

上有强制转换的原因

“大”优化是存储平方值AND素数,所以绝对没有测试不可能的除数。由于在给定的整数区间内不超过 1/10 的素数(启发式,健壮的代码应该测试它并在需要时重新分配 primes 数组),时间大大缩短。

#include <stdio.h>
#include <malloc.h>

#define LIMIT 2000000ul     // Computation limit.

typedef struct {
    unsigned long int   p ;     // Store a prime number.
    unsigned long int   sq ;    // and its square.
} prime ;

int main() {
    prime* primes = (prime*)malloc((LIMIT/10)*sizeof(*primes)) ;        // Store found primes. Can quite safely use 1/10th of the whole computation limit.
    unsigned long int primes_count=1 ;
    unsigned long int i = 3 ;
    unsigned long long int sum = 0 ;
    unsigned long int j = 0 ;
    int is_prime = 1 ;

    // Feed the first prime, 2.
    primes[0].p = 2 ;
    primes[0].sq = 4 ;
    sum = 2 ;
    // Parse all numbers up to LIMIT, ignoring even numbers.
    // Also reset the "is_prime" flag at each loop.
    for (i = 3 ; i <= LIMIT ; i+=2, is_prime = 1 ) {
        // Parse all previously found primes.
        for (j = 0; j < primes_count; j++) {
            // Above sqrt(i)? Break, i is a prime.
            if (i<primes[j].sq)
                break ;
            // Found a divisor? Not a prime (and break).
            if ((i % primes[j].p == 0)) {
                is_prime = 0 ;
                break ;
            }
        }
        // Add the prime and its square to the array "primes".
        if (is_prime) {
            primes[primes_count].p = i ;
            primes[primes_count++].sq = i*i ;
            // Compute the sum on-the-fly
            sum += i ;
        }
    }
    printf("Sum of all %lu primes: %llu\n", primes_count, sum);
    free(primes) ;
}

您的程序可以通过提前停止内部循环来轻松改进:

  • i 超过 sqrt(j) 时。
  • 找到除数时。

另请注意,类型 long 可能不足以满足所有架构的总和。 long long 推荐。

这是修改后的版本:

#include <stdio.h>

int main() {
    long long sum = 5;  // Already counting 2 and 3 in my sum.
    long i = 5;     // Checking from 5 
    while (i <= 2000000) {
        int count = 0;
        for (int j = 3; j * j <= i; j += 2) {
            // Checking if i (starting from 5) is divisible from 3 
            if (i % j == 0) {   // to i/2 and only checking for odd values of j
                count = 1;
                break;
            }
        }
        if (count == 0) {
            sum += i;
        }
        i += 2;
    }
    printf("%lld\n", sum);
}

这个简单的改变大大减少了运行时间!它比 1000 倍 快 2000000:

chqrlie> time ./primesum
142913828922

real    0m0.288s
user    0m0.264s
sys     0m0.004s

但是请注意,试验除法的效率远低于埃拉托色尼的经典筛法。

这是一个简单的版本:

#include <stdio.h>
#include <stdlib.h>

int main() {
    long max = 2000000;
    long long sum = 0;
    // Allocate an array of indicators initialized to 0
    unsigned char *composite = calloc(1, max + 1);
    // For all numbers up to sqrt(max)
    for (long i = 2; i * i <= max; i++) {
        // It the number is a prime
        if (composite[i] == 0) {
            // Set all multiples as composite. Multiples below the
            // square of i are skipped because they have already been
            // set as multiples of a smaller prime.
            for (long j = i * i; j <= max; j += i) {
                composite[j] = 1;
            }
        }
    }
    for (long i = 2; i <= max; i++) {
        if (composite[i] == 0)
            sum += i;
    }
    printf("%lld\n", sum);
    free(composite);
    return 0;
}

此代码是 2000000 的另一个 20 倍

chqrlie> time ./primesum-sieve
142913828922

real    0m0.014s
user    0m0.007s
sys     0m0.002s

对于更大的边界,可以通过多种方式进一步改进筛法。