The precision of a large floating point sum
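I am trying to sum a sorted array of positive, decreasing floating-point numbers. I have read that the best way to sum them is to add from the lowest to the highest. I wrote this code as an example, but the sum that starts from the highest number comes out more precise. Why? (Of course, the sum of 1/k^2 should be f = 1.644934066848226.)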
#include <stdio.h>
#include <math.h>
int main() {
double sum = 0;
int n;
int e = 0;
double r = 0;
double f = 1.644934066848226;
double x, y, c, b;
double sum2 = 0;
printf("introduce n\n");
scanf("%d", &n);
double terms[n];
y = 1;
while (e < n) {
x = 1 / ((y) * (y));
terms[e] = x;
sum = sum + x;
y++;
e++;
}
y = y - 1;
e = e - 1;
while (e != -1) {
b = 1 / ((y) * (y));
sum2 = sum2 + b;
e--;
y--;
}
printf("sum from biggest to smallest is %.16f\n", sum);
printf("and its error %.16f\n", f - sum);
printf("sum from smallest to biggest is %.16f\n", sum2);
printf("and its error %.16f\n", f - sum2);
return 0;
}
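Your code creates an array, double terms[n];, on the stack, and that puts a hard limit on the number of iterations that can be performed before your program crashes.
But you never read anything back from this array, so there is no reason to have it at all. I altered your code to get rid of terms[]: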
#include <stdio.h>
int main() {
double pi2over6 = 1.644934066848226;
double sum = 0.0, sum2 = 0.0;
double y;
int i, n;
printf("Enter number of iterations:\n");
scanf("%d", &n);
y = 1.0;
for (i = 0; i < n; i++) {
sum += 1.0 / (y * y);
y += 1.0;
}
for (i = 0; i < n; i++) {
y -= 1.0;
sum2 += 1.0 / (y * y);
}
printf("sum from biggest to smallest is %.16f\n", sum);
printf("and its error %.16f\n", pi2over6 - sum);
printf("sum from smallest to biggest is %.16f\n", sum2);
printf("and its error %.16f\n", pi2over6 - sum2);
return 0;
}
When this is run with, say, a billion iterations, the smallest-first approach is considerably more accurate:
Enter number of iterations:
1000000000
sum from biggest to smallest is 1.6449340578345750
and its error 0.0000000090136509
sum from smallest to biggest is 1.6449340658482263
and its error 0.0000000009999996
When you add two floating-point numbers of different orders of magnitude, the low-order bits of the smaller number are lost.
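When you sum from smallest to largest, the partial sum just before adding term n is Σ1/k² for k running from N down to n+1, which is approximately 1/n - 1/N (the blue curve). This is to be compared with the term being added, 1/n².
When you sum from largest to smallest, the partial sum is Σ1/k² for k running from 1 up to n-1, which is approximately π²/6 - 1/n (the green curve), again to be compared with 1/n².
In the first case the partial sum is at most about n times larger than the term being added; in the second it is on the order of n² times larger. Clearly, the second case loses many more bits.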