LAPACK 程序(Fortran 90)的数值精度不够?
Insufficient numerical precision of LAPACK procedures (Fortran 90)?
我写了一个简单的 Fortran 代码,它对 R^2
的 n+1
点进行多项式插值。它使用两个 LAPACK
程序求解线性方程组(我正在创建 Vandermonde 矩阵)(我的代码中的所有内容都是双精度的):
首先,它分解矩阵:http://sites.science.oregonstate.edu/~landaur/nacphy/lapack/routines/dgetrf.html
后来解决了系统:http://sites.science.oregonstate.edu/~landaur/nacphy/lapack/routines/dgetrs.html
该程序适用于从多项式生成的数据的几个测试案例:0,1,2,3,4。但是,当我从多项式 P(x) = x^10
中提供 11 个点时,它会推断出完全错误的系数。
输入(x..., y...
):
1.0
1.001
1.002
1.003
1.004
1.005
1.006
1.007
1.008
1.009
1.01
1.0
1.01004512021
1.02018096336
1.03040825707
1.04072773401
1.05114013204
1.06164619412
1.07224666847
1.08294230847
1.0937338728
1.10462212541
输出(a_n,...,a_0
):
-4.6992230177E+004
2.2042918738E+005
-3.2949938635E+005
5.0740528522E+004
2.4764893257E+005
-3.1846974845E+004
-1.7195378863E+005
-1.4751075818E+005
4.1766709741E+005
-2.6476448046E+005
5.6082872757E+004
我遇到了数值稳定性问题吗?还是我做错了什么?
编辑:我附上插值程序的代码(注意,我们实际上有 n
个点而不是 n+1
)。
module InterpolatorModule
contains
subroutine interpolate(n, x, y, a)
implicit none
integer :: n, i, j, info
integer, dimension (:), allocatable :: ipiv
real(8), dimension (:), pointer :: x, y, a
real(8), dimension(:,:), allocatable :: Mat_X, Mat_B
! create the Vandermonde matrix:
allocate ( Mat_X(n,n) )
do i = 1,n
do j = 1,n
Mat_X(i,j) = x(i) ** ( n - j )
end do
end do
! reshape ipnut data into a matrix form:
allocate ( Mat_B(n,1) )
do i = 1,n
Mat_B(i,1) = y(i)
end do
! prepare an array for the pivot indices from DGETRF:
allocate(ipiv(n))
call DGETRF(n, n, Mat_X, n, ipiv, info)
call DGETRS("N", n, 1, Mat_X, n, ipiv, Mat_B, n, info)
! save the results into the argument array
do i = 1,n
a(i) = Mat_B(i,1)
end do
deallocate(ipiv)
deallocate ( Mat_X )
deallocate ( Mat_B )
end subroutine interpolate
end module InterpolatorModule
编辑:main
程序:
program MyInterpolation
use InterpolatorModule
implicit none
integer :: line, n
real(8), dimension (:), pointer :: p_x, p_y, p_a
real(8), dimension(:), target, allocatable :: in1_x, in1_y, in1_a
character(len=23) :: str
open (1, file="test/in.txt", status="old", action="read", form="formatted")
n = 11 ! this value is not known at compulation time, simplification for SO
! allocate memory for the input data
allocate ( in1_x(n) )
allocate ( in1_y(n) )
allocate ( in1_a(n) )
! read the x_i coordinates:
do line = 1,n
read(1,*) in1_x(line)
end do
! read the y_i coordinates:
do line = 1,n
read(1,*) in1_y(line)
end do
close(1)
! assign pointers to the arrays:
p_x => in1_x
p_y => in1_y
p_a => in1_a
! call the interpolating procedure:
call interpolate(n, p_x, p_y, p_a)
! print out calculated coefficients:
do line = 1,n
write(str,'(ES23.10 E3)') in1_a(line)
write(*,'(a)') adjustl(trim(str))
end do
! free the allocated memory
deallocate (in1_x)
deallocate (in1_y)
deallocate (in1_a)
! ===========================================================================
end program MyInterpolation
Vandermonde 矩阵经常是病态的,这一点已得到广泛认可(并已在评论中指出)。例如,关于数值分析的标准教科书,D. Kincaid 和 W. Cheney,“数值分析,第二版”,Brooks/Cole 1996 年出版,第 336 页指出:
The coefficient matrix in Equation (12) is called the Vandermonde matrix. It is non-singular because the system has a unique solution for any choice of y0, y1, ..., yn [...]. The determinant of the Vandermonde matrix is therefore nonzero for distinct nodes x0, x1, ..., xn. [...] However, the Vandermonde matrix is often ill conditioned, and the coefficients ai may therefore be inaccurately determined by solving System (12). (See Gautschi [1984]). [...] Therefore, this approach is not recommended.
检查 GETRF
返回的矩阵已经告诉我们,我们遇到了相当低次数的多项式的麻烦。在 4 级时,LU 分解后最小相关元素的大小约为 10-12,而在 5 级时,最小的此类元素为 10-14。基于原始矩阵的所有元素都接近于一的事实,以及了解 LU 分解过程的基本步骤,很明显 GETRF
结果中的微小元素是由减法抵消产生的。
接近双精度 epsilon (≈ 10-16) 的元素的大小告诉我们大部分精度已经丢失。对于 6 次及更高次数的多项式,代码基本上是在纯噪声上运行。
我们可以通过将插值多项式的计算系数与更稳健的参考进行比较来进一步证实这一点。为简单起见,我使用 Wolfram Alpha 进行比较。对于 4 次多项式,Fortran 代码计算的系数精确到大约三位小数,对于 5 次多项式,这将缩小到一个正确的小数位。
就生成插值多项式系数的简单和更稳健的算法而言,我发现如下:
J。 N. Lyness 和 C.B。 Moler,“Van Der Monde 系统和数值微分”。 数值数学 8, 458-464 (1966)
我将论文中的 Algol 代码翻译成 ISO-C99,与 Wolfram Alpha 相比,它似乎提供了高达 8 级的合理结果。 Wolfram Alpha 放弃大于 8 的度数,我手头没有其他参考资料。即使使用这种更强大的算法,对于更高的度数,精度似乎也有明显的损失,Wolfram Alpha 和 Luness/Moler 算法之间只有 6 个小数位匹配。
#include <stdio.h>
#include <stdlib.h>
/* J. N. Lyness and C.B. Moler, "Van Der Monde Systems and Numerical
Differentiation." Numerische Mathematik 8, 458-464 (1966)
*/
void update (int k, int p, double *x, double fxk, double *C)
{
int d, s, m, n;
double xk, xkd;
xk = x[k];
m = k*(k+3)/2;
C[m] = fxk;
for (d = 1; d <= k; d++) {
xkd = xk - x[k-d];
for (s = 0; s <= ((d > p) ? p : d); s++) {
m = m - 1;
n = m + d;
if (s == 0) {
C[m] = C[n] + xk * (C[n-k-1] - C[n]) / xkd;
} else if (s == d) {
C[m] = (C[n+1] - C[n-k]) / xkd;
} else {
C[m] = C[n] + (xk * (C[n-k-1] - C[n]) + (C[n+1] - C[n-k]))/ xkd;
}
if (d > p) {
m = m - d + p;
}
}
}
}
/* Solve (k+1) x (k+1) system Vc = f, where V[i,j] = x[i]**j */
void vandal (int k, double *x, double *f, double *c)
{
double C [k*(k+3)/2+1];
for (int i = 0; i <= k; i++) {
update (i, k, x, f[i], C);
}
for (int i = 0; i <= k; i++) {
c[i] = C[k-i];
}
}
#define k 10
int main (void)
{
double x[k+1] = {1.000, 1.001, 1.002, 1.003, 1.004, 1.005,
1.006, 1.007, 1.008, 1.009, 1.010};
double f[k+1] = {1.00000000000, 1.01004512021, 1.02018096336,
1.03040825707, 1.04072773401, 1.05114013204,
1.06164619412, 1.07224666847, 1.08294230847,
1.09373387280, 1.10462212541};
double c[k+1] = {0};
vandal (k, x, f, c);
for (int i = 0; i <= k; i++) {
printf ("c[%2d] = % 23.16e\n", i, c[i]);
}
return EXIT_SUCCESS;
}
我写了一个简单的 Fortran 代码,它对 R^2
的 n+1
点进行多项式插值。它使用两个 LAPACK
程序求解线性方程组(我正在创建 Vandermonde 矩阵)(我的代码中的所有内容都是双精度的):
首先,它分解矩阵:http://sites.science.oregonstate.edu/~landaur/nacphy/lapack/routines/dgetrf.html
后来解决了系统:http://sites.science.oregonstate.edu/~landaur/nacphy/lapack/routines/dgetrs.html
该程序适用于从多项式生成的数据的几个测试案例:0,1,2,3,4。但是,当我从多项式 P(x) = x^10
中提供 11 个点时,它会推断出完全错误的系数。
输入(x..., y...
):
1.0
1.001
1.002
1.003
1.004
1.005
1.006
1.007
1.008
1.009
1.01
1.0
1.01004512021
1.02018096336
1.03040825707
1.04072773401
1.05114013204
1.06164619412
1.07224666847
1.08294230847
1.0937338728
1.10462212541
输出(a_n,...,a_0
):
-4.6992230177E+004
2.2042918738E+005
-3.2949938635E+005
5.0740528522E+004
2.4764893257E+005
-3.1846974845E+004
-1.7195378863E+005
-1.4751075818E+005
4.1766709741E+005
-2.6476448046E+005
5.6082872757E+004
我遇到了数值稳定性问题吗?还是我做错了什么?
编辑:我附上插值程序的代码(注意,我们实际上有 n
个点而不是 n+1
)。
module InterpolatorModule
contains
subroutine interpolate(n, x, y, a)
implicit none
integer :: n, i, j, info
integer, dimension (:), allocatable :: ipiv
real(8), dimension (:), pointer :: x, y, a
real(8), dimension(:,:), allocatable :: Mat_X, Mat_B
! create the Vandermonde matrix:
allocate ( Mat_X(n,n) )
do i = 1,n
do j = 1,n
Mat_X(i,j) = x(i) ** ( n - j )
end do
end do
! reshape ipnut data into a matrix form:
allocate ( Mat_B(n,1) )
do i = 1,n
Mat_B(i,1) = y(i)
end do
! prepare an array for the pivot indices from DGETRF:
allocate(ipiv(n))
call DGETRF(n, n, Mat_X, n, ipiv, info)
call DGETRS("N", n, 1, Mat_X, n, ipiv, Mat_B, n, info)
! save the results into the argument array
do i = 1,n
a(i) = Mat_B(i,1)
end do
deallocate(ipiv)
deallocate ( Mat_X )
deallocate ( Mat_B )
end subroutine interpolate
end module InterpolatorModule
编辑:main
程序:
program MyInterpolation
use InterpolatorModule
implicit none
integer :: line, n
real(8), dimension (:), pointer :: p_x, p_y, p_a
real(8), dimension(:), target, allocatable :: in1_x, in1_y, in1_a
character(len=23) :: str
open (1, file="test/in.txt", status="old", action="read", form="formatted")
n = 11 ! this value is not known at compulation time, simplification for SO
! allocate memory for the input data
allocate ( in1_x(n) )
allocate ( in1_y(n) )
allocate ( in1_a(n) )
! read the x_i coordinates:
do line = 1,n
read(1,*) in1_x(line)
end do
! read the y_i coordinates:
do line = 1,n
read(1,*) in1_y(line)
end do
close(1)
! assign pointers to the arrays:
p_x => in1_x
p_y => in1_y
p_a => in1_a
! call the interpolating procedure:
call interpolate(n, p_x, p_y, p_a)
! print out calculated coefficients:
do line = 1,n
write(str,'(ES23.10 E3)') in1_a(line)
write(*,'(a)') adjustl(trim(str))
end do
! free the allocated memory
deallocate (in1_x)
deallocate (in1_y)
deallocate (in1_a)
! ===========================================================================
end program MyInterpolation
Vandermonde 矩阵经常是病态的,这一点已得到广泛认可(并已在评论中指出)。例如,关于数值分析的标准教科书,D. Kincaid 和 W. Cheney,“数值分析,第二版”,Brooks/Cole 1996 年出版,第 336 页指出:
The coefficient matrix in Equation (12) is called the Vandermonde matrix. It is non-singular because the system has a unique solution for any choice of y0, y1, ..., yn [...]. The determinant of the Vandermonde matrix is therefore nonzero for distinct nodes x0, x1, ..., xn. [...] However, the Vandermonde matrix is often ill conditioned, and the coefficients ai may therefore be inaccurately determined by solving System (12). (See Gautschi [1984]). [...] Therefore, this approach is not recommended.
检查 GETRF
返回的矩阵已经告诉我们,我们遇到了相当低次数的多项式的麻烦。在 4 级时,LU 分解后最小相关元素的大小约为 10-12,而在 5 级时,最小的此类元素为 10-14。基于原始矩阵的所有元素都接近于一的事实,以及了解 LU 分解过程的基本步骤,很明显 GETRF
结果中的微小元素是由减法抵消产生的。
接近双精度 epsilon (≈ 10-16) 的元素的大小告诉我们大部分精度已经丢失。对于 6 次及更高次数的多项式,代码基本上是在纯噪声上运行。
我们可以通过将插值多项式的计算系数与更稳健的参考进行比较来进一步证实这一点。为简单起见,我使用 Wolfram Alpha 进行比较。对于 4 次多项式,Fortran 代码计算的系数精确到大约三位小数,对于 5 次多项式,这将缩小到一个正确的小数位。
就生成插值多项式系数的简单和更稳健的算法而言,我发现如下:
J。 N. Lyness 和 C.B。 Moler,“Van Der Monde 系统和数值微分”。 数值数学 8, 458-464 (1966)
我将论文中的 Algol 代码翻译成 ISO-C99,与 Wolfram Alpha 相比,它似乎提供了高达 8 级的合理结果。 Wolfram Alpha 放弃大于 8 的度数,我手头没有其他参考资料。即使使用这种更强大的算法,对于更高的度数,精度似乎也有明显的损失,Wolfram Alpha 和 Luness/Moler 算法之间只有 6 个小数位匹配。
#include <stdio.h>
#include <stdlib.h>
/* J. N. Lyness and C.B. Moler, "Van Der Monde Systems and Numerical
Differentiation." Numerische Mathematik 8, 458-464 (1966)
*/
void update (int k, int p, double *x, double fxk, double *C)
{
int d, s, m, n;
double xk, xkd;
xk = x[k];
m = k*(k+3)/2;
C[m] = fxk;
for (d = 1; d <= k; d++) {
xkd = xk - x[k-d];
for (s = 0; s <= ((d > p) ? p : d); s++) {
m = m - 1;
n = m + d;
if (s == 0) {
C[m] = C[n] + xk * (C[n-k-1] - C[n]) / xkd;
} else if (s == d) {
C[m] = (C[n+1] - C[n-k]) / xkd;
} else {
C[m] = C[n] + (xk * (C[n-k-1] - C[n]) + (C[n+1] - C[n-k]))/ xkd;
}
if (d > p) {
m = m - d + p;
}
}
}
}
/* Solve (k+1) x (k+1) system Vc = f, where V[i,j] = x[i]**j */
void vandal (int k, double *x, double *f, double *c)
{
double C [k*(k+3)/2+1];
for (int i = 0; i <= k; i++) {
update (i, k, x, f[i], C);
}
for (int i = 0; i <= k; i++) {
c[i] = C[k-i];
}
}
#define k 10
int main (void)
{
double x[k+1] = {1.000, 1.001, 1.002, 1.003, 1.004, 1.005,
1.006, 1.007, 1.008, 1.009, 1.010};
double f[k+1] = {1.00000000000, 1.01004512021, 1.02018096336,
1.03040825707, 1.04072773401, 1.05114013204,
1.06164619412, 1.07224666847, 1.08294230847,
1.09373387280, 1.10462212541};
double c[k+1] = {0};
vandal (k, x, f, c);
for (int i = 0; i <= k; i++) {
printf ("c[%2d] = % 23.16e\n", i, c[i]);
}
return EXIT_SUCCESS;
}