Fortran fft 精度损失
Loss of precision in fortran fft
我在用 Fortran 计算某些数据的 fft 时遇到问题。不知道是不是算法有问题,四舍五入,精度不够还是什么。
这是代码
module fft_mod
public :: fft1D
integer,parameter :: cp = selected_real_kind(14)
real(cp),parameter :: PI = real(3.14159265358979,cp)
contains
subroutine fft1D(x)
complex(cp), dimension(:), intent(inout) :: x
complex(cp), dimension(:), allocatable :: temp
complex(cp) :: S
integer :: N
complex(cp) :: j ! sqrt(-1)
integer :: i,k
N=size(x)
allocate(temp(N))
j = cmplx(0.0_cp,1.0_cp,cp)
S = cmplx(0.0_cp,0.0_cp,cp)
temp = cmplx(0.0_cp,0.0_cp,cp)
do i = 1,N
do k = 1,N
S = S + x(k)*exp(real(-2.0,cp)*PI*j*real(k-1,cp)*real(i-1,cp)/real(N,cp))
enddo
temp(i) = S
S = cmplx(0.0_cp,0.0_cp,cp)
enddo
x = temp
deallocate(temp)
end subroutine
end module
program test
use fft_mod
implicit none
complex(cp), dimension(10) :: data = (/1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 4.0, 3.0, 2.0, 1.0/)
integer :: i
call fft1D(data)
do i=1,10
write(*,*) data(i)
end do
end program test
运行 这在 fortran 中给出
C:\Users\Charlie\Desktop\fft>gmake
gfortran -J".\mod" -fimplicit-none -Wuninitialized -g -Wall -Wextra -fbacktrace
-fcheck=all -O0 -fopenmp -D_QUAD_PRECISION_ -cpp -c -o .\obj\testFFT.o testFFT
.f90
gfortran -o .\test -J".\mod" -fimplicit-none -Wuninitialized -g -Wall -Wextra
-fbacktrace -fcheck=all -O0 -fopenmp -D_QUAD_PRECISION_ -cpp .\obj\testFFT.o
C:\Users\Charlie\Desktop\fft>test.exe
( 30.000000000000000 , 0.0000000000000000 )
( -9.4721355260035178 , -3.0776825738331275 )
( 1.2032715918097736E-007, 8.7422769579070803E-008)
(-0.52786408204828272 ,-0.72654221835813126 )
( 5.6810824045072650E-008, 1.7484555003832725E-007)
( 1.0325074129013956E-013, 2.6226834001115759E-007)
( -8.5216018574918451E-008, 2.6226836247200680E-007)
(-0.52786395855490920 , 0.72654325051559143 )
( -4.8130813040669906E-007, 3.4969128892559098E-007)
( -9.4721398159606647 , 3.0776922072585111 )
但是运行 matlab 中的相同数据集给出
format long ; g = [1:5 5:-1:1]; fft(g)'
ans =
30.000000000000000
-9.472135954999580 + 3.077683537175253i
0
-0.527864045000420 + 0.726542528005361i
0
0
0
-0.527864045000420 - 0.726542528005361i
0
-9.472135954999580 - 3.077683537175253i
我相信我使用 selected_real_kind(14) 使用双精度,但看起来结果充其量只是单精度。我确定一些散布的 real(,cp) 不是必需的,但我不知道与 matlab 相比结果在哪里、如何或为什么看起来是单精度的。
非常感谢任何帮助!
更新:
根据已接受答案的建议,此处唯一需要更改的是:
real(cp),parameter :: PI = real(3.14159265358979,cp)
到
real(cp),parameter :: PI = 3.14159265358979_cp
问题在于您如何定义实数,特别是 pi
。当你定义
real(cp),parameter :: PI = real(3.14159265358979,cp)
您正在将参数 3.14159265358979 传递给函数 real
。但是实数具有默认的单精度,因此您的实数在进入函数时会被转换为单精度。考虑以下示例:
program main
integer,parameter :: cp = selected_real_kind(14)
real(cp),parameter :: pi = real(3.14159265358979,cp)
real(cp),parameter :: pj = 3.14159265358979_cp
write(*,*) pi
write(*,*) pj
end program main
使用 pgfortran
编译且没有选项,这给了我:
3.141592741012573
3.141592653589790
定义任何实数时,您应该使用[]_cp
而不是real([],cp)
来分配种类。
编辑:此问题也会影响您定义 0.0
、1.0
和 2.0
的方式,但这些数字可能会被精确转换为二进制,并且不会出现相同的舍入错误.
如果您只需要“接近”双精度 (Real(8)) 的值,则上面接受的答案是一个合理的解决方案,因为您已将 Pi 明确定义为接近 Real(8) 的数字位数。 Pi 到 full Real(8) 的实际值为
3.1415926535897931
c.f。你的参数
3.14159265358979
如果您希望获得更通用的“与 _cp 一致的完全精度”结果,您可能希望使用类似
Real(cP), Parameter :: Pi = 4_cp*ATan(1_cp)
***编辑:感谢 francescalus 发现拼写错误,这应该是 ATan(1._cp),但实际上,如下所述,应该使用“参数化”数字并避免使用“_cp” " 都在 args 等
现在,如果 _cp 为 16,则您的 Pi 将自动为:
3.14159265358979323846264338327950280
现在,无论何时更改 _cp,您的代码都会自动具有“与 _cp 一致的完全精度”。
顺便说一句,您可能还希望“参数化”某些基本数字,例如:
Real(cp), Parameter :: Zero = 0_cp
Real(cp), Parameter :: One = 1_cp
:
...等等
***编辑:感谢 francescalus 发现拼写错误,但在这些特定的表达式中,虽然 0.0_cp 和 1.0_cp 可能更好,但这并不重要,因为它们是 Params声明负责>
然后,在你的代码中,你可以写,例如:
...
Real(cp), Parameter :: Pi = Four*ATan(One)
...
S = Cmplx(Zero, Zero)*Exp(-Two) ! etc etc
并且不必担心到处添加 _cp 等,read/debug 等
更容易
此策略还有某些其他优势,尤其是在遗留代码中,例如
If( a == 1.0 ) Then
对比
If( a == One ) Then
...但那是另一回事了。
顺带一提,在某种程度上是“风格”的问题,在Fortran中,算术默认为表达式中的“最大”precision/type,所以
S = S + x(k)*exp(real(-2.0,cp)*PI*j*real(k-1,cp)*real(i-1,cp)/real(N,cp))
应该等同于
S = S + x(k)*exp(-2*PI*j*(k-1)*(i-1)/N)
或者,甚至更好
S = S + x(k)*exp(-Two*PI*j*(k-1)*(i-1)/N)
就个人而言,我发现它越容易阅读,就越容易正确。
...尽管如上所述,如果您的算术仅包含整数,则可能需要考虑其他因素。但是,如果您仅对 Real 进行“参数化”,那么如果您使用“Two”,那么在真正需要 2.0 的情况下,您永远不会有使用“2”的风险(例如 div/0)。
最后,既然你有“参数化”Pi,为什么不也有 2*Pi,例如:
...
Real(cp), Parameter :: TwoPi = Two*Pi
现在,表达式
S = S + x(k)*exp(-Two*PI*j*(k-1)*(i-1)/N)
可以
S = S + x(k)*exp(-TwoPI*j*(k-1)*(i-1)/N)
节省了一点CPU。事实上,扩展此逻辑可以进一步提高代码的性能。
...在一个完全独立的点上,你知道你的 var Temp() 可以是“自动的”吗,比如说
complex(cp) :: temp(Size(x, Dim = 1))
那么就不需要Allocatable等了。尽管从我的角度来看,似乎可以完全不使用“S”和“Temp(:)”来编写您的代码,从而提供更简单和更快的代码。
***** 附录:根据对我的回答的一些评论,我认为这可能有助于显示对 OP 代码的可能更改以提高性能,并在某种程度上进一步提高精度。
然而,在此之前,OP 没有说明为什么需要特定的精度。这可能是许多 industrial/real 世界设置中的一个重要问题。例如,在某些情况下,速度比精度重要得多,只要精度足够好以提供特定于上下文的 reliability/decision 制作。例如,可以想象在某些情况下,计算应该全部在 Real(4) 中。这件事需要单独的、重要的讨论。
尽管如此,Ross 的回答纠正了“内存表示”方面的精度问题,而我的原始答案提高了“您实际上需要提供足够的 sig figs 开始”方面的精度(即使声明是正确的),减少 FLOPS 的数量不仅可以提高速度,还可以提高精度,因为每个 FLOP 都会引入 truncation/round 关闭的可能性,特别是对于单个表达式中的一长串 FLOP。
此外,这个 OP 问题可以用“身份”公式 (4*ATan(1)) 很好地解决。在许多情况下,这要么是不可能的,要么是不切实际的,然后需要一些不太优雅的方法,但我把这些留到以后再说。
以下只是 one/two 个可能的替代方案。
我没有测试过。
所以它可能需要调整;请随时提出改进建议。
如果运气好的话,它可能会减少大约 80% 左右的 FLOPs。
... 如果方便的话,我将感谢 OP 对不同的实现进行基准测试并报告结果。您可能还希望对 _cp = 4、8、16 进行基准测试,只是为了演示速度和精度之间的权衡。
此替代版本需要对调用进行明显更新 s/r。
module fft_mod
!
public :: fft1D
!
Integer,parameter :: cp = selected_real_kind(14) ! just out of curiosity, why "14", see my comments on "reality vs. precision"
!>>real(cp),parameter :: PI = real(3.14159265358979,cp)
!
Real(cp), Parameter :: Zero = 0_cp ! these types of declarations can be created in a separate Module,
! say "MyDeclarationModule", and then just access with "Use"
!
!XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
! NOTE: With respect to francescalus comments on issues with "1_cp" syntax, the usage here works fine as shown in the previous result,
! though francescalus' comments are fair in suggesting that other approaches make for better coding.
! However, and to be clear, I don't actually use "_xx" style myslef. I have used it here ONLY to be consistent with the
! the OP's and earlier answers. The usage of it here is NOT a recommendation. A discussion of the alternatives
! is much too long and strays from the immediate issues too far.
! ... Perhaps francescalus will take up the mantle and re-write the OP's code for alternatives
!XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
!
Real(cp), Parameter :: One = 1_cp
Real(cp), Parameter :: Two = 2_cp
Real(cp), Parameter :: Four = 4_cp
!
! ... after this point, there is no need to see "_cp" again, in this example
!
Real(cp), Parameter :: Pi = Four*ATan(One) ! this guarrantees maximal precision for Pi, up to "_cp"
!
Real(cp), Parameter :: TwoPi = Two*Pi ! Vladimir: now, and only now (that I've talem most everything out of the loop),
! this declaration has less value ... unlike previously, when it was
! still in the inner NxN, and when it saved approx 15 - 20% of FLOPs
! Crucially, if you do a lot of computational mathematics, TwoPi, PiBy2, Root2Pi, etc etc
! arise with considerable frequency, and creating these as Parameters makes for much
! improvement, and I would think it a practice to be encouraged.
!
Complex(cp), Parameter :: SqrtM1 = Cmplx(Zero, One) ! sqrt(-1), this is your "j",
! sorry but "j" sounds just too much like an "integer" to me
! (for us "old timers"), so I changed the name to something more meaningful to me
!
!
Contains
!
Pure Subroutine fft1D(x, n, Temp) ! OPTIONAL, return the results explicitly to save vector assignment, and preserve original data
! OPTIONAL, send the size of x() explicitly as n, much cleaner
! you could leave it as s/r fft1D(x), and most of the savings below would still apply
! ... also, my practice is to make everything Pure/Elemental where reasonable
!
Integer, Intent(In) :: n ! Optional, but cleaner
! complex(cp), Intent(InOut) :: x(n) ! consider as just Intent(In), and return Temp() as Intent(Out)
complex(cp), Intent(In) :: x(n) ! consider as just Intent(In), and return Temp() as Intent(Out)
!
! Locals
! ------
!
! Complex(cp), Allocatable :: temp(:) ! no need for Allocatable
! Complex(cp) :: temp(Size(x), Dim = 1) ! I prefer to pass n explicitly
Complex(cp), Intent(Out) :: temp(n) ! make Intent(Out) to preserve data and save a vector assignment
!
! complex(cp) :: S
! integer :: N
! complex(cp) :: j ! sqrt(-1)
!
Integer :: i, k
!
Complex(cp) :: TPSdivN ! new reduce 4 nxn mults to 1
!
Complex(cp) :: TPSdivNiM1 ! new, to reduce 5 nxn mults to 1 nx1
!
Real(cp) :: rKM1(n) ! new, to obviate nxn k-1's in inner loop
!
! N=size(x) ! my preference is to pass N explicitly,
! but can be done this way if prefered (need to make appropriate changes above)
!
! allocate(temp(N)) ! Temp() can be either automatic or an Arg, no need for Allocate
! j = cmplx(0.0_cp,1.0_cp,cp) ! make Parameter, and rename to SqrtM1 (personal pref)
! S = cmplx(0.0_cp,0.0_cp,cp) ! Not required with "improved" inner loop
!
!
temp(1:n) = Cmplx(Zero, Zero) ! S is not needed, just start with Temp() directly
!
TPSdivN = -TwoPi*SqrtM1/n ! new, move everything out all loops that can be
!
ForAll( k=1:n) rKM1(k) = k - 1 ! new, this allows the elimination of NxN "k-1's" in the inner
! my preference is for ForAll's, but you could use Do,
!
do i = 1,N
!
TPSdivNiM1 = TPSdivN*(i-1) ! new. everything out of inner loop that can be
!
! improved, inner do, but could use "no-Do" alternative as illustrated below
!
! do k = 1,N
!>> S = S + x(k)*exp(real(-2.0,cp)*PI*j*real(k-1,cp)*real(i-1,cp)/real(N,cp)) ! original, many unneccessary nxn FLOPs, type conversions, etc
! temp(i) = temp(i) + x(k)*Exp(TPSdivNiM1*rKM1(k))) ! use array of k-1's, then the inner k-loop is no longer required,
! and can use Product()/Sum() or Dot_Product() intrinsics, see example below
! End Do
!
! alternative "direct array" approach to inner loop, or "no-DO" version ... there are other possibilities.
!
Temp(i) = Dot_Product( x(1:n), Exp(TPSdivNiM1*rKM1(1:n)) ) ! this approach can have certain advantages depending on circumstances
!
! temp(i) = S ! not needed
! S = cmplx(0.0_cp,0.0_cp,cp) ! not needed
!
End Do
!
! x(1:n) = temp(1:n) ! not need if Temp() is Intent(Out) that save this vector assignment, and the original data
! deallocate(temp) ! not needed
!
End Subroutine fft1D
!
! end module
End Module fft_mod
我在用 Fortran 计算某些数据的 fft 时遇到问题。不知道是不是算法有问题,四舍五入,精度不够还是什么。
这是代码
module fft_mod
public :: fft1D
integer,parameter :: cp = selected_real_kind(14)
real(cp),parameter :: PI = real(3.14159265358979,cp)
contains
subroutine fft1D(x)
complex(cp), dimension(:), intent(inout) :: x
complex(cp), dimension(:), allocatable :: temp
complex(cp) :: S
integer :: N
complex(cp) :: j ! sqrt(-1)
integer :: i,k
N=size(x)
allocate(temp(N))
j = cmplx(0.0_cp,1.0_cp,cp)
S = cmplx(0.0_cp,0.0_cp,cp)
temp = cmplx(0.0_cp,0.0_cp,cp)
do i = 1,N
do k = 1,N
S = S + x(k)*exp(real(-2.0,cp)*PI*j*real(k-1,cp)*real(i-1,cp)/real(N,cp))
enddo
temp(i) = S
S = cmplx(0.0_cp,0.0_cp,cp)
enddo
x = temp
deallocate(temp)
end subroutine
end module
program test
use fft_mod
implicit none
complex(cp), dimension(10) :: data = (/1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 4.0, 3.0, 2.0, 1.0/)
integer :: i
call fft1D(data)
do i=1,10
write(*,*) data(i)
end do
end program test
运行 这在 fortran 中给出
C:\Users\Charlie\Desktop\fft>gmake
gfortran -J".\mod" -fimplicit-none -Wuninitialized -g -Wall -Wextra -fbacktrace
-fcheck=all -O0 -fopenmp -D_QUAD_PRECISION_ -cpp -c -o .\obj\testFFT.o testFFT
.f90
gfortran -o .\test -J".\mod" -fimplicit-none -Wuninitialized -g -Wall -Wextra
-fbacktrace -fcheck=all -O0 -fopenmp -D_QUAD_PRECISION_ -cpp .\obj\testFFT.o
C:\Users\Charlie\Desktop\fft>test.exe
( 30.000000000000000 , 0.0000000000000000 )
( -9.4721355260035178 , -3.0776825738331275 )
( 1.2032715918097736E-007, 8.7422769579070803E-008)
(-0.52786408204828272 ,-0.72654221835813126 )
( 5.6810824045072650E-008, 1.7484555003832725E-007)
( 1.0325074129013956E-013, 2.6226834001115759E-007)
( -8.5216018574918451E-008, 2.6226836247200680E-007)
(-0.52786395855490920 , 0.72654325051559143 )
( -4.8130813040669906E-007, 3.4969128892559098E-007)
( -9.4721398159606647 , 3.0776922072585111 )
但是运行 matlab 中的相同数据集给出
format long ; g = [1:5 5:-1:1]; fft(g)'
ans =
30.000000000000000
-9.472135954999580 + 3.077683537175253i
0
-0.527864045000420 + 0.726542528005361i
0
0
0
-0.527864045000420 - 0.726542528005361i
0
-9.472135954999580 - 3.077683537175253i
我相信我使用 selected_real_kind(14) 使用双精度,但看起来结果充其量只是单精度。我确定一些散布的 real(,cp) 不是必需的,但我不知道与 matlab 相比结果在哪里、如何或为什么看起来是单精度的。
非常感谢任何帮助!
更新:
根据已接受答案的建议,此处唯一需要更改的是:
real(cp),parameter :: PI = real(3.14159265358979,cp)
到
real(cp),parameter :: PI = 3.14159265358979_cp
问题在于您如何定义实数,特别是 pi
。当你定义
real(cp),parameter :: PI = real(3.14159265358979,cp)
您正在将参数 3.14159265358979 传递给函数 real
。但是实数具有默认的单精度,因此您的实数在进入函数时会被转换为单精度。考虑以下示例:
program main
integer,parameter :: cp = selected_real_kind(14)
real(cp),parameter :: pi = real(3.14159265358979,cp)
real(cp),parameter :: pj = 3.14159265358979_cp
write(*,*) pi
write(*,*) pj
end program main
使用 pgfortran
编译且没有选项,这给了我:
3.141592741012573
3.141592653589790
定义任何实数时,您应该使用[]_cp
而不是real([],cp)
来分配种类。
编辑:此问题也会影响您定义 0.0
、1.0
和 2.0
的方式,但这些数字可能会被精确转换为二进制,并且不会出现相同的舍入错误.
如果您只需要“接近”双精度 (Real(8)) 的值,则上面接受的答案是一个合理的解决方案,因为您已将 Pi 明确定义为接近 Real(8) 的数字位数。 Pi 到 full Real(8) 的实际值为
3.1415926535897931
c.f。你的参数
3.14159265358979
如果您希望获得更通用的“与 _cp 一致的完全精度”结果,您可能希望使用类似
Real(cP), Parameter :: Pi = 4_cp*ATan(1_cp)
***编辑:感谢 francescalus 发现拼写错误,这应该是 ATan(1._cp),但实际上,如下所述,应该使用“参数化”数字并避免使用“_cp” " 都在 args 等
现在,如果 _cp 为 16,则您的 Pi 将自动为:
3.14159265358979323846264338327950280
现在,无论何时更改 _cp,您的代码都会自动具有“与 _cp 一致的完全精度”。
顺便说一句,您可能还希望“参数化”某些基本数字,例如:
Real(cp), Parameter :: Zero = 0_cp
Real(cp), Parameter :: One = 1_cp
:
...等等
***编辑:感谢 francescalus 发现拼写错误,但在这些特定的表达式中,虽然 0.0_cp 和 1.0_cp 可能更好,但这并不重要,因为它们是 Params声明负责>
然后,在你的代码中,你可以写,例如:
...
Real(cp), Parameter :: Pi = Four*ATan(One)
...
S = Cmplx(Zero, Zero)*Exp(-Two) ! etc etc
并且不必担心到处添加 _cp 等,read/debug 等
更容易此策略还有某些其他优势,尤其是在遗留代码中,例如
If( a == 1.0 ) Then
对比
If( a == One ) Then
...但那是另一回事了。
顺带一提,在某种程度上是“风格”的问题,在Fortran中,算术默认为表达式中的“最大”precision/type,所以
S = S + x(k)*exp(real(-2.0,cp)*PI*j*real(k-1,cp)*real(i-1,cp)/real(N,cp))
应该等同于
S = S + x(k)*exp(-2*PI*j*(k-1)*(i-1)/N)
或者,甚至更好
S = S + x(k)*exp(-Two*PI*j*(k-1)*(i-1)/N)
就个人而言,我发现它越容易阅读,就越容易正确。
...尽管如上所述,如果您的算术仅包含整数,则可能需要考虑其他因素。但是,如果您仅对 Real 进行“参数化”,那么如果您使用“Two”,那么在真正需要 2.0 的情况下,您永远不会有使用“2”的风险(例如 div/0)。
最后,既然你有“参数化”Pi,为什么不也有 2*Pi,例如:
...
Real(cp), Parameter :: TwoPi = Two*Pi
现在,表达式
S = S + x(k)*exp(-Two*PI*j*(k-1)*(i-1)/N)
可以
S = S + x(k)*exp(-TwoPI*j*(k-1)*(i-1)/N)
节省了一点CPU。事实上,扩展此逻辑可以进一步提高代码的性能。
...在一个完全独立的点上,你知道你的 var Temp() 可以是“自动的”吗,比如说
complex(cp) :: temp(Size(x, Dim = 1))
那么就不需要Allocatable等了。尽管从我的角度来看,似乎可以完全不使用“S”和“Temp(:)”来编写您的代码,从而提供更简单和更快的代码。
***** 附录:根据对我的回答的一些评论,我认为这可能有助于显示对 OP 代码的可能更改以提高性能,并在某种程度上进一步提高精度。
然而,在此之前,OP 没有说明为什么需要特定的精度。这可能是许多 industrial/real 世界设置中的一个重要问题。例如,在某些情况下,速度比精度重要得多,只要精度足够好以提供特定于上下文的 reliability/decision 制作。例如,可以想象在某些情况下,计算应该全部在 Real(4) 中。这件事需要单独的、重要的讨论。
尽管如此,Ross 的回答纠正了“内存表示”方面的精度问题,而我的原始答案提高了“您实际上需要提供足够的 sig figs 开始”方面的精度(即使声明是正确的),减少 FLOPS 的数量不仅可以提高速度,还可以提高精度,因为每个 FLOP 都会引入 truncation/round 关闭的可能性,特别是对于单个表达式中的一长串 FLOP。
此外,这个 OP 问题可以用“身份”公式 (4*ATan(1)) 很好地解决。在许多情况下,这要么是不可能的,要么是不切实际的,然后需要一些不太优雅的方法,但我把这些留到以后再说。
以下只是 one/two 个可能的替代方案。
我没有测试过。
所以它可能需要调整;请随时提出改进建议。
如果运气好的话,它可能会减少大约 80% 左右的 FLOPs。
... 如果方便的话,我将感谢 OP 对不同的实现进行基准测试并报告结果。您可能还希望对 _cp = 4、8、16 进行基准测试,只是为了演示速度和精度之间的权衡。
此替代版本需要对调用进行明显更新 s/r。
module fft_mod
!
public :: fft1D
!
Integer,parameter :: cp = selected_real_kind(14) ! just out of curiosity, why "14", see my comments on "reality vs. precision"
!>>real(cp),parameter :: PI = real(3.14159265358979,cp)
!
Real(cp), Parameter :: Zero = 0_cp ! these types of declarations can be created in a separate Module,
! say "MyDeclarationModule", and then just access with "Use"
!
!XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
! NOTE: With respect to francescalus comments on issues with "1_cp" syntax, the usage here works fine as shown in the previous result,
! though francescalus' comments are fair in suggesting that other approaches make for better coding.
! However, and to be clear, I don't actually use "_xx" style myslef. I have used it here ONLY to be consistent with the
! the OP's and earlier answers. The usage of it here is NOT a recommendation. A discussion of the alternatives
! is much too long and strays from the immediate issues too far.
! ... Perhaps francescalus will take up the mantle and re-write the OP's code for alternatives
!XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
!
Real(cp), Parameter :: One = 1_cp
Real(cp), Parameter :: Two = 2_cp
Real(cp), Parameter :: Four = 4_cp
!
! ... after this point, there is no need to see "_cp" again, in this example
!
Real(cp), Parameter :: Pi = Four*ATan(One) ! this guarrantees maximal precision for Pi, up to "_cp"
!
Real(cp), Parameter :: TwoPi = Two*Pi ! Vladimir: now, and only now (that I've talem most everything out of the loop),
! this declaration has less value ... unlike previously, when it was
! still in the inner NxN, and when it saved approx 15 - 20% of FLOPs
! Crucially, if you do a lot of computational mathematics, TwoPi, PiBy2, Root2Pi, etc etc
! arise with considerable frequency, and creating these as Parameters makes for much
! improvement, and I would think it a practice to be encouraged.
!
Complex(cp), Parameter :: SqrtM1 = Cmplx(Zero, One) ! sqrt(-1), this is your "j",
! sorry but "j" sounds just too much like an "integer" to me
! (for us "old timers"), so I changed the name to something more meaningful to me
!
!
Contains
!
Pure Subroutine fft1D(x, n, Temp) ! OPTIONAL, return the results explicitly to save vector assignment, and preserve original data
! OPTIONAL, send the size of x() explicitly as n, much cleaner
! you could leave it as s/r fft1D(x), and most of the savings below would still apply
! ... also, my practice is to make everything Pure/Elemental where reasonable
!
Integer, Intent(In) :: n ! Optional, but cleaner
! complex(cp), Intent(InOut) :: x(n) ! consider as just Intent(In), and return Temp() as Intent(Out)
complex(cp), Intent(In) :: x(n) ! consider as just Intent(In), and return Temp() as Intent(Out)
!
! Locals
! ------
!
! Complex(cp), Allocatable :: temp(:) ! no need for Allocatable
! Complex(cp) :: temp(Size(x), Dim = 1) ! I prefer to pass n explicitly
Complex(cp), Intent(Out) :: temp(n) ! make Intent(Out) to preserve data and save a vector assignment
!
! complex(cp) :: S
! integer :: N
! complex(cp) :: j ! sqrt(-1)
!
Integer :: i, k
!
Complex(cp) :: TPSdivN ! new reduce 4 nxn mults to 1
!
Complex(cp) :: TPSdivNiM1 ! new, to reduce 5 nxn mults to 1 nx1
!
Real(cp) :: rKM1(n) ! new, to obviate nxn k-1's in inner loop
!
! N=size(x) ! my preference is to pass N explicitly,
! but can be done this way if prefered (need to make appropriate changes above)
!
! allocate(temp(N)) ! Temp() can be either automatic or an Arg, no need for Allocate
! j = cmplx(0.0_cp,1.0_cp,cp) ! make Parameter, and rename to SqrtM1 (personal pref)
! S = cmplx(0.0_cp,0.0_cp,cp) ! Not required with "improved" inner loop
!
!
temp(1:n) = Cmplx(Zero, Zero) ! S is not needed, just start with Temp() directly
!
TPSdivN = -TwoPi*SqrtM1/n ! new, move everything out all loops that can be
!
ForAll( k=1:n) rKM1(k) = k - 1 ! new, this allows the elimination of NxN "k-1's" in the inner
! my preference is for ForAll's, but you could use Do,
!
do i = 1,N
!
TPSdivNiM1 = TPSdivN*(i-1) ! new. everything out of inner loop that can be
!
! improved, inner do, but could use "no-Do" alternative as illustrated below
!
! do k = 1,N
!>> S = S + x(k)*exp(real(-2.0,cp)*PI*j*real(k-1,cp)*real(i-1,cp)/real(N,cp)) ! original, many unneccessary nxn FLOPs, type conversions, etc
! temp(i) = temp(i) + x(k)*Exp(TPSdivNiM1*rKM1(k))) ! use array of k-1's, then the inner k-loop is no longer required,
! and can use Product()/Sum() or Dot_Product() intrinsics, see example below
! End Do
!
! alternative "direct array" approach to inner loop, or "no-DO" version ... there are other possibilities.
!
Temp(i) = Dot_Product( x(1:n), Exp(TPSdivNiM1*rKM1(1:n)) ) ! this approach can have certain advantages depending on circumstances
!
! temp(i) = S ! not needed
! S = cmplx(0.0_cp,0.0_cp,cp) ! not needed
!
End Do
!
! x(1:n) = temp(1:n) ! not need if Temp() is Intent(Out) that save this vector assignment, and the original data
! deallocate(temp) ! not needed
!
End Subroutine fft1D
!
! end module
End Module fft_mod