正确使用 fftw fft

Question

我想在我的项目中使用 fftw 库中的 fft 函数，因此创建了以下函数：

void fft(const int size, DCOMPLEX *a, DCOMPLEX *b)
{
    fftw_plan p;
    p = fftw_plan_dft_1d(size, a, b, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p); /* repeat as needed */
    fftw_destroy_plan(p);
}

和

static fftw_complex *in;
static fftw_complex *out;
static fftw_plan p_fft, p_ifft;
void init_fft(const int N)
{
    in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*N);
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*N);
    if(fftw_init_threads() != 0)
    {
        printf("Using %d threads!\n", omp_get_max_threads());
        fftw_plan_with_nthreads(omp_get_max_threads());
    };
    p_fft = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_EXHAUSTIVE);
    p_ifft = fftw_plan_dft_1d(N, in, out, FFTW_BACKWARD, FFTW_EXHAUSTIVE);
    memset(in, 0, N);
    memset(out, 0, N);

}
void fft_pre_plan(const int size, DCOMPLEX *a, DCOMPLEX *b)
{
    memcpy(in, a, size*sizeof(DCOMPLEX));
    fftw_execute(p_ifft); /* repeat as needed */
    memcpy(b, out, size*sizeof(DCOMPLEX));
}

我将结果与 b = numpy.fft.fft(a) 的结果进行比较，同时称它们为

fft(a.size(), a, b)

或

init_fft(a.size())//Called once
fftw_pre_plan(a.size(), a, b)//Called for each fft

和.size()数组的长度（伪代码）。虽然后一种方法与 numpy 代码相比加速了 2，但我从中获得的结果是不正确的（即错误的值）。为什么？
编辑：如果你建议我使用 python 包装器：它们不会给我提供与使用后一种解决方案时相同的加速（甚至比 numpy 慢）。
比较它们时，我得到以下值作为输出（对于 2**14 元素和相同输入的数组）：

numpy.fft.fft: [ 0.00095184 -1.54074866e-32j  0.00095267 +6.52776772e-18j
  0.00095349 +2.58018535e-18j ...,  0.00095432 +1.26416719e-16j
  0.00095349 +2.61724648e-16j  0.00095267 +1.21645986e-16j]

fft(): [ 0.00095184 -1.54074866e-32j  0.00095267 -5.98482633e-17j
  0.00095349 -1.27638146e-16j ...,  0.00095432 -9.88543196e-16j
  0.00095349 -9.30058376e-16j  0.00095267 -1.09881551e-15j]

fft_pre_plan(): [ 0.00095184 -1.54074866e-32j  0.00095267 -1.09881551e-15j
  0.00095349 -9.30058376e-16j ...,  0.00095432 -1.79405492e-16j
  0.00095349 -1.27638146e-16j  0.00095267 -5.98482633e-17j]

会不会是我有舍入误差的原因？

Answer 1

对于double精度运算，舍入误差大约为10^-16到10^-15来自许多 FFT 实现，如 FFTW accuracy benchmarks.

所示

请注意，10^-15 比峰值幅度 1.0 小 300dB。大多数实际信号的动态范围要小得多（例如 16-bit CD quality audio has an SNR of ~90dB). If your application does really need that much precision, you would likely need to use an arbitrary precision implementation of the FFT (that's what is used as a reference implementation in FFTW accuracy measurements）。

正确使用 fftw fft

Using fftw fft correctly

c

python

numpy

fftw