有什么方法可以在不使用 intel mkl 进行转置的情况下计算另一个维度的 2D FFT 的 1D FFT?

Is there any way to compute 1D FFT of 2D FFT in another dimension without transposition using intel mkl?

我想使用 mkl 计算存储为一维数组的二维数组的一维 FFT。 例如,

for (int j=0; j<NJ; j++) //rows
{
  for (int i=0; i<NI; i++) //columns
   {
     Pre_2D_array[i+j*NI].x=1.0;
     Pre_2D_array[i+j*NI].y=2.0;
   }
}

我想计算行维度中 Pre_2D_array 的一维 FFT。我能想到的唯一方法是重塑数组并像这样进行 FFT,

   for (int i=0; i<NI; i++) //columns
    {
      for (int j=0; j<NJ; j++) //rows
       {
         2D_array[j+i*NJ]=Pre_2D_array[i+j*NI];
       }
    }

DFTI_DESCRIPTOR_HANDLE desc_x = 0;
DftiCreateDescriptor(&desc_x, DFTI_PREC, DFTI_COMPLEX, 1, NJ);
DftiSetValue(desc_x, DFTI_NUMBER_OF_TRANSFORMS, NI);
DftiSetValue(desc_x, DFTI_INPUT_DISTANCE,  NJ);
DftiCommitDescriptor(desc_x);

DftiComputeForward(desc_x, 2D_array);

虽然这样可以得到正确答案。但是当数组很大时,对原始数组进行转置(整形)会浪费太多时间。 有没有办法在不重塑阵列的情况下进行 FFT?或者有什么方法可以尽快重塑阵列?

cpuinfo 是:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 79
model name  : Intel(R) Xeon(R) CPU E5-2648L v4 @ 1.80GHz
stepping    : 1
microcode   : 0xb000022
cpu MHz     : 1795.882
cache size  : 35840 KB
physical id : 0
siblings    : 14
core id     : 0
cpu cores   : 14
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 20
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap
bogomips    : 3591.76
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

您不能对数据的更高维度进行 1D FFT。您需要先进行转置,以便 FT 维度是数据在 RAM 中连续的维度。

然而,它并没有你想象的那么糟糕。在多核机器上,你可以很容易地设置一些线程,它们唯一的工作就是预先/post 安排 FT 数据。

据我所知,英特尔 MKL 不提供在数据元素之间具有 跨度 的数据中执行 FFT 的功能。

然而,FFTW 确实如此。每 4.4.1 Advanced Complex DFTs of the FFTW documentation:

fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
                             fftw_complex *in, const int *inembed,
                             int istride, int idist,
                             fftw_complex *out, const int *onembed,
                             int ostride, int odist,
                             int sign, unsigned flags);

This routine plans multiple multidimensional complex DFTs, and it extends the fftw_plan_dft routine (see Complex DFTs) to compute howmany transforms, each having rank rank and size n. In addition, the transform data need not be contiguous, but it may be laid out in memory with an arbitrary stride. To account for these possibilities, fftw_plan_many_dft adds the new parameters howmany, {i,o}nembed, {i,o}stride, and {i,o}dist. The FFTW basic interface (see Complex DFTs) provides routines specialized for ranks 1, 2, and 3, but the advanced interface handles only the general-rank case.

howmany is the (nonnegative) number of transforms to compute. The resulting plan computes howmany transforms, where the input of the k-th transform is at location in+k*idist (in C pointer arithmetic), and its output is at location out+k*odist. Plans obtained in this way can often be faster than calling FFTW multiple times for the individual transforms. The basic fftw_plan_dft interface corresponds to howmany=1 (in which case the dist parameters are ignored).

Each of the howmany transforms has rank rank and size n, as in the basic interface. In addition, the advanced interface allows the input and output arrays of each transform to be row-major subarrays of larger rank-rank arrays, described by inembed and onembed parameters, respectively. {i,o}nembed must be arrays of length rank, and n should be elementwise less than or equal to {i,o}nembed. Passing NULL for an nembed parameter is equivalent to passing n (i.e. same physical and logical dimensions, as in the basic interface.)

The stride parameters indicate that the j-th element of the input or output arrays is located at j*istride or j*ostride, respectively. (For a multi-dimensional array, j is the ordinary row-major index.) When combined with the k-th transform in a howmany loop, from above, this means that the (j,k)-th element is at j*stride+k*dist. (The basic fftw_plan_dft interface corresponds to a stride of 1.)

For in-place transforms, the input and output stride and dist parameters should be the same; otherwise, the planner may return NULL.

该页面方便地提供了一个对二维数组的列执行一维 FFT 的示例(有点令人困惑):

Transform each column of a 2d array with 10 rows and 3 columns:

   int rank = 1; /* not 2: we are computing 1d transforms */
   int n[] = {10}; /* 1d transforms of length 10 */
   int howmany = 3;
   int idist = odist = 1;
   int istride = ostride = 3; /* distance between two elements in 
                                 the same column */
   int *inembed = n, *onembed = n;

有关更多示例,请参阅 How do I use fftw_plan_many_dft on a transposed array of data?

FFTW 库在 fftw_plan_many_dft() 等函数中引入了参数 istrideostride 以避免转置数组。该页面上的最后一个示例是第二个维度上的 DFT。

同样,英特尔数学内核库引入了data layout configuration parameters such as DFTI_INPUT_STRIDES and DFTI_OUTPUT_STRIDESDFTI_NUMBER_OF_TRANSFORMS

第二个维度上的 DFT 可能看起来像(我没有测试过):

DftiCreateDescriptor(&desc_x, DFTI_PREC, DFTI_COMPLEX, 1, NJ);
DftiSetValue(desc_x, DFTI_NUMBER_OF_TRANSFORMS, NI);
DftiSetValue(desc_x, DFTI_INPUT_STRIDES, &NI);
DftiSetValue(desc_x, DFTI_OUTPUT_STRIDES, &NI);
DftiSetValue(desc_x, DFTI_INPUT_DISTANCE,  1);
DftiSetValue(desc_x, DFTI_OUTPUT_DISTANCE,  1);
DftiCommitDescriptor(desc_x);

DFTI_OUTPUT_STRIDES 被就地转换 (DFTI_PLACEMENT=DFTI_INPLACE) 忽略。