有什么方法可以在不使用 intel mkl 进行转置的情况下计算另一个维度的 2D FFT 的 1D FFT?
Is there any way to compute 1D FFT of 2D FFT in another dimension without transposition using intel mkl?
我想使用 mkl 计算存储为一维数组的二维数组的一维 FFT。
例如,
for (int j=0; j<NJ; j++) //rows
{
for (int i=0; i<NI; i++) //columns
{
Pre_2D_array[i+j*NI].x=1.0;
Pre_2D_array[i+j*NI].y=2.0;
}
}
我想计算行维度中 Pre_2D_array 的一维 FFT。我能想到的唯一方法是重塑数组并像这样进行 FFT,
for (int i=0; i<NI; i++) //columns
{
for (int j=0; j<NJ; j++) //rows
{
2D_array[j+i*NJ]=Pre_2D_array[i+j*NI];
}
}
DFTI_DESCRIPTOR_HANDLE desc_x = 0;
DftiCreateDescriptor(&desc_x, DFTI_PREC, DFTI_COMPLEX, 1, NJ);
DftiSetValue(desc_x, DFTI_NUMBER_OF_TRANSFORMS, NI);
DftiSetValue(desc_x, DFTI_INPUT_DISTANCE, NJ);
DftiCommitDescriptor(desc_x);
DftiComputeForward(desc_x, 2D_array);
虽然这样可以得到正确答案。但是当数组很大时,对原始数组进行转置(整形)会浪费太多时间。
有没有办法在不重塑阵列的情况下进行 FFT?或者有什么方法可以尽快重塑阵列?
cpuinfo 是:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2648L v4 @ 1.80GHz
stepping : 1
microcode : 0xb000022
cpu MHz : 1795.882
cache size : 35840 KB
physical id : 0
siblings : 14
core id : 0
cpu cores : 14
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap
bogomips : 3591.76
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
您不能对数据的更高维度进行 1D FFT。您需要先进行转置,以便 FT 维度是数据在 RAM 中连续的维度。
然而,它并没有你想象的那么糟糕。在多核机器上,你可以很容易地设置一些线程,它们唯一的工作就是预先/post 安排 FT 数据。
据我所知,英特尔 MKL 不提供在数据元素之间具有 跨度 的数据中执行 FFT 的功能。
然而,FFTW 确实如此。每 4.4.1 Advanced Complex DFTs of the FFTW documentation:
fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
fftw_complex *in, const int *inembed,
int istride, int idist,
fftw_complex *out, const int *onembed,
int ostride, int odist,
int sign, unsigned flags);
This routine plans multiple multidimensional complex DFTs, and it
extends the fftw_plan_dft
routine (see Complex DFTs) to compute
howmany transforms, each having rank rank and size n
. In addition,
the transform data need not be contiguous, but it may be laid out in
memory with an arbitrary stride. To account for these possibilities,
fftw_plan_many_dft
adds the new parameters howmany
, {i,o}nembed
,
{i,o}stride
, and {i,o}dist
. The FFTW basic interface (see Complex
DFTs) provides routines specialized for ranks 1, 2, and 3, but the
advanced interface handles only the general-rank case.
howmany
is the (nonnegative) number of transforms to compute. The
resulting plan computes howmany
transforms, where the input of the
k-th transform is at location in+k*idist
(in C pointer arithmetic),
and its output is at location out+k*odist
. Plans obtained in this
way can often be faster than calling FFTW multiple times for the
individual transforms. The basic fftw_plan_dft
interface corresponds
to howmany=1
(in which case the dist parameters are ignored).
Each of the howmany
transforms has rank rank and size n
, as in the
basic interface. In addition, the advanced interface allows the input
and output arrays of each transform to be row-major subarrays of
larger rank-rank arrays, described by inembed
and onembed
parameters, respectively. {i,o}nembed
must be arrays of length
rank
, and n
should be elementwise less than or equal to
{i,o}nembed
. Passing NULL
for an nembed
parameter is equivalent
to passing n
(i.e. same physical and logical dimensions, as in the
basic interface.)
The stride parameters indicate that the j-th
element of the input or
output arrays is located at j*istride
or j*ostride
, respectively.
(For a multi-dimensional array, j
is the ordinary row-major index.)
When combined with the k-th
transform in a howmany
loop, from
above, this means that the (j,k)-th element is at j*stride+k*dist
.
(The basic fftw_plan_dft
interface corresponds to a stride of 1.)
For in-place transforms, the input and output stride and dist
parameters should be the same; otherwise, the planner may return
NULL
.
该页面方便地提供了一个对二维数组的列执行一维 FFT 的示例(有点令人困惑):
Transform each column of a 2d array with 10 rows and 3 columns:
int rank = 1; /* not 2: we are computing 1d transforms */
int n[] = {10}; /* 1d transforms of length 10 */
int howmany = 3;
int idist = odist = 1;
int istride = ostride = 3; /* distance between two elements in
the same column */
int *inembed = n, *onembed = n;
有关更多示例,请参阅 How do I use fftw_plan_many_dft on a transposed array of data?。
FFTW 库在 fftw_plan_many_dft() 等函数中引入了参数 istride
或 ostride
以避免转置数组。该页面上的最后一个示例是第二个维度上的 DFT。
同样,英特尔数学内核库引入了data layout configuration parameters such as DFTI_INPUT_STRIDES
and DFTI_OUTPUT_STRIDES
或DFTI_NUMBER_OF_TRANSFORMS
。
第二个维度上的 DFT 可能看起来像(我没有测试过):
DftiCreateDescriptor(&desc_x, DFTI_PREC, DFTI_COMPLEX, 1, NJ);
DftiSetValue(desc_x, DFTI_NUMBER_OF_TRANSFORMS, NI);
DftiSetValue(desc_x, DFTI_INPUT_STRIDES, &NI);
DftiSetValue(desc_x, DFTI_OUTPUT_STRIDES, &NI);
DftiSetValue(desc_x, DFTI_INPUT_DISTANCE, 1);
DftiSetValue(desc_x, DFTI_OUTPUT_DISTANCE, 1);
DftiCommitDescriptor(desc_x);
DFTI_OUTPUT_STRIDES
被就地转换 (DFTI_PLACEMENT=DFTI_INPLACE
) 忽略。
我想使用 mkl 计算存储为一维数组的二维数组的一维 FFT。 例如,
for (int j=0; j<NJ; j++) //rows
{
for (int i=0; i<NI; i++) //columns
{
Pre_2D_array[i+j*NI].x=1.0;
Pre_2D_array[i+j*NI].y=2.0;
}
}
我想计算行维度中 Pre_2D_array 的一维 FFT。我能想到的唯一方法是重塑数组并像这样进行 FFT,
for (int i=0; i<NI; i++) //columns
{
for (int j=0; j<NJ; j++) //rows
{
2D_array[j+i*NJ]=Pre_2D_array[i+j*NI];
}
}
DFTI_DESCRIPTOR_HANDLE desc_x = 0;
DftiCreateDescriptor(&desc_x, DFTI_PREC, DFTI_COMPLEX, 1, NJ);
DftiSetValue(desc_x, DFTI_NUMBER_OF_TRANSFORMS, NI);
DftiSetValue(desc_x, DFTI_INPUT_DISTANCE, NJ);
DftiCommitDescriptor(desc_x);
DftiComputeForward(desc_x, 2D_array);
虽然这样可以得到正确答案。但是当数组很大时,对原始数组进行转置(整形)会浪费太多时间。 有没有办法在不重塑阵列的情况下进行 FFT?或者有什么方法可以尽快重塑阵列?
cpuinfo 是:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2648L v4 @ 1.80GHz
stepping : 1
microcode : 0xb000022
cpu MHz : 1795.882
cache size : 35840 KB
physical id : 0
siblings : 14
core id : 0
cpu cores : 14
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap
bogomips : 3591.76
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
您不能对数据的更高维度进行 1D FFT。您需要先进行转置,以便 FT 维度是数据在 RAM 中连续的维度。
然而,它并没有你想象的那么糟糕。在多核机器上,你可以很容易地设置一些线程,它们唯一的工作就是预先/post 安排 FT 数据。
据我所知,英特尔 MKL 不提供在数据元素之间具有 跨度 的数据中执行 FFT 的功能。
然而,FFTW 确实如此。每 4.4.1 Advanced Complex DFTs of the FFTW documentation:
fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany, fftw_complex *in, const int *inembed, int istride, int idist, fftw_complex *out, const int *onembed, int ostride, int odist, int sign, unsigned flags);
This routine plans multiple multidimensional complex DFTs, and it extends the
fftw_plan_dft
routine (see Complex DFTs) to compute howmany transforms, each having rank rank and sizen
. In addition, the transform data need not be contiguous, but it may be laid out in memory with an arbitrary stride. To account for these possibilities,fftw_plan_many_dft
adds the new parametershowmany
,{i,o}nembed
,{i,o}stride
, and{i,o}dist
. The FFTW basic interface (see Complex DFTs) provides routines specialized for ranks 1, 2, and 3, but the advanced interface handles only the general-rank case.
howmany
is the (nonnegative) number of transforms to compute. The resulting plan computeshowmany
transforms, where the input of the k-th transform is at locationin+k*idist
(in C pointer arithmetic), and its output is at locationout+k*odist
. Plans obtained in this way can often be faster than calling FFTW multiple times for the individual transforms. The basicfftw_plan_dft
interface corresponds tohowmany=1
(in which case the dist parameters are ignored).Each of the
howmany
transforms has rank rank and sizen
, as in the basic interface. In addition, the advanced interface allows the input and output arrays of each transform to be row-major subarrays of larger rank-rank arrays, described byinembed
andonembed
parameters, respectively.{i,o}nembed
must be arrays of lengthrank
, andn
should be elementwise less than or equal to{i,o}nembed
. PassingNULL
for annembed
parameter is equivalent to passingn
(i.e. same physical and logical dimensions, as in the basic interface.)The stride parameters indicate that the
j-th
element of the input or output arrays is located atj*istride
orj*ostride
, respectively. (For a multi-dimensional array,j
is the ordinary row-major index.) When combined with thek-th
transform in ahowmany
loop, from above, this means that the (j,k)-th element is atj*stride+k*dist
. (The basicfftw_plan_dft
interface corresponds to a stride of 1.)For in-place transforms, the input and output stride and dist parameters should be the same; otherwise, the planner may return
NULL
.
该页面方便地提供了一个对二维数组的列执行一维 FFT 的示例(有点令人困惑):
Transform each column of a 2d array with 10 rows and 3 columns:
int rank = 1; /* not 2: we are computing 1d transforms */ int n[] = {10}; /* 1d transforms of length 10 */ int howmany = 3; int idist = odist = 1; int istride = ostride = 3; /* distance between two elements in the same column */ int *inembed = n, *onembed = n;
有关更多示例,请参阅 How do I use fftw_plan_many_dft on a transposed array of data?。
FFTW 库在 fftw_plan_many_dft() 等函数中引入了参数 istride
或 ostride
以避免转置数组。该页面上的最后一个示例是第二个维度上的 DFT。
同样,英特尔数学内核库引入了data layout configuration parameters such as DFTI_INPUT_STRIDES
and DFTI_OUTPUT_STRIDES
或DFTI_NUMBER_OF_TRANSFORMS
。
第二个维度上的 DFT 可能看起来像(我没有测试过):
DftiCreateDescriptor(&desc_x, DFTI_PREC, DFTI_COMPLEX, 1, NJ);
DftiSetValue(desc_x, DFTI_NUMBER_OF_TRANSFORMS, NI);
DftiSetValue(desc_x, DFTI_INPUT_STRIDES, &NI);
DftiSetValue(desc_x, DFTI_OUTPUT_STRIDES, &NI);
DftiSetValue(desc_x, DFTI_INPUT_DISTANCE, 1);
DftiSetValue(desc_x, DFTI_OUTPUT_DISTANCE, 1);
DftiCommitDescriptor(desc_x);
DFTI_OUTPUT_STRIDES
被就地转换 (DFTI_PLACEMENT=DFTI_INPLACE
) 忽略。