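2D cufft transform
This is my first question on Stack Overflow.
I am new to CUDA.
I just want to perform a 2D complex-to-complex FFT.
My input data is already prepared and needs no padding.
I just cannot get the expected result. Here is my code: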
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>

typedef float2 Complex;

#define M2 512   // number of rows
#define N2 2048  // number of columns

int main()
{
    int i, j;
    FILE *fp;
    char *fmt = "%16e";

    // Allocate memory for h_input and h_output on host
    // And make sure they are continuous
    Complex **h_input, **h_output;
    h_input = (Complex **)malloc(M2*sizeof(Complex *));
    h_output= (Complex **)malloc(M2*sizeof(Complex *));
    h_input[0] = (Complex *)malloc(M2*N2*sizeof(Complex));
    h_output[0]= (Complex *)malloc(M2*N2*sizeof(Complex));
    for (i = 1; i < M2; i++){
        h_input[i] = h_input[i - 1] + N2;
        h_output[i]= h_output[i - 1] + N2;
    }

    // Load h_input from a file
    if ((fp = fopen("INFLU_ORIGIN.DAT", "rt")) == NULL){
        printf("\nCannot open file strike any key exit!");
    }
    for (i = 0; i <= M2 - 1; i++){
        for (j = 0; j <= N2 - 1; j++){
            fscanf(fp, fmt, &h_input[i][j].x);
            h_input[i][j].y = 0.0;
        }
        fscanf(fp, "%\n");
    }
    fclose(fp);

    // allocate memory on device and copy h_input into d_array
    Complex *d_array;
    size_t host_orig_pitch = N2 * sizeof(Complex);
    size_t pitch;
    cudaMallocPitch(&d_array, &pitch, N2 * sizeof(Complex), M2);
    cudaMemcpy2D(d_array, pitch, h_input[0], host_orig_pitch,
                 N2* sizeof(Complex), M2, cudaMemcpyHostToDevice);

    // Copy d_array back to host, and write it to a file
    // to check if they are as correctly copied into device
    cudaMemcpy2D(h_output[0], host_orig_pitch, d_array, pitch,
                 N2* sizeof(Complex), M2, cudaMemcpyDeviceToHost);
    if ((fp = fopen("INFLU_FFT_GET.DAT", "wt")) == NULL){
        printf("\nCannot create file strike any key exit!");
    }
    for (i = 0; i <= M2 - 1; i++){
        for (j = 0; j <= N2 - 1; j++){
            fprintf(fp, fmt, h_output[i][j].x);
        }
        fprintf(fp, "%\n");
    }
    fclose(fp);

    // create CUFFT plan
    cufftHandle plan;
    cufftResult filter_result;
    filter_result = cufftPlan2d(&plan, M2, N2, CUFFT_C2C);
    if (filter_result != CUFFT_SUCCESS){
        printf("\n failed to create plan \n");
    }
    else{
        printf("\n succeeded in creating plan \n");
    }

    // perform forward FFT on d_array
    printf("\nTransforming influence coefficient cufftExecC2C\n");
    filter_result = cufftExecC2C(plan, (cufftComplex *)d_array,
                                 (cufftComplex *)d_array, CUFFT_FORWARD);
    if (filter_result != CUFFT_SUCCESS){
        printf("\ntransform failed\n");
    }
    else{
        printf("\ntranform succeed\n");
    }

    // Copy the fft result to host, write it to a file
    // to observe the result of FFT
    cudaMemcpy2D(h_output[0], host_orig_pitch, d_array, pitch,
                 N2* sizeof(Complex), M2, cudaMemcpyHostToDevice);
    if ((fp = fopen("INFLU_FFT_C.DAT", "wt")) == NULL){
        printf("\nCannot create file strike any key exit!");
    }
    for (i = 0; i <= M2-1; i++){
        for (j = 0; j <= N2-1; j++){
            fprintf(fp, fmt, h_output[i][j].x);
        }
        fprintf(fp, "%\n");
    }
    fclose(fp);

    cufftDestroy(plan);
    free(h_input[0]);
    free(h_input);
    free(h_output[0]);
    free(h_output);
    cudaFree(d_array);
    cudaDeviceReset();
}
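The workflow of this code is as follows:
(1) Allocate h_input and h_output on the host
(2) Load data from a file into h_input -- "INFLU.DAT"
(3) Allocate d_array on the device and copy h_input into it
(4) Copy d_array back to h_output and write it to a file -- "INFLU_GET.DAT"
---- to check whether d_array received the correct data
(5) Perform a forward complex-to-complex FFT on d_array
(6) Copy d_array back to h_output and write it to a file -- "INFLU_FFT.DAT"
---- to observe the result of the FFT
By performing step (4), I have confirmed that the copy from h_input to d_array is correct.
My problem is:
In step (6), I find that after the FFT, d_array and h_output are still the same as the input.
The input file is:
https://drive.google.com/file/d/0B88U83cfBwMmdGFtbGJ2MVlURDg/view?usp=sharing
The file is named INFLU.DAT and is 16 MB.
I have a result file for comparison (computed in Fortran):
https://drive.google.com/file/d/0B88U83cfBwMmcDR1YzYyRzF4Mjg/view?usp=sharing
The file is named INFLU_FFT_F.DAT and is also 16 MB.
Any suggestions are welcome! Thanks!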
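Step (1) of the workflow relies on the host matrix being one contiguous block, so that h_input[0] can be handed to cudaMemcpy2D with a fixed host pitch. The following is a minimal host-only sketch of that allocation pattern. The names alloc_matrix and free_matrix are hypothetical helpers, and the Complex struct here is a plain stand-in for CUDA's float2 so that the snippet compiles without the CUDA headers.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for CUDA's float2 (assumption: same two-float layout). */
typedef struct { float x, y; } Complex;

/* Hypothetical helper: allocate a rows x cols matrix as ONE contiguous
 * block, plus a table of row pointers into that block, so that both
 * m[i][j] indexing and a flat m[0] pointer are available. */
Complex **alloc_matrix(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    Complex **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    m[0] = calloc(rows * cols, sizeof **m);   /* zero-initialised data */
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (size_t i = 1; i < rows; i++)
        m[i] = m[0] + i * cols;               /* row i starts i*cols elements in */
    return m;
}

/* Hypothetical helper: release both the data block and the row table. */
void free_matrix(Complex **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

With this layout the host pitch is simply cols * sizeof(Complex), which matches host_orig_pitch in the code above. Unlike the original, the sketch also checks the allocation results before using them.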
The problem is probably in the last cudaMemcpy2D() call:
cudaMemcpy2D(h_output[0], host_orig_pitch, d_array, pitch,
N2* sizeof(Complex), M2, cudaMemcpyHostToDevice);
It copies data from host to device. My guess is that you meant to copy from device to host, as the earlier call a few lines above does:
cudaMemcpy2D(h_output[0], host_orig_pitch, d_array, pitch,
N2* sizeof(Complex), M2, cudaMemcpyDeviceToHost);