OpenACC vs C++: FATAL ERROR: variable is partially present on the device


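I'm trying to port a C++ application to the GPU using OpenACC. As one might expect, the C++ code has a lot of encapsulation and abstraction. Memory is allocated in a vector-like class, and that class is then reused in many other classes throughout the application. I'm having trouble inserting the OpenACC pragmas into the code correctly. Here is a simplified example of the code I'm working on: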

#define DATASIZE 16

class Data {
  float *arr;
public:
  Data() {arr = new float[DATASIZE];}
  ~Data() { delete [] arr; }
  float &get(int i) { return arr[i]; }
};

class DataKeeper {
  Data a, b, c;
public:
  void init() {
    for (int i = 0; i < DATASIZE; ++i)
      a.get(i) = 0.0;
  }
};

int main() {
  DataKeeper DK;
  DK.init();
}

I inserted some OpenACC pragmas to send the necessary data to the device and ended up with this code:

#define DATASIZE 16

class Data {
  float *arr;

public:
  Data() {
    arr = new float[DATASIZE];
#pragma acc enter data copyin(this)
#pragma acc enter data create(arr[:DATASIZE])
  }

  ~Data() {
#pragma acc exit data delete(arr)
#pragma acc exit data delete(this)
    delete [] arr;
  }

  float &get(int i) { return arr[i]; }
};

class DataKeeper {
  Data a, b, c;

public:
  DataKeeper() {
#pragma acc enter data copyin(this)
  }

  ~DataKeeper() {
#pragma acc exit data delete(this)
  }

  void init() {
#pragma acc parallel loop
    for (int i = 0; i < DATASIZE; ++i) {
      a.get(i) = 0.0;
    }
  }
};

int main() {
  DataKeeper DK;
  DK.init();
}

But after compiling and running it, I get the following error:

$ pgc++ test.cc -acc -g

$ ./a.out 
_T24395416_101 lives at 0x7fff49e03070 size 24 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 3.5, threadid=1
host:0x1ae6eb0 device:0xc05ca0200 size:64 presentcount:0+1 line:11 name:(null)
host:0x1f33620 device:0xc05ca0600 size:64 presentcount:0+1 line:11 name:(null)
host:0x1f33d10 device:0xc05ca0a00 size:64 presentcount:0+1 line:11 name:(null)
host:0x7fff49e03070 device:0xc05ca0000 size:8 presentcount:0+1 line:11 name:_T24395600_98
host:0x7fff49e03078 device:0xc05ca0400 size:8 presentcount:0+1 line:11 name:_T24395600_98
host:0x7fff49e03080 device:0xc05ca0800 size:8 presentcount:0+1 line:11 name:_T24395600_98
allocated block device:0xc05ca0000 size:512 thread:1
allocated block device:0xc05ca0200 size:512 thread:1
allocated block device:0xc05ca0400 size:512 thread:1
allocated block device:0xc05ca0600 size:512 thread:1
allocated block device:0xc05ca0800 size:512 thread:1
allocated block device:0xc05ca0a00 size:512 thread:1

FATAL ERROR: variable in data clause is partially present on the device: name=_T24395416_101
 file:/home/bozhenovn/tst/test.cc _ZN10DataKeeperC1Ev line:27

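I have no idea what is wrong with the code. I would appreciate any ideas on how to fix it, or suggestions on how to investigate the problem further. Thanks!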

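What's happening here is that the host address of "a" is the same as the starting address of "DK". When the OpenACC runtime looks up that host address in the present table (the table it uses to map a variable's host address to its device address), it finds an entry whose size differs from the one requested: "a" is 8 bytes, while "DK" is 24 bytes. In the dump above, that is the entry at host:0x7fff49e03070 with size:8, while the error reports the 24-byte "DK" at the same address.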
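The overlap can be observed on the host without a GPU. The sketch below uses simplified copies of the question's classes with public members (an assumption for illustration, since the real members are private) and two hypothetical helpers, `firstMemberAliasesObject` and `printLayout`, that report where "DK" and its first member "a" start.

```cpp
#include <cassert>
#include <iostream>

// Simplified host-only copies of the question's classes. Members are public
// here (illustration only) so their addresses can be inspected.
struct Data {
  float *arr;
};

struct DataKeeper {
  Data a, b, c;
};

// True when "a" starts at the same host address as the DataKeeper object,
// i.e. the range of "a" lies at the start of the range of the whole object.
bool firstMemberAliasesObject(const DataKeeper &dk) {
  return static_cast<const void *>(&dk.a) == static_cast<const void *>(&dk);
}

// Print both start addresses and sizes, similar to the runtime's dump.
void printLayout(const DataKeeper &dk) {
  std::cout << "DK   at " << static_cast<const void *>(&dk)
            << " size " << sizeof(DataKeeper) << '\n';
  std::cout << "DK.a at " << static_cast<const void *>(&dk.a)
            << " size " << sizeof(Data) << '\n';
}
```

Because both classes are standard-layout, the first member is guaranteed to start at the object's own address. On a typical 64-bit build `sizeof(Data)` is 8 and `sizeof(DataKeeper)` is 24, which matches the two sizes in the error.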

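I'll show the fixes below, but first let's step back and look at what's going on. When "DK" is created on the host, storage is first set aside for each of its data members, then each member's class constructor is called, and only then does DataKeeper's own constructor run. So for each data member, your code first copies the Data object ("this") to the device and then allocates its "arr" array on the device. Only after that does DataKeeper's constructor create "DK" on the device, including space for each data member. Because the device copy of "DK" is created after its data members already have their own device copies, the runtime cannot automatically associate the two. Instead, the 24-byte range of "DK" overlaps the 8-byte range already registered for "a", which is exactly the "partially present" condition.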
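The construction order described above is plain C++ and can be checked on the host. In this sketch each `enter data` directive from the question is replaced by an entry in a hypothetical global `events` log, so the log records the order in which those directives would execute.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log standing in for the question's "#pragma acc" directives.
std::vector<std::string> events;

struct Data {
  // In the question, "enter data copyin(this)" and "create(arr...)" run here.
  Data() { events.push_back("Data::Data"); }
  ~Data() { events.push_back("Data::~Data"); }
};

struct DataKeeper {
  Data a, b, c;  // constructed in declaration order, before the body below

  // In the question, "enter data copyin(this)" for DK runs here, i.e. after
  // a, b and c already have device copies.
  DataKeeper() { events.push_back("DataKeeper::DataKeeper"); }

  // The body runs first; the members are destroyed afterwards, in reverse.
  ~DataKeeper() { events.push_back("DataKeeper::~DataKeeper"); }
};
```

Constructing and destroying one `DataKeeper` logs three `Data::Data` entries, then `DataKeeper::DataKeeper`, then `DataKeeper::~DataKeeper`, then three `Data::~Data` entries.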

Below are two possible fixes.

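First, you can keep letting the "Data" class manage its own data, but then DataKeeper must allocate its Data members dynamically. That way the Data constructors run inside the DataKeeper constructor, after "DK" has been created on the device, so the runtime can associate the device data (also known as "attach").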
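The effect of switching to pointer members can again be seen on the host. In this sketch (same hypothetical `events` log as a stand-in for the directives), the DataKeeper constructor body, where its `enter data copyin(this)` would sit, now runs before any Data constructor. The Data objects also live in separate heap allocations, so their host ranges no longer start at DK's address.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log standing in for the "#pragma acc enter data" directives.
std::vector<std::string> events;

struct Data {
  // Data's own "enter data" directives would run here.
  Data() { events.push_back("Data::Data"); }
};

struct DataKeeper {
  Data *a, *b, *c;  // only pointers are embedded in DataKeeper now

  DataKeeper() {
    // DataKeeper's "enter data copyin(this)" would run here, first.
    events.push_back("DataKeeper::DataKeeper");
    a = new Data;
    b = new Data;
    c = new Data;
  }

  ~DataKeeper() {
    delete a;
    delete b;
    delete c;
  }

  // Owning raw pointers: forbid copies to avoid double deletes.
  DataKeeper(const DataKeeper &) = delete;
  DataKeeper &operator=(const DataKeeper &) = delete;
};
```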

Second, you can have the DataKeeper class manage the Data class's data. However, this requires Data's data members to be public.

Note that I wrote Chapter 5, "Advanced Data Management", of the book "Parallel Programming with OpenACC", which includes a section on managing C++ class data. You can find my example code at https://github.com/rmfarber/ParallelProgrammingWithOpenACC/tree/master/Chapter05. In particular, take a look at how I built a generic container class, "accList".
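The book's accList is not reproduced here. As a rough idea of the same self-managing pattern, here is a hypothetical `DeviceList<T>` template (name and interface are illustrative, not the book's API). It assumes an OpenACC compiler accepts `copyin(this)` as in the code in this thread; built without OpenACC, the pragmas are ignored and it behaves as a plain host container.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container that registers itself and its buffer with the
// OpenACC runtime on construction and removes both on destruction.
template <typename T>
class DeviceList {
  T *data_;
  std::size_t n_;

public:
  explicit DeviceList(std::size_t n) : data_(new T[n]()), n_(n) {
    // data_ and n_ are already set, so the device copy of *this is valid.
#pragma acc enter data copyin(this)
#pragma acc enter data create(data_[0:n_])
  }

  ~DeviceList() {
    // Remove the buffer before the object that points to it.
#pragma acc exit data delete(data_[0:n_])
#pragma acc exit data delete(this)
    delete[] data_;
  }

  // Copying would share data_ and free it twice, so forbid it.
  DeviceList(const DeviceList &) = delete;
  DeviceList &operator=(const DeviceList &) = delete;

#pragma acc routine seq
  T &operator[](std::size_t i) { return data_[i]; }
  std::size_t size() const { return n_; }

  // Explicit synchronization between host and device copies of the buffer.
  void updateHost() {
#pragma acc update host(data_[0:n_])
  }
  void updateDevice() {
#pragma acc update device(data_[0:n_])
  }
};
```

Elements are value-initialized on the host, so a new `DeviceList<float>` starts with zeros; call `updateDevice()` after filling it on the host and `updateHost()` after a kernel writes to it.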

Fix #1:

#define DATASIZE 16
#include <iostream>
#ifdef _OPENACC
#include <openacc.h>
#endif

class Data {
  float *arr;

public:
  Data() {
    arr = new float[DATASIZE];
#pragma acc enter data copyin(this)
#pragma acc enter data create(arr[:DATASIZE])
  }

  ~Data() {
    // Remove the array first, then the object that points to it.
#pragma acc exit data delete(arr[0:DATASIZE])
#pragma acc exit data delete(this)
    delete [] arr;
  }

  float &get(int i) { return arr[i]; }
  void updatehost() {
   #pragma acc update host(arr[0:DATASIZE])
  }

};

class DataKeeper {
  Data *a, *b, *c;

public:
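  DataKeeper() {
    // Create DK on the device first; the Data objects are allocated
    // afterwards, so their device copies can be attached to DK's pointers.
#pragma acc enter data copyin(this)
    a = new Data;
    b = new Data;
    c = new Data;
  }

  ~DataKeeper() {
    // Remove the Data objects (and their arrays) before DK itself.
    delete a;
    delete b;
    delete c;
#pragma acc exit data delete(this)
  }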

  void init() {
#pragma acc parallel loop present(a,b,c)
    for (int i = 0; i < DATASIZE; ++i) {
      a->get(i) = i;
    }
    a->updatehost();
    std::cout << "a.arr[0]=" << a->get(0) << std::endl;
    std::cout << "a.arr[end]=" << a->get(DATASIZE-1) << std::endl;
  }
};

int main() {
  DataKeeper DK;
  DK.init();
}

Fix #2:

#define DATASIZE 16
#include <iostream>
#ifdef _OPENACC
#include <openacc.h>
#endif

class Data {
public:
  float *arr;

  Data() {
    arr = new float[DATASIZE];
  }

  ~Data() {
    delete [] arr;
  }

  float &get(int i) { return arr[i]; }
};

class DataKeeper {
  Data a, b, c;

public:
  DataKeeper() {
#pragma acc enter data copyin(this)
#pragma acc enter data create(a.arr[0:DATASIZE])
#pragma acc enter data create(b.arr[0:DATASIZE])
#pragma acc enter data create(c.arr[0:DATASIZE])
  }

  ~DataKeeper() {
    // Remove the member arrays before the enclosing object.
#pragma acc exit data delete(a.arr[0:DATASIZE])
#pragma acc exit data delete(b.arr[0:DATASIZE])
#pragma acc exit data delete(c.arr[0:DATASIZE])
#pragma acc exit data delete(this)
  }

  void init() {
#pragma acc parallel loop present(a,b,c)
    for (int i = 0; i < DATASIZE; ++i) {
      a.get(i) = i;
    }
#pragma acc update host(a.arr[0:DATASIZE])
    std::cout << "a.arr[0]=" << a.arr[0] << std::endl;
    std::cout << "a.arr[end]=" << a.arr[DATASIZE-1] << std::endl;
  }
};

int main() {
  DataKeeper DK;
  DK.init();
}
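If making `arr` public is undesirable, one alternative (not part of the fixes above, and untested with pgc++) is to keep it private and declare DataKeeper a friend of Data, so only DataKeeper can name `a.arr` in its data clauses. The directives mirror Fix #2; `first()` and `last()` are hypothetical accessors added only to read the result on the host.

```cpp
#include <cassert>

#define DATASIZE 16

class Data {
  float *arr;               // stays private
  friend class DataKeeper;  // only DataKeeper may manage arr directly

public:
  Data() : arr(new float[DATASIZE]) {}
  ~Data() { delete [] arr; }
  Data(const Data &) = delete;             // avoid double delete of arr
  Data &operator=(const Data &) = delete;
  float &get(int i) { return arr[i]; }
};

class DataKeeper {
  Data a, b, c;

public:
  DataKeeper() {
#pragma acc enter data copyin(this)
#pragma acc enter data create(a.arr[0:DATASIZE], b.arr[0:DATASIZE], c.arr[0:DATASIZE])
  }

  ~DataKeeper() {
#pragma acc exit data delete(a.arr[0:DATASIZE], b.arr[0:DATASIZE], c.arr[0:DATASIZE])
#pragma acc exit data delete(this)
  }

  void init() {
#pragma acc parallel loop present(a, b, c)
    for (int i = 0; i < DATASIZE; ++i) {
      a.arr[i] = i;  // friend access to the private array
    }
#pragma acc update host(a.arr[0:DATASIZE])
  }

  float first() { return a.get(0); }
  float last() { return a.get(DATASIZE - 1); }
};
```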