MPI: Gathering arrays of strings
I am trying to merge a set of dictionaries into the root process. Here is a short example:
#define MAX_CF_LENGTH 55

map<string, int> dict;
if (rank == 0)
{
    dict = {
        {"Accelerator Defective", 33},
        {"Aggressive Driving/Road Rage", 27},
        {"Alcohol Involvement", 19},
        {"Animals Action", 30}};
}
if (rank == 1)
{
    dict = {
        {"Driver Inexperience", 6},
        {"Driverless/Runaway Vehicle", 46},
        {"Drugs (Illegal)", 38},
        {"Failure to Keep Right", 24}};
}
if (rank == 2)
{
    dict = {
        {"Lost Consciousness", 1},
        {"Obstruction/Debris", 8},
        {"Other Electronic Device", 25},
        {"Other Lighting Defects", 43},
        {"Other Vehicular", 7}};
}

Scatterer scatterer(rank, MPI_COMM_WORLD, num_workers);
scatterer.gatherDictionary(dict, MAX_CF_LENGTH);
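The idea in gatherDictionary() is to pack each process's keys into a char array (duplicates are allowed). All keys are then gathered at the root, which builds the final (merged) dictionary before broadcasting it. Here is the code: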
void Scatterer::gatherDictionary(map<string,int> &dict, int maxKeyLength)
{
    // Calculate destination dictionary size
    int numKeys = dict.size();
    int totalLength = numKeys * maxKeyLength;
    int finalNumKeys = 0;
    MPI_Reduce(&numKeys, &finalNumKeys, 1, MPI_INT, MPI_SUM, 0, comm);

    // Computing number of elements that are received from each process
    int *recvcounts = NULL;
    if (rank == 0)
        recvcounts = new int[num_workers];
    MPI_Gather(&totalLength, 1, MPI_INT, recvcounts, 1, MPI_INT, 0, comm);

    // Computing displacement relative to recvbuf at which to place the incoming data from each process
    int *displs = NULL;
    if (rank == 0)
    {
        displs = new int[num_workers];
        displs[0] = 0;
        for (int i = 1; i < num_workers; i++)
            displs[i] = displs[i - 1] + recvcounts[i - 1] + 1;
    }

    char(*dictKeys)[maxKeyLength];
    char(*finalDictKeys)[maxKeyLength];
    dictKeys = (char(*)[maxKeyLength])malloc(numKeys * sizeof(*dictKeys));
    if (rank == 0)
        finalDictKeys = (char(*)[maxKeyLength])malloc(finalNumKeys * sizeof(*finalDictKeys));

    // Collect keys for each process
    int i = 0;
    for (auto pair : dict)
    {
        strncpy(dictKeys[i], pair.first.c_str(), maxKeyLength);
        i++;
    }
    MPI_Gatherv(dictKeys, totalLength, MPI_CHAR, finalDictKeys, recvcounts, displs, MPI_CHAR, 0, comm);

    // Create new dictionary and distribute it to all processes
    dict.clear();
    if (rank == 0)
    {
        for (int i = 0; i < finalNumKeys; i++)
            dict[finalDictKeys[i]] = dict.size();
    }

    delete[] dictKeys;
    if (rank == 0)
    {
        delete[] finalDictKeys;
        delete[] recvcounts;
        delete[] displs;
    }
    broadcastDictionary(dict, maxKeyLength);
}
I am sure broadcastDictionary() is correct, because I have tested it. Debugging the gather function, I get the following partial results:
Recvcounts:
220
220
275
Displacements:
0
221
442
FinalDictKeys:
Rank:0 Accelerator Defective
Rank:0 Aggressive Driving/Road Rage
Rank:0 Alcohol Involvement
Rank:0 Animals Action
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
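Since only the root's data is gathered, I wonder whether this has to do with how the char arrays are allocated, even though the memory should be contiguous. I don't think it is related to a missing null character at the end, because every string/key already has plenty of padding.
Thanks in advance for pointing out any omissions or improvements; please comment if you need any additional information.
If you want to test it yourself, I have put all the code in a single file that is ready to compile and run (it is set up for 3 MPI processes, of course). Code Here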
displs[i] = displs[i - 1] + recvcounts[i - 1] + 1;
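The + 1 at the end is superfluous. Displacements are counted in elements of the receive type (here MPI_CHAR) from the start of the receive buffer, so each block has to start exactly where the previous one ends. With the extra element, the data from ranks 1 and 2 lands one and two bytes late (at 221 and 442 instead of 220 and 440). Each of their fixed-width rows then most likely begins on a padding or unwritten byte rather than on the first character of a key, which would explain the empty strings. The last block also ends at byte 717, two bytes past the 715-byte buffer. Change it to: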
displs[i] = displs[i - 1] + recvcounts[i - 1];
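To see the effect without running MPI, here is a minimal sketch that imitates what the root's receive buffer looks like after the gather. The names computeDispls and simulateGather are hypothetical helpers made up for this illustration, not part of the question's code or of MPI, and the buffer is zero-filled here (the original malloc'd buffer is not), so the result is only an approximation of the real run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Displacements as an exclusive prefix sum of recvcounts: block i starts
// exactly where block i-1 ends. Passing extra = 1 reproduces the
// question's off-by-one (hypothetical parameter, for illustration only).
std::vector<int> computeDispls(const std::vector<int>& recvcounts, int extra = 0)
{
    std::vector<int> displs(recvcounts.size(), 0);
    for (std::size_t i = 1; i < recvcounts.size(); ++i)
        displs[i] = displs[i - 1] + recvcounts[i - 1] + extra;
    return displs;
}

// Stand-in for the root side of the gather (no MPI involved): copy each
// rank's keys into a zero-filled buffer at that rank's displacement, then
// read the buffer back as fixed-width rows of keyLen chars.
std::vector<std::string> simulateGather(
    const std::vector<std::vector<std::string>>& keysPerRank, int keyLen, int extra)
{
    assert(keyLen > 1);
    if (keysPerRank.empty())
        return {};

    std::vector<int> recvcounts;
    std::size_t totalKeys = 0;
    for (const auto& keys : keysPerRank)
    {
        recvcounts.push_back(static_cast<int>(keys.size()) * keyLen);
        totalKeys += keys.size();
    }
    const std::vector<int> displs = computeDispls(recvcounts, extra);

    // Sized to cover the last block, so the shifted variant does not overrun here.
    std::vector<char> recvbuf(
        static_cast<std::size_t>(displs.back() + recvcounts.back()), '\0');

    for (std::size_t r = 0; r < keysPerRank.size(); ++r)
    {
        char* dst = recvbuf.data() + displs[r];
        for (const auto& key : keysPerRank[r])
        {
            // Copy at most keyLen - 1 chars; the last byte of the slot keeps
            // its zero fill, so every slot stays NUL-terminated.
            std::strncpy(dst, key.c_str(), static_cast<std::size_t>(keyLen - 1));
            dst += keyLen;
        }
    }

    // Read row k the way finalDictKeys[k] is read in the question.
    std::vector<std::string> rows;
    for (std::size_t k = 0; k < totalKeys; ++k)
        rows.emplace_back(recvbuf.data() + k * static_cast<std::size_t>(keyLen));
    return rows;
}
```

Calling simulateGather with extra set to 1 returns the keys of rank 0 followed by empty strings, which is the same pattern as the debug output in the question; with extra set to 0 every key comes back in its own row. For the counts in the question, computeDispls({220, 220, 275}) gives 0, 220, 440.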