gatherv giving conversion error in mpi4py

This line of code:

 comm.Gatherv(sendbuf=[chunkToTransfer, MPI.FLOAT],
              recvbuf=[collectedChunk, processChunkSizes, processChunkDisplacements, MPI.FLOAT],
              root=writerRank)

fails with the following error:

File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (src/mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 516, in mpi4py.MPI._p_msg_cco.for_gather (src/mpi4py.MPI.c:34587)
File "MPI/msgbuffer.pxi", line 466, in mpi4py.MPI._p_msg_cco.for_cco_recv (src/mpi4py.MPI.c:34097)
File "MPI/msgbuffer.pxi", line 308, in mpi4py.MPI.message_vector (src/mpi4py.MPI.c:32485)
File "MPI/asarray.pxi", line 35, in mpi4py.MPI.asarray_int (src/mpi4py.MPI.c:10927)
OverflowError: value too large to convert to int

I checked the sizes and displacements, and they are correct. Here is processChunkSizes:

array([18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 18714240, 18714240,
   18714240, 18714240, 18714240, 18714240, 19961856, 19961856,
   19961856, 19961856, 19961856, 19961856, 19961856, 19961856,
   19961856, 19961856, 19961856, 19961856, 19961856, 19961856,
   19961856, 19961856, 19961856, 19961856, 19961856, 19961856,
   19961856, 19961856, 19961856, 19961856, 19961856, 19961856])

And processChunkDisplacements is as expected:

array([         0,   18714240,   37428480,   56142720,   74856960,
     93571200,  112285440,  130999680,  149713920,  168428160,
    187142400,  205856640,  224570880,  243285120,  261999360,
    280713600,  299427840,  318142080,  336856320,  355570560,
    374284800,  392999040,  411713280,  430427520,  449141760,
    467856000,  486570240,  505284480,  523998720,  542712960,
    561427200,  580141440,  598855680,  617569920,  636284160,
    654998400,  673712640,  692426880,  711141120,  729855360,
    748569600,  767283840,  785998080,  804712320,  823426560,
    842140800,  860855040,  879569280,  898283520,  916997760,
    935712000,  954426240,  973140480,  991854720, 1010568960,
   1029283200, 1047997440, 1066711680, 1085425920, 1104140160,
   1122854400, 1141568640, 1160282880, 1178997120, 1197711360,
   1216425600, 1235139840, 1253854080, 1272568320, 1291282560,
   1309996800, 1328711040, 1347425280, 1366139520, 1384853760,
   1403568000, 1422282240, 1440996480, 1459710720, 1478424960,
   1497139200, 1515853440, 1534567680, 1553281920, 1571996160,
   1590710400, 1609424640, 1628138880, 1646853120, 1665567360,
   1684281600, 1702995840, 1721710080, 1740424320, 1759138560,
   1779100416, 1799062272, 1819024128, 1838985984, 1858947840,
   1878909696, 1898871552, 1918833408, 1938795264, 1958757120,
   1978718976, 1998680832, 2018642688, 2038604544, 2058566400,
   2078528256, 2098490112, 2118451968, 2138413824, 2158375680,
   2178337536, 2198299392, 2218261248, 2238223104, 2258184960])

All offsets and sizes are within Python's integer range, so where does the OverflowError come from?

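The limit is not the range of Python integers but the range of a C int. Unfortunately, MPI generally specifies const int* as the type for displacements and sizes, so every entry in both arrays has to fit in a C int. Your last displacement does not: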

2258184960 > 2147483647 (INT_MAX)
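
To see where the arrays first cross that limit, a small check like the following may help. It is only a sketch: it rebuilds the sizes and displacements from the values posted in the question, assumes a 32-bit C int, and `first_overflow` is a hypothetical helper name, not part of mpi4py.

```python
# Assumption: C int is 32 bits, so INT_MAX is 2**31 - 1.
INT_MAX = 2**31 - 1

def first_overflow(values, limit=INT_MAX):
    """Return (index, value) of the first entry above limit, or None."""
    for i, v in enumerate(values):
        if v > limit:
            return i, v
    return None

# Sizes as posted in the question: 94 chunks of 18714240, then 26 of 19961856.
sizes = [18714240] * 94 + [19961856] * 26

# Displacements are the running sum of the sizes, starting at 0.
displs = []
offset = 0
for s in sizes:
    displs.append(offset)
    offset += s

print(first_overflow(sizes))   # no single size is too large
print(first_overflow(displs))  # first displacement that does not fit in a C int
```

The sizes are all fine on their own; only the later displacements are too large, which is why the error comes from converting the vector arguments rather than from the buffers themselves.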

I do not know of a clean, general solution that uses only mpi4py.
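
One partial workaround that sometimes applies is to express the counts and displacements in units of a larger derived datatype, since MPI measures receive displacements in units of the receive type's extent. The sketch below only does the arithmetic in plain Python. It assumes the chunks are packed back to back (as in the question) and that the sizes share a large common divisor, which is not true in general. `to_block_units` is a hypothetical helper, and the mpi4py calls in the trailing comments are an untested outline rather than a verified fix.

```python
from functools import reduce
from math import gcd

# Assumption: C int is 32 bits.
INT_MAX = 2**31 - 1

def to_block_units(sizes):
    """Rescale element counts to multiples of their greatest common divisor.

    Returns (block, counts, displs), where block is the number of elements
    per unit and displs assumes the chunks are stored contiguously in order.
    """
    block = reduce(gcd, sizes)
    counts = [s // block for s in sizes]
    displs = []
    offset = 0
    for c in counts:
        displs.append(offset)
        offset += c
    return block, counts, displs

# Sizes as posted in the question.
sizes = [18714240] * 94 + [19961856] * 26
block, counts, displs = to_block_units(sizes)

print(block, max(counts), displs[-1])
print(max(displs) <= INT_MAX)

# Untested outline of how this might be passed to mpi4py:
#   blockType = MPI.FLOAT.Create_contiguous(block).Commit()
#   comm.Gatherv(sendbuf=[chunkToTransfer, counts[comm.Get_rank()], blockType],
#                recvbuf=[collectedChunk, counts, displs, blockType],
#                root=writerRank)
#   blockType.Free()
```

If the chunk sizes had no useful common divisor, this would degenerate to a block of one element and bring back the original overflow, so treat it as a special-case trick rather than a general answer.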