Why does an exception occur in a program that uses MPJ Express?

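I have a program that uses MPJ Express to multiply a matrix by a vector. The matrix is partitioned by rows, but an exception is thrown while the program runs. Am I doing something wrong?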

import java.util.Random;

import mpi.Comm;
import mpi.MPI;

public class Main {
    private static final int rootProcessorRank = 0;
    private static Comm comunicator;
    private static int processorsNumber;
    private static int currentProcessorRank;


    public static void main(String[] args) {
        MPI.Init(args);
        comunicator = MPI.COMM_WORLD;
        currentProcessorRank = comunicator.Rank();
        processorsNumber = comunicator.Size();

        if (currentProcessorRank == rootProcessorRank) {
            rootProcessorAction();
        } else {
            notRootProcessorAction();
        }

        MPI.Finalize();

    }

    public static void rootProcessorAction() {
        int[] matrixVectorSize = new int[] {5};
        int[][] matrix = createAndInitMatrix(matrixVectorSize[0]);
        int[] vector = createAndInitVector(matrixVectorSize[0]);

        for (int i = 1; i < processorsNumber; i++) {
            comunicator.Isend(matrixVectorSize, 0, 1, MPI.INT, i, MPI.ANY_TAG);
            System.out.println("Proc: " + currentProcessorRank + ", send matrixVectorSize");

            comunicator.Isend(vector, 0, vector.length, MPI.INT, i, MPI.ANY_TAG);
            System.out.println("Proc: " + currentProcessorRank + ", send vector");
        }

        int averageRowsPerProcessor = matrix.length / (processorsNumber - 1);
        int[] rowsPerProcessor = new int[processorsNumber];
        int notDistributedRowsNumber = matrix.length;
        for (int i = 1; i < rowsPerProcessor.length; i++) {
            if (i == rowsPerProcessor.length - 1) {
                rowsPerProcessor[i] = notDistributedRowsNumber;
            } else {
                rowsPerProcessor[i] = averageRowsPerProcessor;
                notDistributedRowsNumber -= averageRowsPerProcessor;
            }
        }

        int offset = 0;
        // rowsPerProcessor[0] is always 0: the root keeps no rows for itself
        for (int i = 1; i < rowsPerProcessor.length; i++) {
            int[] processorRows = new int[1];
            processorRows[0] = rowsPerProcessor[i];
            comunicator.Isend(processorRows, 0, 1, MPI.INT, i, MPI.ANY_TAG);
            comunicator.Isend(matrix, offset, processorRows[0], MPI.OBJECT, i, MPI.ANY_TAG);
            offset += rowsPerProcessor[i];
        }

        // TODO: receive the partial results (subResults) from all worker processes.
    }

    public static void notRootProcessorAction() {
        int[] matrixVectorSize = new int[1];
        int[] rowsNumber = new int[1];
        int[] vector = null;
        int[][] subMatrix = null;

        comunicator.Probe(rootProcessorRank, MPI.ANY_SOURCE);
        comunicator.Recv(matrixVectorSize, 0, 1, MPI.INT, rootProcessorRank, MPI.ANY_TAG);
        System.out.println("Proc: " + currentProcessorRank + ", receive matrixVectorSize");

        vector = new int[matrixVectorSize[0]];
        comunicator.Probe(rootProcessorRank, MPI.ANY_SOURCE);
        comunicator.Recv(vector, 0, vector.length, MPI.INT, rootProcessorRank, MPI.ANY_TAG);
        System.out.println("Proc: " + currentProcessorRank + ", receive vector");

        comunicator.Probe(rootProcessorRank, MPI.ANY_SOURCE);
        comunicator.Recv(rowsNumber, 0, 1, MPI.INT, rootProcessorRank, MPI.ANY_TAG);
        System.out.println("Proc: " + currentProcessorRank + ", receive rowsNumber");
        subMatrix = new int[rowsNumber[0]][rowsNumber[0]];

        comunicator.Probe(rootProcessorRank, MPI.ANY_SOURCE);
        comunicator.Recv(subMatrix, 0, subMatrix.length, MPI.OBJECT, rootProcessorRank, MPI.ANY_TAG);
        System.out.println("Proc: " + currentProcessorRank + ", receive subMatrix");

        int[] result = new int[rowsNumber[0]];
        multiplyMatrixVector(subMatrix, vector, result);

        comunicator.Send(result, 0, result.length, MPI.INT, rootProcessorRank, MPI.ANY_TAG);
    }

    private static void multiplyMatrixVector(int[][] matrix, int[] vector, int[] result) {
        for (int i = 0; i < matrix.length; i++) {
            int summ = 0;
            for (int j = 0; j < matrix[i].length; j++) {
                summ += matrix[i][j] * vector[j];
            }
            result[i] = summ;
        }
    }

    private static int[][] createAndInitMatrix(int size) {
        int[][] matrix = new int[size][size];
        Random random = new Random();
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix.length; j++) {
                matrix[i][j] = random.nextInt(100);
            }
        }
        return matrix;
    }

    private static int[] createAndInitVector(int size) {
        int[] vector = new int[size];
        Random random = new Random();
        for (int i = 0; i < vector.length; i++) {
            vector[i] = random.nextInt(100);
        }
        return vector;
    }
}
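Independently of the MPJ calls, the row-wise decomposition itself can be checked in plain Java. The sketch below uses hypothetical names (RowBlocks, split, multiplyBlock, multiply) that are not part of the original program. It gives each worker a contiguous block of rows, spreading the remainder one row at a time instead of giving all of it to the last worker as rootProcessorAction does. It then multiplies each block by the full vector and concatenates the partial results in rank order. It runs in a single process and only simulates what the workers would compute. Like the question's code, it assumes at least one worker: with a single MPJ process, processorsNumber - 1 is zero.

```java
import java.util.Arrays;

// Hypothetical helper: simulates the row-block split of rootProcessorAction in one process.
class RowBlocks {

    // Rows assigned to each of `workers` workers; the first (rows % workers) workers get one extra row.
    static int[] split(int rows, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        int[] counts = new int[workers];
        for (int w = 0; w < workers; w++) {
            counts[w] = rows / workers + (w < rows % workers ? 1 : 0);
        }
        return counts;
    }

    // What one worker computes: its block of rows times the full vector.
    static int[] multiplyBlock(int[][] matrix, int firstRow, int rowCount, int[] vector) {
        int[] partial = new int[rowCount];
        for (int i = 0; i < rowCount; i++) {
            int sum = 0;
            for (int j = 0; j < vector.length; j++) {
                sum += matrix[firstRow + i][j] * vector[j];
            }
            partial[i] = sum;
        }
        return partial;
    }

    // Root side: walk the blocks in rank order and copy each partial result to its offset.
    static int[] multiply(int[][] matrix, int[] vector, int workers) {
        int[] counts = split(matrix.length, workers);
        int[] result = new int[matrix.length];
        int offset = 0;
        for (int count : counts) {
            int[] partial = multiplyBlock(matrix, offset, count, vector);
            System.arraycopy(partial, 0, result, offset, count);
            offset += count;
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] matrix = {
            {1, 2, 3, 4, 5},
            {0, 1, 0, 1, 0},
            {2, 2, 2, 2, 2},
            {5, 4, 3, 2, 1},
            {1, 0, 0, 0, 1}
        };
        int[] vector = {1, 1, 1, 1, 1};
        System.out.println(Arrays.toString(split(5, 3)));                 // [2, 2, 1]
        System.out.println(Arrays.toString(multiply(matrix, vector, 3))); // [15, 2, 10, 15, 2]
    }
}
```

Because the result is assembled purely by offset, the same output should come back for any worker count, which makes this a convenient reference to compare the MPJ version against.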

Here is the exception:

MPJ Express (0.44) is started in the multicore configuration
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at runtime.starter.MulticoreStarter.run(MulticoreStarter.java:281)
    at java.lang.Thread.run(Thread.java:745)
Caused by: mpi.MPIException: xdev.XDevException: java.lang.NullPointerException
    at mpi.Comm.isend(Comm.java:944)
    at mpi.Comm.Isend(Comm.java:885)
    at Main.rootProcessorAction(Main.java:35)
    at Main.main(Main.java:20)
    ... 6 more
Caused by: xdev.XDevException: java.lang.NullPointerException
    at xdev.smpdev.SMPDevice.isend(SMPDevice.java:104)
    at mpjdev.javampjdev.Comm.isend(Comm.java:1019)
    at mpi.Comm.isend(Comm.java:941)
    ... 9 more
Caused by: java.lang.NullPointerException
    at xdev.smpdev.SMPDeviceImpl$SendQueue.add(SMPDeviceImpl.java:930)
    at xdev.smpdev.SMPDeviceImpl$SendQueue.add(SMPDeviceImpl.java:909)
    at xdev.smpdev.SMPDeviceImpl.isend(SMPDeviceImpl.java:330)
    at xdev.smpdev.SMPDevice.isend(SMPDevice.java:101)
    ... 11 more
xdev.XDevException: java.lang.NullPointerException
    at xdev.smpdev.SMPDevice.recv(SMPDevice.java:162)

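In my experience with MPJ Express, it is best to avoid the constants MPI.ANY_SOURCE and MPI.ANY_TAG. Set your own tags and sources explicitly and you should be fine. When I used these constants in my programs, they sometimes crashed at random with an xdev.XDevException caused by a NullPointerException, and sometimes they ran just fine. In particular, in MPI the wildcards are only meaningful when receiving (Recv, Probe and similar calls). A send must carry a concrete, non-negative tag, yet the code above passes MPI.ANY_TAG to every Isend and to the final Send. Note also that the second argument of Probe is a tag, not a source: Probe(rootProcessorRank, MPI.ANY_SOURCE) only behaves like Probe(rootProcessorRank, MPI.ANY_TAG) because both constants happen to equal -2.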
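To see why the wildcards belong only on the receiving side, here is a plain-Java model of the MPI matching rule. A posted receive (or probe) accepts a message if its source is ANY_SOURCE or equals the sender's rank, and its tag is ANY_TAG or equals the message's tag. The class, method and tag names are hypothetical and no MPJ Express classes are used; the value -2 for both wildcards is taken from the constant list below. The model does not reproduce the NullPointerException; it only illustrates that a sender always attaches a concrete tag, and that distinct tags are what let a worker tell the size, vector, row-count and row messages apart.

```java
// Hypothetical model of MPI message matching (not MPJ Express code).
class Matching {
    // Wildcard values as listed for mpi.MPI in MPJ Express.
    static final int ANY_SOURCE = -2;
    static final int ANY_TAG = -2;

    // The envelope a sender attaches to a message: always a concrete rank and tag.
    static final class Envelope {
        final int source;
        final int tag;

        Envelope(int source, int tag) {
            // MPI requires tags in 0..TAG_UB; this model rejects negative tags up front.
            if (tag < 0) {
                throw new IllegalArgumentException("send tags must be non-negative, got " + tag);
            }
            this.source = source;
            this.tag = tag;
        }
    }

    // A receive or probe may use wildcards; each field must be a wildcard or an exact match.
    static boolean matches(int recvSource, int recvTag, Envelope msg) {
        boolean sourceOk = recvSource == ANY_SOURCE || recvSource == msg.source;
        boolean tagOk = recvTag == ANY_TAG || recvTag == msg.tag;
        return sourceOk && tagOk;
    }

    public static void main(String[] args) {
        final int TAG_SIZE = 10;   // hypothetical application tags
        final int TAG_VECTOR = 11;
        Envelope sizeMsg = new Envelope(0, TAG_SIZE);

        System.out.println(matches(0, TAG_SIZE, sizeMsg));         // true
        System.out.println(matches(0, TAG_VECTOR, sizeMsg));       // false: different tag
        System.out.println(matches(ANY_SOURCE, ANY_TAG, sizeMsg)); // true: wildcards accept anything
    }
}
```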

Here is a list of MPJ Express's internal constants, which you should also not use as tags (I'm only showing the integer constants):

public static final int mpi.MPI.NUM_OF_PROCESSORS = 4
public static int mpi.MPI.UNDEFINED = -1
public static int mpi.MPI.THREAD_SINGLE = 1
public static int mpi.MPI.THREAD_FUNNELED = 2
public static int mpi.MPI.THREAD_SERIALIZED = 3
public static int mpi.MPI.THREAD_MULTIPLE = 4
public static int mpi.MPI.ANY_SOURCE = -2
public static int mpi.MPI.ANY_TAG = -2
public static int mpi.MPI.PROC_NULL = -3
public static int mpi.MPI.BSEND_OVERHEAD = 0
public static int mpi.MPI.SEND_OVERHEAD = 0
public static int mpi.MPI.RECV_OVERHEAD = 0
public static final int mpi.MPI.IDENT = 0
public static final int mpi.MPI.CONGRUENT = 3
public static final int mpi.MPI.SIMILAR = 1
public static final int mpi.MPI.UNEQUAL = 2
public static int mpi.MPI.GRAPH = 1
public static int mpi.MPI.CART = 2
public static int mpi.MPI.TAG_UB = 0
public static int mpi.MPI.HOST = 0
public static int mpi.MPI.IO = 0
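Following that advice, one option is to give every message type in the program its own named tag, chosen from a range that does not overlap any value above (the largest is 4). The class name Tags, the base value 100 and the tag names are arbitrary assumptions, not MPJ Express API; the comments show how a tag would replace MPI.ANY_TAG in the question's Isend and Recv calls.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical application tags for the matrix-vector program; not part of MPJ Express.
final class Tags {
    // Base chosen above every integer constant listed for mpi.MPI (the largest is 4).
    static final int BASE = 100;

    static final int MATRIX_SIZE = BASE;     // root -> worker: int[1] with the vector length
    static final int VECTOR = BASE + 1;      // root -> worker: the whole vector
    static final int ROW_COUNT = BASE + 2;   // root -> worker: int[1] with this worker's row count
    static final int ROWS = BASE + 3;        // root -> worker: the rows themselves (MPI.OBJECT)
    static final int RESULT = BASE + 4;      // worker -> root: the partial result

    // Root:   comunicator.Isend(vector, 0, vector.length, MPI.INT, i, Tags.VECTOR);
    // Worker: comunicator.Recv(vector, 0, vector.length, MPI.INT, rootProcessorRank, Tags.VECTOR);

    static int[] all() {
        return new int[] {MATRIX_SIZE, VECTOR, ROW_COUNT, ROWS, RESULT};
    }

    // True if every tag is non-negative and no two message types share a tag.
    static boolean distinctAndValid() {
        Set<Integer> seen = new HashSet<>();
        for (int tag : all()) {
            if (tag < 0 || !seen.add(tag)) {
                return false;
            }
        }
        return true;
    }

    private Tags() {}

    public static void main(String[] args) {
        System.out.println(distinctAndValid()); // true
    }
}
```

Keeping all tags in one place also makes it easy to check that the root and the workers agree on which tag belongs to which message.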

Cheers.