Issue with pthreads, unsure where error is

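(Added the corrected code for completeness of the question.) I wrote a program that computes the Floyd-Warshall shortest-path matrix from every vertex of a graph (entered as a matrix) to every other vertex. The code is posted below. I think my problem is related to pthreads or to the LowestTerm function. For small matrices the program works fine. For large matrices and a large number of threads (around 8 counts as large), however, I get a segmentation fault with no further information. Compilation shows no problems. Does anyone see anything wrong with the way the code is written? Could it be that all the threads are trying to access the matrix at the same time, even though each one is dedicated to specific rows of the matrix? Thanks for any help and suggestions.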

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>

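int n, **C, **D;                                /* Matrix size, cost matrix, distance matrix */
int printthread = 0;                            /* Rank of the thread whose turn it is to print */
pthread_t *threads;
pthread_cond_t condprint = PTHREAD_COND_INITIALIZER;    /* Statically initialized before first use */
pthread_mutex_t mutexprint = PTHREAD_MUTEX_INITIALIZER;
long thread, threadcount;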

void *LowestTerm(void* rank);

int main(int argc, char *argv[]) {

    int i, j;                       /* Variable declarations */
    char filename[50];

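    if (argc < 2 || (threadcount = atoi(argv[1])) < 1) {   /* Require a positive thread count */
        printf("Usage: %s <threadcount>\n", argv[0]);
        return 1;
    }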
    threads = malloc (threadcount * sizeof(pthread_t));

    printf("Enter filename: ");             /* User enters filename for directed graph values */
    scanf("%49s", filename);                /* Limit input to fit the 50-byte buffer */

    FILE *fp = fopen(filename, "r");

    if (fp == NULL) {                   /* Check whether file exists or not */
        printf("File does not exist");
        return 1;
    }

    fscanf(fp, "%d", &n);                   /* Obtain size of matrix */

    C = (int **)malloc(n * sizeof(int *));          /* Allocate memory for matrix arrays */
    D = (int **)malloc(n * sizeof(int *));

    for (i = 0; i < n; i++) {               /* Allocate matrices into 2D arrays */
        C[i] = (int *)malloc(n * sizeof(int));
        D[i] = (int *)malloc(n * sizeof(int));
    }


    for (i = 0; i < n; i++) {               /* Read matrix from file into C array */
        for (j = 0; j < n; j++) {
            fscanf(fp, "%d", &C[i][j]);
        }
    }

    printf("Cost Adjacency Matrix:\n");         /* Print cost adjacency matrix */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            printf("%d ", C[i][j]);
        }
        printf(" \n");
    }

    for (i = 0; i < n; i++) {               /* Copy matrix from C array into D array */
        for (j = 0; j < n; j++) {
            D[i][j] = C[i][j];
        }
    }

    printf("Distance matrix:\n");               /* Print Distance matrix label */



    for (thread = 0; thread < threadcount; thread++) {  /* Create threads for making and printing distance matrix */
        pthread_create(&threads[thread], NULL, LowestTerm, (void*) thread);
    }
    for (thread = 0; thread < threadcount; thread++) {  /* Join threads back together */
        pthread_join(threads[thread], NULL);
    }

    pthread_cond_destroy (&condprint);
    pthread_mutex_destroy (&mutexprint);
    free(threads);
    pthread_exit(NULL);

}


void *LowestTerm(void* rank) {

    int i, j, k;                        /* Variable declarations */
    long mythread = (long) rank;

    int istart = ((int)mythread * n) / (int)threadcount;    /* Create matrix row start and finish parameters for each thread */
    int ifinish = ((((int)mythread + 1) * n) / (int)threadcount);

    for (k = 0; k < n; k++) {               /* Find shortest path for each value in each row for each of designated thread's rows */
        for (i = istart; i < ifinish; i++) {
            for (j = 0; j < n; j++) {
                if (D[i][j] > D[i][k] + D[k][j]) {
                    D[i][j] = D[i][k] + D[k][j];
                }
            }
        }
    }

    pthread_mutex_lock (&mutexprint);           /* Print distance matrix portion for each thread */
    while (printthread != mythread) {
        pthread_cond_wait (&condprint, &mutexprint);
    }
    for (i = istart; i < ifinish; i++) {
        printf("Thread %ld: ", mythread);
        for (j = 0; j < n; j++) {
            printf("%d ", D[i][j]);
        }
        printf(" \n");
    }
    printthread++;
    pthread_cond_broadcast (&condprint);
    pthread_mutex_unlock (&mutexprint);


    return NULL;
}

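Both j and k run from 0 to n - 1, so the term D[k][j] can be any element of the matrix.

So although you only write to specific rows (the ones indexed by i), every thread reads every part of the matrix while other threads are modifying it.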
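One common way to address this is to make every thread finish step k before any thread starts step k + 1. Here is a minimal sketch using a POSIX barrier; it is not the asker's code. The names fw_ctx, fw_worker, and fw_parallel are made up for illustration, and it assumes a platform that provides pthread_barrier_t (an optional POSIX feature), D[k][k] == 0 with no negative cycles (so row k is not written during step k), and "no edge" values small enough that adding two of them does not overflow an int.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread context; none of these names come from the post. */
typedef struct {
    int **dist;                 /* shared n x n distance matrix */
    int n;                      /* matrix dimension */
    int rank;                   /* this thread's index, 0..nthreads-1 */
    int nthreads;               /* total number of worker threads */
    pthread_barrier_t *barrier; /* shared barrier, waited on once per k step */
} fw_ctx;

static void *fw_worker(void *arg) {
    fw_ctx *c = arg;
    int istart = (c->rank * c->n) / c->nthreads;        /* first owned row */
    int ifinish = ((c->rank + 1) * c->n) / c->nthreads; /* one past last owned row */

    for (int k = 0; k < c->n; k++) {
        for (int i = istart; i < ifinish; i++) {
            for (int j = 0; j < c->n; j++) {
                int via_k = c->dist[i][k] + c->dist[k][j];
                if (via_k < c->dist[i][j]) {
                    c->dist[i][j] = via_k;
                }
            }
        }
        /* No thread starts step k + 1 until every thread has finished step k. */
        pthread_barrier_wait(c->barrier);
    }
    return NULL;
}

/* Runs Floyd-Warshall in place on dist; returns 0 on success, -1 on setup failure. */
int fw_parallel(int **dist, int n, int nthreads) {
    assert(n > 0 && nthreads > 0);
    pthread_barrier_t barrier;
    pthread_t *tids = malloc(nthreads * sizeof *tids);
    fw_ctx *ctx = malloc(nthreads * sizeof *ctx);
    if (tids == NULL || ctx == NULL ||
        pthread_barrier_init(&barrier, NULL, (unsigned)nthreads) != 0) {
        free(tids);
        free(ctx);
        return -1;
    }
    for (int t = 0; t < nthreads; t++) {
        ctx[t] = (fw_ctx){ dist, n, t, nthreads, &barrier };
        /* Sketch only: give up if a thread cannot be created, because the
           barrier count would no longer match the number of waiters. */
        if (pthread_create(&tids[t], NULL, fw_worker, &ctx[t]) != 0) {
            abort();
        }
    }
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
    }
    pthread_barrier_destroy(&barrier);
    free(tids);
    free(ctx);
    return 0;
}
```

Calling fw_parallel(D, n, threadcount) from main would take the place of the create/join loops. The barrier costs one synchronization per value of k, which is usually small next to the O(n^2 / threads) work each thread does in that step.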

The problem comes from this line (in the code as originally posted):

int i, j, k, n, totaln, **C, **D;

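Because the counters i, j, and k are declared at global scope, they are shared by all threads.

If one thread is context-switched out in the middle of the inner loop, another thread can increment one of these counters past the end of the array. When the original thread resumes, it tries to read past the end of the array, which is undefined behavior and may cause a segfault.

You can fix this by limiting the scope of the counter variables to the LowestTerm function. In fact, the only variables that need to be defined at global scope are D, n, and threadcount, and even n and threadcount don't really need to be shared: they can be passed as arguments to LowestTerm.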
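To illustrate the last point, here is a rough sketch of passing n and threadcount to each thread through a struct while keeping the loop counters local. The names worker_args, sum_rows, and sum_matrix are hypothetical, and the worker just sums its rows instead of running Floyd-Warshall, so that the example stays short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument bundle; the original LowestTerm takes only a rank. */
typedef struct {
    int **D;          /* shared matrix (only read in this sketch) */
    int n;            /* matrix dimension, passed instead of a global */
    int threadcount;  /* number of threads, passed instead of a global */
    int rank;         /* this thread's index */
    long rowsum;      /* output: sum of the entries in this thread's rows */
} worker_args;

static void *sum_rows(void *p) {
    worker_args *a = p;
    int i, j;   /* local counters: every thread has its own copies */
    int istart = (a->rank * a->n) / a->threadcount;
    int ifinish = ((a->rank + 1) * a->n) / a->threadcount;

    a->rowsum = 0;
    for (i = istart; i < ifinish; i++) {
        for (j = 0; j < a->n; j++) {
            a->rowsum += a->D[i][j];
        }
    }
    return NULL;
}

/* Returns the sum of all entries, or -1 if a thread could not be created. */
long sum_matrix(int **D, int n, int threadcount) {
    assert(n > 0 && threadcount > 0);
    pthread_t *tids = malloc(threadcount * sizeof *tids);
    worker_args *args = malloc(threadcount * sizeof *args);
    long total = 0;
    int t, started = 0;

    if (tids == NULL || args == NULL) {
        free(tids);
        free(args);
        return -1;
    }
    for (t = 0; t < threadcount; t++) {
        args[t] = (worker_args){ D, n, threadcount, t, 0 };
        if (pthread_create(&tids[t], NULL, sum_rows, &args[t]) != 0) {
            break;
        }
        started++;
    }
    /* Each thread wrote only its own args[t], so reading after join is safe. */
    for (t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].rowsum;
    }
    free(tids);
    free(args);
    return (started == threadcount) ? total : -1;
}
```

Because each thread gets a pointer to its own worker_args element, nothing except the matrix itself is shared, and the args array has to stay alive until every thread has been joined.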