Segmentation fault keeps occurring at seemingly random variants of either number of threads or dimension of matrices being multiplied

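I am trying to write two programs that compute the norm of the product of two n x n dense matrices. The first part, Variant 1, works as expected: it multiplies the two matrices in parallel and then computes the norm. Variant 2 does not work as expected. Here I try to partition one matrix horizontally and multiply it by the other matrix. The multiplication works, but for whatever reason I keep getting the following error when computing the norm: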

a(72157,0x110d5fdc0) malloc: Incorrect checksum for freed object 0x7f8b035060e8: probably modified after being freed.
Corrupt value: 0x40ab000000000000
a(72157,0x110d5fdc0) malloc: *** set a breakpoint in malloc_error_break to debug

[2]    73373 segmentation fault  ./a 100 100

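I thought this might be a case of not freeing memory correctly, but I free the memory the threads are using and then allocate it again, and I still keep getting that error for dimensions that appear to be at least larger than 5. When I run it on smaller matrices (the two command-line arguments are the matrix size and the number of threads), I get the following: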

gcc -o a a.c -pthread && ./a 2 2
 ********** Variant 1 **********
 4.00  4.00
 4.00  4.00
 Norm : 8.00  Time Elapsed for Variant 1: 0.00

 ******************************
 ********** Variant 2 **********
 4.00  4.00
 4.00  4.00
Norm : 8.00 %


********** Variant 1 **********
 8.00  8.00  8.00  8.00
 8.00  8.00  8.00  8.00
 8.00  8.00  8.00  8.00
 8.00  8.00  8.00  8.00
 Norm : 32.00  Time Elapsed for Variant 1: 0.00

 ******************************
 ********** Variant 2 **********
 8.00  8.00  8.00  8.00
 8.00  8.00  8.00  8.00
 8.00  8.00  8.00  8.00
 8.00  8.00  8.00  8.00
Norm : 32.00 %

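I really don't know what is causing this. It might be because I am reusing the code that computes the norm, but that doesn't make sense to me, since it is just a function.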

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>

// Declaring : Matrix Dimension, Number Of Threads, Norm, Matrices, Mutex
int n, num_threads;
double norm;
double * x, * y, * z;
pthread_mutex_t lock;

void * computeNorm(void *arg){

        pthread_mutex_lock(&lock);

        int tid = *(int*)(arg);

        double sum = 0.0;

        for(int j = 0; j < n; j++){
                sum += z[j * n + tid];
        }

        if(norm < sum){
                norm = sum;
        }

        pthread_mutex_unlock(&lock);

        pthread_exit(NULL);

}

void * Parallel_Matrix_Multiplication(void *arg){

  pthread_mutex_lock(&lock);

  int tid = *(int *)(arg);
  int partition = n / num_threads;
  int row_start = tid * partition;
  int row_end = (tid+1) * partition;

  for (int i = row_start; i < row_end; ++i){
    for (int j = 0; j < n; ++j){
      for (int k = 0; k < n; ++k) {
    z[i * n + j] += x[ i * n + k ] * y[ k * n + j ];
      }
    }
  }

  pthread_mutex_unlock(&lock);

  pthread_exit(NULL);
}

void * Variant_2(void *arg){
        pthread_mutex_lock(&lock);
        int tid = *(int*)(arg);
        int partition = n / num_threads;
        int row_start = tid * partition * n;
        int row_end = ((tid + 1) * partition) * n;

        for(int i = row_start; i < row_end; i++){
                for(int j = 0; j < n; j++){
                        for(int k = 0; k < n; k++){
                                z[i * n + j] += x[i * n + k] * y[k * n + j];
                        }
                }
        }
        pthread_mutex_unlock(&lock);
        pthread_exit(NULL);

}

int main(int argc, char ** argv){
  int i;
  pthread_t * threads;
  pthread_t * norm_threads;

  pthread_mutex_init(&lock, NULL);

  n = atoi( argv[1] );
  num_threads = atoi( argv[2] );

  threads = (pthread_t *)malloc(num_threads * sizeof(pthread_t));


  x = malloc(n * n * sizeof(double));
  y = malloc(n * n * sizeof(double));
  z = malloc(n * n * sizeof(double));

  for(int i = 0; i < n * n; i++){
          x[i] = 1.0;
          y[i] = 2.0;
          z[i] = 0.0;
  }

  printf(" ********** Variant 1 ********** \n");
  for ( i = 0; i < num_threads; ++i ) {
    int *tid;
    tid = (int *) malloc( sizeof(int) );
    *tid = i;
    pthread_create( &threads[i], NULL, Parallel_Matrix_Multiplication, (void *)tid );
  }

  for ( i = 0; i < num_threads; ++i ) {
    pthread_join( threads[i], NULL );
  }

  for(int i = 0; i < n; i++){
        for(int j = 0; j < n; j++){
                    printf(" %0.2f ", z[i * n + j]);
            }
            printf("\n");
    }

  norm_threads = (pthread_t*)malloc(num_threads * sizeof(pthread_t));
  for(int i = 0; i < num_threads; i++){
        int *tid;
        tid = (int*)malloc(sizeof(int));
        *tid = i;
        pthread_create(&norm_threads[i], NULL, computeNorm, (void*)tid);
  }

  for(int i = 0; i < num_threads; i++){
          pthread_join(norm_threads[i], NULL);
  }


  printf(" Norm : %0.2f ", norm);

  printf("\n ******************************");

  norm = 0.0;
  for(int i = 0; i < n * n; i++){ z[i] = 0.0; }
 
  free(threads);
  free(norm_threads);
  threads= (pthread_t*)malloc(num_threads * sizeof(pthread_t));
  norm_threads = (pthread_t*)malloc(num_threads * sizeof(pthread_t));

  printf("\n ********** Variant 2 ********** \n");

  for(i = 0; i < num_threads; i++){
          int *tid;
          tid = (int*)malloc(sizeof(int));
          *tid = i;
          pthread_create(&threads[i], NULL, Variant_2, (void*)tid);
  }

  for(i =0; i < num_threads; i++){
          pthread_join(threads[i], NULL);
  }

  for(i = 0; i < n; i++){
          for(int j = 0; j < n; j++){
                  printf(" %0.2f ", z[i * n + j]);
          }
          printf("\n");
  }


  for(i = 0; i < num_threads; i++){
          int *tid;
          tid = (int*)malloc(sizeof(int));
          *tid = i;
          pthread_create(&norm_threads[i], NULL, computeNorm, (void*)tid);
  }

  for(i = 0; i < num_threads; i++){
          pthread_join(norm_threads[i], NULL);
  }

  printf("Norm : %0.2f ", norm);
 
  pthread_mutex_destroy(&lock);



  return 0;
}

Valgrind output (GCC)

==66355== Thread 3:
==66355== Invalid read of size 8
==66355==    at 0x1000015D7: Variant_2 (in ./a)
==66355==    by 0x100612108: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355==    by 0x10060DB8A: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355==  Address 0x100882ac0 is 0 bytes after a block of size 80,000 alloc'd
==66355==    at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x100001B4C: main (in ./a)
==66355==
==66355== Invalid read of size 8
==66355==    at 0x100001600: Variant_2 (in ./a)
==66355==    by 0x100612108: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355==    by 0x10060DB8A: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355==  Address 0x10084a350 is 0 bytes after a block of size 80,000 alloc'd
==66355==    at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x100001738: main (in ./a)
==66355==
==66355== Invalid write of size 8
==66355==    at 0x100001605: Variant_2 (in ./a)
==66355==    by 0x100612108: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355==    by 0x10060DB8A: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355==  Address 0x10084a350 is 0 bytes after a block of size 80,000 alloc'd
==66355==    at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x100001738: main (in ./a)
==66355==
==66355==
==66355== Process terminating with default action of signal 11 (SIGSEGV)
==66355==  Access not within mapped region at address 0x100DC6540
==66355==    at 0x1000015D7: Variant_2 (in ./a)
==66355==    by 0x100612108: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355==    by 0x10060DB8A: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355==  If you believe this happened as a result of a stack
==66355==  overflow in your program's main thread (unlikely but
==66355==  possible), you can try to increase the size of the
==66355==  main thread stack using the --main-stacksize= flag.
==66355==  The main thread stack size used in this run was 8388608.
--66355:0:schedule VG_(sema_down): read returned -4
--66355:0:schedule VG_(sema_down): read returned -4
==66355==
==66355== HEAP SUMMARY:
==66355==     in use at exit: 282,462 bytes in 494 blocks
==66355==   total heap usage: 538 allocs, 44 frees, 552,158 bytes allocated
==66355==
==66355== Thread 1:
==66355== 32 bytes in 1 blocks are possibly lost in loss record 20 of 75
==66355==    at 0x100111C90: calloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x1006E65F3: objc::DenseMap<objc_class*, objc_class*, objc::DenseMapValueInfo<objc_class*>, objc::DenseMapInfo<objc_class*>, objc::detail::DenseMapPair<objc_class*, objc_class*> >::grow(unsigned int) (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x1006E64BA: addRemappedClass(objc_class*, objc_class*) (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x1006D6C43: allocateBuckets(unsigned int) (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x1006D6398: lookUpImpOrForward (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x100672F99: _xpc_payload_alloc (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x100672E65: _xpc_payload_create_from_mach_msg (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x100672D22: xpc_receive_mach_msg (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x10068CB8A: _xpc_pipe_routine (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x100671B61: xpc_pipe_routine_with_flags (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x1006719E1: _xpc_interface_routine (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x100673CE7: bootstrap_look_up3 (in /usr/lib/system/libxpc.dylib)
==66355==
==66355== 32 bytes in 1 blocks are possibly lost in loss record 21 of 75
==66355==    at 0x100111C90: calloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x1006E65F3: objc::DenseMap<objc_class*, objc_class*, objc::DenseMapValueInfo<objc_class*>, objc::DenseMapInfo<objc_class*>, objc::detail::DenseMapPair<objc_class*, objc_class*> >::grow(unsigned int) (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x1006E64BA: addRemappedClass(objc_class*, objc_class*) (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x1006E5FF3: realizeClassWithoutSwift(objc_class*, objc_class*) (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x1006D6D44: -[NSObject dealloc] (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x1006D6398: lookUpImpOrForward (in /usr/lib/libobjc.A.dylib)
==66355==    by 0x100672F99: _xpc_payload_alloc (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x100672E65: _xpc_payload_create_from_mach_msg (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x100672D22: xpc_receive_mach_msg (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x10068CB8A: _xpc_pipe_routine (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x100671B61: xpc_pipe_routine_with_flags (in /usr/lib/system/libxpc.dylib)
==66355==    by 0x1006719E1: _xpc_interface_routine (in /usr/lib/system/libxpc.dylib)
==66355==
==66355== 56 bytes in 1 blocks are possibly lost in loss record 26 of 75
==66355==    at 0x100111C90: calloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x10058A190: _notify_fork_child (in /usr/lib/system/libsystem_notify.dylib)
==66355==    by 0x10058A3A0: _notify_fork_child (in /usr/lib/system/libsystem_notify.dylib)
==66355==    by 0x100584A6B: notify_register_check (in /usr/lib/system/libsystem_notify.dylib)
==66355==    by 0x1003BC9ED: notify_register_tz (in /usr/lib/system/libsystem_c.dylib)
==66355==    by 0x1003BC35F: tzsetwall_basic (in /usr/lib/system/libsystem_c.dylib)
==66355==    by 0x1003BE130: localtime (in /usr/lib/system/libsystem_c.dylib)
==66355==    by 0x10037C923: gettimeofday (in /usr/lib/system/libsystem_c.dylib)
==66355==    by 0x1000017D0: main (in ./a)
==66355==
==66355== 112 bytes in 1 blocks are possibly lost in loss record 44 of 75
==66355==    at 0x100111C90: calloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x10058A3CC: _notify_fork_child (in /usr/lib/system/libsystem_notify.dylib)
==66355==    by 0x100584A6B: notify_register_check (in /usr/lib/system/libsystem_notify.dylib)
==66355==    by 0x1003BC9ED: notify_register_tz (in /usr/lib/system/libsystem_c.dylib)
==66355==    by 0x1003BC35F: tzsetwall_basic (in /usr/lib/system/libsystem_c.dylib)
==66355==    by 0x1003BE130: localtime (in /usr/lib/system/libsystem_c.dylib)
==66355==    by 0x10037C923: gettimeofday (in /usr/lib/system/libsystem_c.dylib)
==66355==    by 0x1000017D0: main (in ./a)
==66355==
==66355== 396 bytes in 99 blocks are definitely lost in loss record 58 of 75
==66355==    at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x100001807: main (in ./a)
==66355==
==66355== 396 bytes in 99 blocks are definitely lost in loss record 59 of 75
==66355==    at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x10000195E: main (in ./a)
==66355==
==66355== 800 bytes in 1 blocks are definitely lost in loss record 63 of 75
==66355==    at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x1000016CF: main (in ./a)
==66355==
==66355== 800 bytes in 1 blocks are definitely lost in loss record 64 of 75
==66355==    at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355==    by 0x100001937: main (in ./a)
==66355==
==66355== LEAK SUMMARY:
==66355==    definitely lost: 2,392 bytes in 200 blocks
==66355==    indirectly lost: 0 bytes in 0 blocks
==66355==      possibly lost: 232 bytes in 4 blocks
==66355==    still reachable: 261,764 bytes in 127 blocks
==66355==                       of which reachable via heuristic:
==66355==                         newarray           : 56 bytes in 1 blocks
==66355==         suppressed: 18,074 bytes in 163 blocks
==66355== Reachable blocks (those to which a pointer was found) are not shown.
==66355== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==66355==
==66355== For lists of detected and suppressed errors, rerun with: -s
==66355== ERROR SUMMARY: 1542409 errors from 11 contexts (suppressed: 13 from 13)
[2]    66355 segmentation fault  valgrind --leak-check=yes ./a 100 100

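The Valgrind output tells you that your Variant_2() function is overrunning the bounds of its allocated memory. It does so because these bounds computations are wrong: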

        int row_start = tid * partition * n;
        int row_end = ((tid + 1) * partition) * n;

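At this point, the variable partition contains the number of rows computed in each partition, n is the total number of rows (and columns), and tid is the integer thread index. To divide the rows among the threads in contiguous blocks, each thread should start at row tid * partition and stop just before the first row of the next thread, (tid + 1) * partition. The additional factor of n has no place in the row-number computation.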

You do need a factor of n when computing element offsets in the one-dimensional working arrays, but you already supply that:

                                z[i * n + j] += x[i * n + k] * y[k * n + j];
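
A minimal sketch of the corrected bounds, assuming the same row-major layout as the code in the question. The helper names multiply_rows, first_row and end_row are hypothetical and not part of the original program, and the thread plumbing is left out so that only the index arithmetic is shown:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: accumulate rows [row_start, row_end) of x * y into z.
   All three matrices are n x n, stored row-major in one-dimensional arrays.
   z must be zeroed by the caller, as in the question's main(). */
static void multiply_rows(const double *x, const double *y, double *z,
                          int n, int row_start, int row_end)
{
    /* The bounds are row numbers, so they can never exceed n. */
    assert(0 <= row_start && row_start <= row_end && row_end <= n);

    for (int i = row_start; i < row_end; i++) {
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < n; k++) {
                /* The factor of n appears only here, in the element offset. */
                z[(size_t)i * n + j] += x[(size_t)i * n + k] * y[(size_t)k * n + j];
            }
        }
    }
}

/* Row bounds for thread tid, measured in rows, with no extra factor of n.
   Assumes num_threads divides n evenly. */
static int first_row(int tid, int n, int num_threads)
{
    return tid * (n / num_threads);
}

static int end_row(int tid, int n, int num_threads)
{
    return (tid + 1) * (n / num_threads);
}
```

With these bounds, the largest offset any thread touches is (n - 1) * n + (n - 1), which is the last element of an n * n array.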

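Note also that your approach to partitioning the rows works only when the number of threads evenly divides the number of rows. Otherwise, the last row or rows will not be assigned to any thread.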
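
One possible way to cover every row when the division is not even is to give the first n % num_threads threads one extra row each. This is only a sketch of that idea: row_range is a hypothetical helper, not something in the original program, and other splitting schemes would work just as well.

```c
#include <assert.h>

/* Hypothetical helper: compute the half-open row range [*row_start, *row_end)
   for thread tid so that the ranges of all threads cover rows 0 .. n-1
   exactly once. The first (n % num_threads) threads get one extra row. */
static void row_range(int tid, int n, int num_threads,
                      int *row_start, int *row_end)
{
    assert(num_threads > 0 && tid >= 0 && tid < num_threads);

    int base = n / num_threads;   /* rows every thread gets */
    int extra = n % num_threads;  /* leftover rows to hand out */

    /* Threads before this one have taken min(tid, extra) extra rows. */
    *row_start = tid * base + (tid < extra ? tid : extra);
    *row_end = *row_start + base + (tid < extra ? 1 : 0);
}
```

For example, with n = 10 and num_threads = 4, the four threads would get rows 0-2, 3-5, 6-7 and 8-9. If num_threads is larger than n, some threads simply receive an empty range.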