Segmentation fault occurs for seemingly random combinations of thread count and matrix dimension
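I'm trying to write two programs that compute the norm of the product of two n×n dense matrices. The first part, Variant 1, works as expected: it multiplies the two matrices in parallel and then computes the norm. Variant 2 does not work as expected. In it, I try to partition one matrix horizontally and multiply it by the other. The multiplication works, but for whatever reason I keep getting the following error when computing the norm: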
```
a(72157,0x110d5fdc0) malloc: Incorrect checksum for freed object 0x7f8b035060e8: probably modified after being freed.
Corrupt value: 0x40ab000000000000
a(72157,0x110d5fdc0) malloc: *** set a breakpoint in malloc_error_break to debug
```
or
```
[2] 73373 segmentation fault ./a 100 100
```
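I thought this might be a case of memory not being freed correctly, but I free the memory the threads were using and then allocate it again, and I still get the error for dimensions that seem to be greater than about 5. When I run smaller matrices (the two command-line arguments are the matrix size and the number of threads), I get the following: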
```
gcc -o a a.c -pthread && ./a 2 2
 ********** Variant 1 ********** 
 4.00  4.00 
 4.00  4.00 
 Norm : 8.00 Time Elapsed for Variant 1: 0.00
 ******************************
 ********** Variant 2 ********** 
 4.00  4.00 
 4.00  4.00 
Norm : 8.00 %
 ********** Variant 1 ********** 
 8.00  8.00  8.00  8.00 
 8.00  8.00  8.00  8.00 
 8.00  8.00  8.00  8.00 
 8.00  8.00  8.00  8.00 
 Norm : 32.00 Time Elapsed for Variant 1: 0.00
 ******************************
 ********** Variant 2 ********** 
 8.00  8.00  8.00  8.00 
 8.00  8.00  8.00  8.00 
 8.00  8.00  8.00  8.00 
 8.00  8.00  8.00  8.00 
Norm : 32.00 %
```
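I really don't know what is causing this. It might be because I'm reusing the code that computes the norm, but that doesn't make sense to me, since it's just a function.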
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>

// Declaring : Matrix Dimension, Number Of Threads, Norm, Matrices, Mutex
int n, num_threads;
double norm;
double * x, * y, * z;
pthread_mutex_t lock;

void * computeNorm(void *arg){
    pthread_mutex_lock(&lock);
    int tid = *(int*)(arg);
    double sum = 0.0;
    for(int j = 0; j < n; j++){
        sum += z[j * n + tid];
    }
    if(norm < sum){
        norm = sum;
    }
    pthread_mutex_unlock(&lock);
    pthread_exit(NULL);
}

void * Parallel_Matrix_Multiplication(void *arg){
    pthread_mutex_lock(&lock);
    int tid = *(int *)(arg);
    int partition = n / num_threads;
    int row_start = tid * partition;
    int row_end = (tid+1) * partition;
    for (int i = row_start; i < row_end; ++i){
        for (int j = 0; j < n; ++j){
            for (int k = 0; k < n; ++k) {
                z[i * n + j] += x[ i * n + k ] * y[ k * n + j ];
            }
        }
    }
    pthread_mutex_unlock(&lock);
    pthread_exit(NULL);
}

void * Variant_2(void *arg){
    pthread_mutex_lock(&lock);
    int tid = *(int*)(arg);
    int partition = n / num_threads;
    int row_start = tid * partition * n;
    int row_end = ((tid + 1) * partition) * n;
    for(int i = row_start; i < row_end; i++){
        for(int j = 0; j < n; j++){
            for(int k = 0; k < n; k++){
                z[i * n + j] += x[i * n + k] * y[k * n + j];
            }
        }
    }
    pthread_mutex_unlock(&lock);
    pthread_exit(NULL);
}

int main(int argc, char ** argv){
    int i;
    pthread_t * threads;
    pthread_t * norm_threads;
    pthread_mutex_init(&lock, NULL);
    n = atoi( argv[1] );
    num_threads = atoi( argv[2] );
    threads = (pthread_t *)malloc(num_threads * sizeof(pthread_t));
    x = malloc(n * n * sizeof(double));
    y = malloc(n * n * sizeof(double));
    z = malloc(n * n * sizeof(double));
    for(int i = 0; i < n * n; i++){
        x[i] = 1.0;
        y[i] = 2.0;
        z[i] = 0.0;
    }

    printf(" ********** Variant 1 ********** \n");
    for ( i = 0; i < num_threads; ++i ) {
        int *tid;
        tid = (int *) malloc( sizeof(int) );
        *tid = i;
        pthread_create( &threads[i], NULL, Parallel_Matrix_Multiplication, (void *)tid );
    }
    for ( i = 0; i < num_threads; ++i ) {
        pthread_join( threads[i], NULL );
    }
    for(int i = 0; i < n; i++){
        for(int j = 0; j < n; j++){
            printf(" %0.2f ", z[i * n + j]);
        }
        printf("\n");
    }
    norm_threads = (pthread_t*)malloc(num_threads * sizeof(pthread_t));
    for(int i = 0; i < num_threads; i++){
        int *tid;
        tid = (int*)malloc(sizeof(int));
        *tid = i;
        pthread_create(&norm_threads[i], NULL, computeNorm, (void*)tid);
    }
    for(int i = 0; i < num_threads; i++){
        pthread_join(norm_threads[i], NULL);
    }
    printf(" Norm : %0.2f ", norm);
    printf("\n ******************************");

    norm = 0.0;
    for(int i = 0; i < n * n; i++){ z[i] = 0.0; }
    free(threads);
    free(norm_threads);
    threads= (pthread_t*)malloc(num_threads * sizeof(pthread_t));
    norm_threads = (pthread_t*)malloc(num_threads * sizeof(pthread_t));

    printf("\n ********** Variant 2 ********** \n");
    for(i = 0; i < num_threads; i++){
        int *tid;
        tid = (int*)malloc(sizeof(int));
        *tid = i;
        pthread_create(&threads[i], NULL, Variant_2, (void*)tid);
    }
    for(i =0; i < num_threads; i++){
        pthread_join(threads[i], NULL);
    }
    for(i = 0; i < n; i++){
        for(int j = 0; j < n; j++){
            printf(" %0.2f ", z[i * n + j]);
        }
        printf("\n");
    }
    for(i = 0; i < num_threads; i++){
        int *tid;
        tid = (int*)malloc(sizeof(int));
        *tid = i;
        pthread_create(&norm_threads[i], NULL, computeNorm, (void*)tid);
    }
    for(i = 0; i < num_threads; i++){
        pthread_join(norm_threads[i], NULL);
    }
    printf("Norm : %0.2f ", norm);
    pthread_mutex_destroy(&lock);
    return 0;
}
```
Valgrind output (GCC build):
```
==66355== Thread 3:
==66355== Invalid read of size 8
==66355== at 0x1000015D7: Variant_2 (in ./a)
==66355== by 0x100612108: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355== by 0x10060DB8A: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355== Address 0x100882ac0 is 0 bytes after a block of size 80,000 alloc'd
==66355== at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x100001B4C: main (in ./a)
==66355==
==66355== Invalid read of size 8
==66355== at 0x100001600: Variant_2 (in ./a)
==66355== by 0x100612108: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355== by 0x10060DB8A: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355== Address 0x10084a350 is 0 bytes after a block of size 80,000 alloc'd
==66355== at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x100001738: main (in ./a)
==66355==
==66355== Invalid write of size 8
==66355== at 0x100001605: Variant_2 (in ./a)
==66355== by 0x100612108: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355== by 0x10060DB8A: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355== Address 0x10084a350 is 0 bytes after a block of size 80,000 alloc'd
==66355== at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x100001738: main (in ./a)
==66355==
==66355==
==66355== Process terminating with default action of signal 11 (SIGSEGV)
==66355== Access not within mapped region at address 0x100DC6540
==66355== at 0x1000015D7: Variant_2 (in ./a)
==66355== by 0x100612108: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355== by 0x10060DB8A: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==66355== If you believe this happened as a result of a stack
==66355== overflow in your program's main thread (unlikely but
==66355== possible), you can try to increase the size of the
==66355== main thread stack using the --main-stacksize= flag.
==66355== The main thread stack size used in this run was 8388608.
--66355:0:schedule VG_(sema_down): read returned -4
--66355:0:schedule VG_(sema_down): read returned -4
==66355==
==66355== HEAP SUMMARY:
==66355== in use at exit: 282,462 bytes in 494 blocks
==66355== total heap usage: 538 allocs, 44 frees, 552,158 bytes allocated
==66355==
==66355== Thread 1:
==66355== 32 bytes in 1 blocks are possibly lost in loss record 20 of 75
==66355== at 0x100111C90: calloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x1006E65F3: objc::DenseMap<objc_class*, objc_class*, objc::DenseMapValueInfo<objc_class*>, objc::DenseMapInfo<objc_class*>, objc::detail::DenseMapPair<objc_class*, objc_class*> >::grow(unsigned int) (in /usr/lib/libobjc.A.dylib)
==66355== by 0x1006E64BA: addRemappedClass(objc_class*, objc_class*) (in /usr/lib/libobjc.A.dylib)
==66355== by 0x1006D6C43: allocateBuckets(unsigned int) (in /usr/lib/libobjc.A.dylib)
==66355== by 0x1006D6398: lookUpImpOrForward (in /usr/lib/libobjc.A.dylib)
==66355== by 0x100672F99: _xpc_payload_alloc (in /usr/lib/system/libxpc.dylib)
==66355== by 0x100672E65: _xpc_payload_create_from_mach_msg (in /usr/lib/system/libxpc.dylib)
==66355== by 0x100672D22: xpc_receive_mach_msg (in /usr/lib/system/libxpc.dylib)
==66355== by 0x10068CB8A: _xpc_pipe_routine (in /usr/lib/system/libxpc.dylib)
==66355== by 0x100671B61: xpc_pipe_routine_with_flags (in /usr/lib/system/libxpc.dylib)
==66355== by 0x1006719E1: _xpc_interface_routine (in /usr/lib/system/libxpc.dylib)
==66355== by 0x100673CE7: bootstrap_look_up3 (in /usr/lib/system/libxpc.dylib)
==66355==
==66355== 32 bytes in 1 blocks are possibly lost in loss record 21 of 75
==66355== at 0x100111C90: calloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x1006E65F3: objc::DenseMap<objc_class*, objc_class*, objc::DenseMapValueInfo<objc_class*>, objc::DenseMapInfo<objc_class*>, objc::detail::DenseMapPair<objc_class*, objc_class*> >::grow(unsigned int) (in /usr/lib/libobjc.A.dylib)
==66355== by 0x1006E64BA: addRemappedClass(objc_class*, objc_class*) (in /usr/lib/libobjc.A.dylib)
==66355== by 0x1006E5FF3: realizeClassWithoutSwift(objc_class*, objc_class*) (in /usr/lib/libobjc.A.dylib)
==66355== by 0x1006D6D44: -[NSObject dealloc] (in /usr/lib/libobjc.A.dylib)
==66355== by 0x1006D6398: lookUpImpOrForward (in /usr/lib/libobjc.A.dylib)
==66355== by 0x100672F99: _xpc_payload_alloc (in /usr/lib/system/libxpc.dylib)
==66355== by 0x100672E65: _xpc_payload_create_from_mach_msg (in /usr/lib/system/libxpc.dylib)
==66355== by 0x100672D22: xpc_receive_mach_msg (in /usr/lib/system/libxpc.dylib)
==66355== by 0x10068CB8A: _xpc_pipe_routine (in /usr/lib/system/libxpc.dylib)
==66355== by 0x100671B61: xpc_pipe_routine_with_flags (in /usr/lib/system/libxpc.dylib)
==66355== by 0x1006719E1: _xpc_interface_routine (in /usr/lib/system/libxpc.dylib)
==66355==
==66355== 56 bytes in 1 blocks are possibly lost in loss record 26 of 75
==66355== at 0x100111C90: calloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x10058A190: _notify_fork_child (in /usr/lib/system/libsystem_notify.dylib)
==66355== by 0x10058A3A0: _notify_fork_child (in /usr/lib/system/libsystem_notify.dylib)
==66355== by 0x100584A6B: notify_register_check (in /usr/lib/system/libsystem_notify.dylib)
==66355== by 0x1003BC9ED: notify_register_tz (in /usr/lib/system/libsystem_c.dylib)
==66355== by 0x1003BC35F: tzsetwall_basic (in /usr/lib/system/libsystem_c.dylib)
==66355== by 0x1003BE130: localtime (in /usr/lib/system/libsystem_c.dylib)
==66355== by 0x10037C923: gettimeofday (in /usr/lib/system/libsystem_c.dylib)
==66355== by 0x1000017D0: main (in ./a)
==66355==
==66355== 112 bytes in 1 blocks are possibly lost in loss record 44 of 75
==66355== at 0x100111C90: calloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x10058A3CC: _notify_fork_child (in /usr/lib/system/libsystem_notify.dylib)
==66355== by 0x100584A6B: notify_register_check (in /usr/lib/system/libsystem_notify.dylib)
==66355== by 0x1003BC9ED: notify_register_tz (in /usr/lib/system/libsystem_c.dylib)
==66355== by 0x1003BC35F: tzsetwall_basic (in /usr/lib/system/libsystem_c.dylib)
==66355== by 0x1003BE130: localtime (in /usr/lib/system/libsystem_c.dylib)
==66355== by 0x10037C923: gettimeofday (in /usr/lib/system/libsystem_c.dylib)
==66355== by 0x1000017D0: main (in ./a)
==66355==
==66355== 396 bytes in 99 blocks are definitely lost in loss record 58 of 75
==66355== at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x100001807: main (in ./a)
==66355==
==66355== 396 bytes in 99 blocks are definitely lost in loss record 59 of 75
==66355== at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x10000195E: main (in ./a)
==66355==
==66355== 800 bytes in 1 blocks are definitely lost in loss record 63 of 75
==66355== at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x1000016CF: main (in ./a)
==66355==
==66355== 800 bytes in 1 blocks are definitely lost in loss record 64 of 75
==66355== at 0x100111635: malloc (in /usr/local/Cellar/valgrind/HEAD-6049595/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==66355== by 0x100001937: main (in ./a)
==66355==
==66355== LEAK SUMMARY:
==66355== definitely lost: 2,392 bytes in 200 blocks
==66355== indirectly lost: 0 bytes in 0 blocks
==66355== possibly lost: 232 bytes in 4 blocks
==66355== still reachable: 261,764 bytes in 127 blocks
==66355== of which reachable via heuristic:
==66355== newarray : 56 bytes in 1 blocks
==66355== suppressed: 18,074 bytes in 163 blocks
==66355== Reachable blocks (those to which a pointer was found) are not shown.
==66355== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==66355==
==66355== For lists of detected and suppressed errors, rerun with: -s
==66355== ERROR SUMMARY: 1542409 errors from 11 contexts (suppressed: 13 from 13)
[2] 66355 segmentation fault valgrind --leak-check=yes ./a 100 100
```
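The Valgrind output tells you that your `Variant_2()` function accesses memory outside the bounds of the blocks you allocated. It does so because these bound computations are wrong:

```c
int row_start = tid * partition * n;
int row_end = ((tid + 1) * partition) * n;
```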
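Here, the `partition` variable holds the number of rows in each partition, `n` is the total number of rows (and columns), and `tid` is the integer thread index. To divide the rows among the threads in contiguous blocks, each thread should start at row `tid * partition` and stop just before the next thread's first row, `(tid + 1) * partition`. The additional factor of `n` does not belong in the row-number computation.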
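You do need the factor of `n` when computing element offsets into the one-dimensional working arrays, but you already supply it there:

```c
z[i * n + j] += x[i * n + k] * y[k * n + j];
```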
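Note also that your way of partitioning the rows works only when the number of threads evenly divides the number of rows. Otherwise, the last row or rows will not be assigned to any thread.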