Why is multithreading slower?

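I implemented a merge sort of three linked lists with 1,000,000 nodes each, once with multiple processes and once with multiple threads. I compared the real (wall-clock) time of the two programs, and the multithreaded version is slower. Why is that?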

The main function in process.c

    /* Insert nodes */
    Node* tmp = NULL;   
    int num;    
    for( int i = 0; i < MAX; i++ )
    {
        fscanf(fread,"%d",&num);    
        tmp = createNode(num , i ); 
        insertNode( &list1.head, &list1.tail, tmp );
        tmp = createNode(num , i ); 
        insertNode( &list2.head, &list2.tail, tmp );    
        tmp = createNode(num , i );
        insertNode( &list3.head, &list3.tail, tmp );    
    }
    /* every node created above is linked into a list, so there is nothing to free here */
    fclose(fread);  

    if ((t1 = times(&mytms)) == -1) {
        perror("times 1");
        exit(1);
    }

    pid1= fork();   
    if(pid1==0){
        mergeSort( &list1.head );   
        file_output(&list1);    
        freeAll( list1.head );
        exit(1);    
    }
    pid2= fork();   
    if(pid2==0){
        mergeSort( &list2.head );   
        file_output(&list2);    
        freeAll( list2.head );  
        exit(2);    
    }
    pid3 = fork();
    if(pid3==0){
        mergeSort( &list3.head );   
        file_output(&list3);    
        freeAll( list3.head );  
        exit(3);    
    }

    wait(&status);  
    wait(&status);
    wait(&status);

    if ((t2 = times(&mytms)) == -1) {   
        perror("times 2");
        exit(1);
    }

    printf("Real time : %.5f sec\n", (double)(t2 - t1) / CLK_TCK);
    printf("User time : %.5f sec\n", (double)mytms.tms_utime / CLK_TCK);
    printf("System time : %.5f sec\n", (double)mytms.tms_stime / CLK_TCK);

Result: real time 1.65 sec

The main function in thread.c

   /* Insert nodes */
   Node* tmp = NULL;   
   int num;           

   for( int i = 0; i < MAX; i++ )
   {
      fscanf(fread,"%d",&num); 
      tmp = createNode(num , i ); 
      insertNode( &list1.head, &list1.tail, tmp );  
      tmp = createNode(num , i );  
      insertNode( &list2.head, &list2.tail, tmp );  
      tmp = createNode(num , i );  
      insertNode( &list3.head, &list3.tail, tmp );  
   }

   /* tmp still points at the last node of list3, which the list owns; it must not be freed here */
   fclose(fread);  

   if ((t1 = times(&mytms)) == -1) {
        perror("times 1");
        exit(1);
   }

   pthread_create( &t_id1, NULL, thread_func, &list1 );
   pthread_create( &t_id2, NULL, thread_func, &list2 );
   pthread_create( &t_id3, NULL, thread_func, &list3 );

   /* the threads' return values are not used here, so pass NULL instead of casting &status */
   pthread_join( t_id1, NULL );
   pthread_join( t_id2, NULL );
   pthread_join( t_id3, NULL );

   if ((t2 = times(&mytms)) == -1) {
        perror("times 2");
      exit(1);
   }

   printf("Real time : %.5f sec\n", (double)(t2 - t1) / CLK_TCK);
   printf("User time : %.5f sec\n", (double)mytms.tms_utime / CLK_TCK);  
   printf("System time : %.5f sec\n", (double)mytms.tms_stime / CLK_TCK);  

Result: real time 2.27 sec

Why is multithreading slower?

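It is processor-specific and depends on the number of cores, the organization of the CPU caches, their cache coherence, and your RAM. See also the tests and benchmarks on https://www.phoronix.com/; the results would not be the same on an Intel Core i7 10700K and on an AMD Ryzen 9 3900X (which are similarly priced).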

It is also compiler- and optimization-specific. Read the Dragon book and a good book on computer architecture.

It also depends on your particular operating system and your particular C standard library (e.g. GNU glibc is not the same as musl-libc), and glibc 2.31 could perform differently from glibc 2.30 on the same computer. Read Advanced Linux Programming, pthreads(7), nptl(7), numa(7), time(7), madvise(2), syscalls(2).
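As a rough illustration of the clocks described in time(7), the sketch below separates wall-clock time from the CPU time used by the process. It is not taken from the question's code: the helper names `wall_seconds`, `cpu_seconds` and `busy_work` are made up for this example, and it assumes a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC` and `CLOCK_PROCESS_CPUTIME_ID`.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Convert a timespec to seconds as a double. */
static double to_seconds(struct timespec ts)
{
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Wall-clock ("real") time, read from a clock that never jumps backwards. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return to_seconds(ts);
}

/* CPU time consumed so far by all threads of the calling process. */
double cpu_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return to_seconds(ts);
}

/* Hypothetical workload standing in for the sort being measured. */
unsigned long busy_work(unsigned long n)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < n; i++)
        sum += i;
    return sum;
}
```

Calling `wall_seconds()` and `cpu_seconds()` before and after the work and subtracting gives both figures. With three busy threads the CPU time can be close to three times the wall-clock time. This process clock does not cover children created with fork; their CPU time shows up in the `tms_cutime` and `tms_cstime` fields of times(2) once they have been waited for.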

Have you tried at least a recent Linux with a recent GCC 10, invoked as gcc -Wall -O3 -mtune=native?

On Linux, you can query your hardware with proc(5) and then hwinfo.
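For a quick look at the hardware from inside a C program, something like the following may help. It is only a sketch: `online_cpus` and `l1d_line_size` are names invented here, `_SC_NPROCESSORS_ONLN` is a widely available extension rather than part of POSIX, and `_SC_LEVEL1_DCACHE_LINESIZE` is assumed to be a glibc extension that other C libraries may lack.

```c
#include <unistd.h>

/* Number of processors currently online, or 1 if it cannot be determined. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* L1 data cache line size in bytes, or 0 when unknown or unsupported. */
long l1d_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    return n > 0 ? n : 0;
#else
    return 0;
#endif
}
```

Comparing `online_cpus()` with the number of worker threads or processes (three in the question) is a cheap first check before interpreting any timing.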

You might be interested in OpenCL, OpenMP, or OpenACC, and you should read about the optimization options of your particular C compiler. For recent GCC, see this. You could even customize your recent GCC with your own GCC plugins to improve optimizations, and you could try a recent Clang or icc compiler.
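To give a flavour of the OpenMP approach, here is a minimal sketch that sorts three independent arrays in parallel sections. It does not reproduce the linked-list merge sort from the question: the arrays, `cmp_int` and `sort_three` are made up for illustration, and the standard `qsort` stands in for the sort. Compiled with `gcc -fopenmp`, the three sections may run on different threads; without that flag the pragmas are ignored and the code runs sequentially.

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of int. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort three independent arrays of n ints, one array per OpenMP section. */
void sort_three(int *a, int *b, int *c, size_t n)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        qsort(a, n, sizeof *a, cmp_int);

        #pragma omp section
        qsort(b, n, sizeof *b, cmp_int);

        #pragma omp section
        qsort(c, n, sizeof *c, cmp_int);
    }
}
```

The sections share no data, so no locking is needed. Whether this beats a hand-written pthread version depends on the same hardware, compiler and library factors listed above.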

See also the MILEPOST GCC project and the CTuning one. Read also this draft report. Attend ACM SIGPLAN and SIGOPS conferences. Contact computer science academics near you.

You could earn a PhD while working out the answer to your question.