Increasing a global counter variable from within a thread without having to wait for each individual thread

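My goal is to write a program that measures the performance gain from increasing the number of threads the program can use. I measure performance by computing pi with the Monte Carlo method. Each thread should create 1 random coordinate (x,y) and check whether that coordinate lies inside the circle. If it does, the inCircle counter should be incremented. Pi is then calculated as 4 * inCircle/trys. With pthread_join, I see no performance gain on a problem that should benefit from multithreading. Is there a way to let multiple threads increment one counter without having to wait for each individual thread?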

#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <stdbool.h>
#include <pthread.h>

#define nPoints 10000000
#define NUM_THREADS 16

int inCircle = 0;
int count = 0;
double x,y;
pthread_mutex_t mutex;

bool isInCircle(double x, double y){
    if(x*x+y*y<=1){
        return true;
    }
    else{
        return false;
    }
}

void *piSlave(){
    int myCount = 0;
    time_t now;
    time(&now);
    srand((unsigned int)now);
    for(int i = 1; i <= nPoints/NUM_THREADS; i++) {
        x = (double)rand() / (double)RAND_MAX;
        y = (double)rand() / (double)RAND_MAX;
        if(isInCircle(x,y)){
            myCount++;
        }
     }
    pthread_mutex_lock(&mutex);
    inCircle += myCount;
    pthread_mutex_unlock(&mutex);
    pthread_exit(0);
}
double piMaster()
{
    pthread_t threads[NUM_THREADS];
    int rc;
    long t;

    for(t=0; t<NUM_THREADS; t++){
        printf("Creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, piSlave, (void *)t);
        if (rc){
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    //pthread_join(threads[t], NULL);

    }
    //wait(NULL);
    return 4.0*inCircle/nPoints;
}

int main()
{
    printf("%f\n",piMaster());
    return(0);
}

There are a few issues with the code.

Wait for Threads to Terminate

The piMaster() function should wait for the threads it created. We can do that by calling pthread_join() in a loop after all threads have been started:

for (t = 0; t < NUM_THREADS; t++)
    pthread_join(threads[t], NULL);
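
Joining only blocks the caller, not the workers: once every thread has been created they all run concurrently, and the join loop merely collects them. The following is a minimal sketch of that create-all-then-join-all pattern, in which each worker writes its partial count into its own slot so no shared counter is touched while the threads run. The names slot_t, worker, run_workers and the toy workload (counting even numbers) are assumptions made for illustration and are not part of the original program:

```c
#include <pthread.h>

#define N_WORKERS 4              /* hypothetical thread count for this sketch */
#define ITEMS_PER_WORKER 1000L   /* hypothetical amount of work per thread */

/* One private slot per thread: input id and output partial count. */
typedef struct {
    long id;
    long partial;
} slot_t;

/* Toy workload: count the even numbers in this worker's range. */
static void *worker(void *arg)
{
    slot_t *slot = arg;
    long begin = slot->id * ITEMS_PER_WORKER;
    long local = 0;

    for (long i = begin; i < begin + ITEMS_PER_WORKER; i++)
        if (i % 2 == 0)
            local++;

    slot->partial = local;       /* written once, only by this thread */
    return NULL;
}

/* Starts all workers first, then joins them; returns the total, or -1 on error. */
long run_workers(void)
{
    pthread_t threads[N_WORKERS];
    slot_t slots[N_WORKERS];
    long total = 0;
    long started = 0;

    for (long t = 0; t < N_WORKERS; t++) {
        slots[t].id = t;
        slots[t].partial = 0;
        if (pthread_create(&threads[t], NULL, worker, &slots[t]) != 0)
            break;
        started++;
    }

    /* Joining here does not serialize the work: the threads are already running. */
    for (long t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += slots[t].partial;   /* safe: thread t has finished */
    }
    return started == N_WORKERS ? total : -1;
}
```

Calling run_workers() from main() and printing the result is enough to try it. The serialization in the question comes from calling pthread_join() inside the creation loop, which waits for thread t before thread t+1 even exists.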

Avoid Locking

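We can simply increment the inCircle counter atomically at the end of the loop, so no lock is needed. The variable must be declared with the _Atomic keyword, as described in the Atomic operations C reference:
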
_Atomic long inCircle = 0;
void *piSlave(void *arg)
{
    [...]
    inCircle += myCount;
    [...]
}

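This generates the proper CPU instruction to increment the variable atomically. For example, on the x86 architecture we can confirm that the lock prefix appears in the disassembly: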

29      inCircle += myCount;
   0x0000000100000bdb <+155>:   lock add %rbx,0x46d(%rip)        # 0x100001050 <inCircle>
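
The same update can also be spelled out with the C11 <stdatomic.h> functions, which make the atomic read-modify-write and its memory order explicit. This is a small sketch assuming a C11 compiler with atomics support; hits, add_many and count_with_threads are hypothetical names. Relaxed ordering is used only because the total is read after pthread_join(), which already orders the finished threads' writes before the read:

```c
#include <pthread.h>
#include <stdatomic.h>

#define N_ADDERS 8               /* hypothetical thread count */
#define ADDS_PER_THREAD 100000L  /* hypothetical number of increments per thread */

static atomic_long hits;         /* static storage: starts at 0 */

/* Deliberately contended: every iteration is an atomic read-modify-write. */
static void *add_many(void *arg)
{
    (void)arg;
    for (long i = 0; i < ADDS_PER_THREAD; i++)
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    return NULL;
}

/* Runs the adders to completion and returns the final counter value. */
long count_with_threads(void)
{
    pthread_t threads[N_ADDERS];
    int started = 0;

    atomic_store(&hits, 0);
    for (int t = 0; t < N_ADDERS; t++) {
        if (pthread_create(&threads[t], NULL, add_many, NULL) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    return atomic_load(&hits);
}
```

The per-iteration add is here only to make the effect visible: with a plain long in place of atomic_long, the same loop would be a data race and would typically lose updates. In the pi program each thread adds once, after its loop, so contention on the counter is negligible.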

Avoid the Slow and Thread-Unsafe rand()

Instead, we can simply scan the whole circle in a loop, as described on the Approximations of Pi Wikipedia page:

for (long x = -RADIUS; x <= RADIUS; x++)
    for (long y = -RADIUS; y <= RADIUS; y++)
        myCount += isInCircle(x, y);
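
If random sampling has to stay, the two hazards in the question's piSlave() are the hidden state shared by rand() and the global x and y that every thread writes. One possible way around both, not the approach taken in this answer, is to give each thread its own generator state and keep the coordinates in locals. The sketch below swaps in a small xorshift64 generator for that purpose; xorshift64, next_unit and count_hits are hypothetical names, and the seed passed in is assumed to differ per thread:

```c
#include <stdint.h>

/* One xorshift64 step (shift triple 13, 7, 17); state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double next_unit(uint64_t *state)
{
    return (double)(xorshift64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* Counts how many of n random points fall inside the unit quarter circle. */
long count_hits(uint64_t seed, long n)
{
    uint64_t state = seed ? seed : 1;   /* avoid the all-zero state */
    long hits = 0;

    for (long i = 0; i < n; i++) {
        double x = next_unit(&state);   /* locals, not shared globals */
        double y = next_unit(&state);
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return hits;
}
```

A thread function would call count_hits() with its own seed and point budget, then add the returned value to the shared total once. The estimate is 4.0 * hits / n, as in the question.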

So here is the code with the changes above applied:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define RADIUS 10000L
#define NUM_THREADS 10

_Atomic long inCircle = 0;

/* static: a plain C99 inline without an external definition may fail to link */
static inline long isInCircle(long x, long y)
{
    return x * x + y * y <= RADIUS * RADIUS ? 1 : 0;
}

void *piSlave(void *arg)
{
    long myCount = 0;
    long tid = (long)arg;

    /* Each thread scans every NUM_THREADS-th column, starting at its own offset. */
    for (long x = -RADIUS + tid; x <= RADIUS; x += NUM_THREADS)
        for (long y = -RADIUS; y <= RADIUS; y++)
            myCount += isInCircle(x, y);

    printf("\tthread %ld count: %ld\n", tid, myCount);
    inCircle += myCount;

    pthread_exit(0);
}

double piMaster()
{
    pthread_t threads[NUM_THREADS];
    long t;

    for (t = 0; t < NUM_THREADS; t++) {
        printf("Creating thread %ld...\n", t);
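        /* pthread_create() returns an error number and does not set errno */
        int rc = pthread_create(&threads[t], NULL, piSlave, (void *)t);
        if (rc) {
            fprintf(stderr, "Error creating pthread: error %d\n", rc);
            exit(EXIT_FAILURE);
        }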
    }
    for (t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    return (double)inCircle / (RADIUS * RADIUS);
}

int main()
{
    printf("Result: %f\n", piMaster());
    return (0);
}

Here is the output:

Creating thread 0...
Creating thread 1...
Creating thread 2...
Creating thread 3...
Creating thread 4...
Creating thread 5...
Creating thread 6...
Creating thread 7...
Creating thread 8...
Creating thread 9...
    thread 7 count: 31415974
    thread 5 count: 31416052
    thread 1 count: 31415808
    thread 3 count: 31415974
    thread 0 count: 31415549
    thread 4 count: 31416048
    thread 2 count: 31415896
    thread 9 count: 31415808
    thread 8 count: 31415896
    thread 6 count: 31416048
Result: 3.141591
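
As a sanity check on the listing, the ten per-thread counts can be summed and divided by RADIUS * RADIUS by hand, which is the same arithmetic piMaster() performs on inCircle. The sketch below does just that with the numbers copied from the output above; thread_counts, total_count and pi_from_total are hypothetical names introduced for the check:

```c
#define RADIUS 10000L

/* Per-thread counts copied from the output above, in printed order. */
static const long thread_counts[] = {
    31415974, 31416052, 31415808, 31415974, 31415549,
    31416048, 31415896, 31415808, 31415896, 31416048
};

/* Sum of all per-thread counts, i.e. the final value of inCircle. */
long total_count(void)
{
    long total = 0;
    for (unsigned i = 0; i < sizeof thread_counts / sizeof thread_counts[0]; i++)
        total += thread_counts[i];
    return total;
}

/* Lattice points inside the circle divided by RADIUS squared approximates pi. */
double pi_from_total(long total)
{
    return (double)total / (double)(RADIUS * RADIUS);
}
```

The counts add up to 314159053 lattice points, and dividing by 10000 * 10000 gives 3.14159053, which printf("%f") rounds to the 3.141591 shown.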