Why are my child processes not giving me the "correct" results?

Question

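My code is meant to run two child processes that each increment a shared counter variable. Each process should increment it by 1 million. Here is my code: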

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

typedef struct
{
    int value;
}   shared_mem; 
shared_mem  *counter;

//these next two processes are the ones that increment the counter
process1()
{
    int i=0;
    for(i=0;i<1000000;i++)
        counter->value++;
}

process2()
{
    int i=0;
    for(i=0;i<1000000;i++)
        counter->value++;
}

/*  The Main Body   */

main()
{
    key_t   key = IPC_PRIVATE;  /* shared memory key */ 
    int shmid;  /* shared memory ID */ 
    shared_mem  *shmat1;    

    int pid1;   /* process id for child1 */
    int pid2;   /* process id for child2 */


    /* attempts to attach to an existing memory segment */

    if (( shmid = shmget(key, sizeof(int), IPC_CREAT | 0666)) < 0)
    {
        perror("shmget");
         return(1);
    }

    /*attempts the shared memory segment    */

    if((counter = (shared_mem *)shmat(shmid, NULL, 0)) == (shared_mem *) -1)
    {
        perror("shmat");    
        return(1);
    }

    /*initializing shared memory to 0 */ 
    counter->value = 0;

    pid1=fork();
    /* fork process one here */
    if(pid1==0)
    {
        printf("I am child 1 with PID %d\n", getpid());
        process1();
    }
    else
    {
        pid2=fork();

        if(pid2==0)
        {
            printf("I am child 2 with PID %d\n", getpid());
            process2();
        }
        else
        {
            wait(NULL);
            printf("I am parent with PID %d\n", getpid());
            printf("Total counter value is: %d\n", counter->value);
        }

    }


    /*deallocate shared memory */
    if(shmctl(shmid, IPC_RMID, (struct shmid_ds *)0)== -1)
    { 
        perror("shmctl");
        return(-1);
    }
    return(0);

}

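The counter output hovers around 1 million, but shouldn't it hover around 2 million? I think I'm not understanding something about the way the processes increment. Thanks a lot, and I apologize if the code is too long, but I wasn't sure what I could include and what I could leave out.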

Answer 1

Incrementing a variable is not atomic; unless told otherwise, the compiler is free to generate code like this:

load counter->value in a register
increment the register
move the incremented value back to counter->value

This is exactly the kind of code that gcc generates with optimizations disabled:

mov rax, QWORD PTR counter[rip]   ; find out the address of counter->value
mov edx, DWORD PTR [rax]          ; get its content in edx
add edx, 1                        ; increment edx
mov DWORD PTR [rax], edx          ; move it back to counter->value

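(You can still have a race condition even if the increment compiles down to a single instruction: for example, even a plain inc DWORD PTR [rax] on x86 is not atomic on a multicore machine unless it carries the lock prefix.)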

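Now, if you have two threads (or, as in your case, two processes sharing the same memory) continuously trying to increment the variable concurrently, most of the time you will get a sequence of operations similar to this: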

Thread A                               Thread B
load counter->value in a register
                                       load counter->value in a register
                                       increment the register
increment the register
move the register to counter->value
                                       move the register to counter->value

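Because the two increments happen in separate registers, both starting from the same value, the net result is that counter->value appears to be incremented only once instead of twice. This is just one possible example; you can imagine many other sequences that skip any number of increments (think of thread A being paused between its load and its store while thread B goes ahead for many iterations).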
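
The interleaving in the table can be replayed by hand in a single thread, which makes the lost update deterministic. This is only an illustrative sketch: the function name and the reg_a / reg_b variables are made up here and stand in for each thread's CPU register.

```c
/* Replays the interleaving from the table above, one step per line.
 * reg_a and reg_b are hypothetical stand-ins for the registers of
 * thread A and thread B; nothing here actually runs concurrently. */
int simulate_lost_update(int start)
{
    int value = start;   /* plays the role of counter->value */
    int reg_a, reg_b;

    reg_a = value;       /* A: load counter->value in a register */
    reg_b = value;       /* B: load counter->value in a register */
    reg_b = reg_b + 1;   /* B: increment the register */
    reg_a = reg_a + 1;   /* A: increment the register */
    value = reg_a;       /* A: move the register to counter->value */
    value = reg_b;       /* B: move the register to counter->value */

    return value;        /* start + 1, although two increments ran */
}
```

Calling simulate_lost_update(0) returns 1 rather than 2: thread B's store overwrites thread A's with the same value, so one increment disappears.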

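The solution is to use atomic operations on the shared value; with gcc you have several atomic builtins available, which expand to the correct assembly code to perform the described operation atomically, i.e. without the risk of interleavings like the one shown above.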
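
To get a feel for what these builtins return, here is a small sketch that exercises three of gcc's legacy __sync builtins on a local int. It is deliberately single-threaded, so it shows only the return values, not the atomicity. The function name and the results array are assumptions made for this example.

```c
/* Fills results[] with what each builtin returns, starting from v = 10.
 * sync_builtins_demo is a made-up name; the __sync_* calls are gcc builtins. */
void sync_builtins_demo(int results[4])
{
    int v = 10;

    /* Adds 5 and returns the value v held BEFORE the addition. */
    results[0] = __sync_fetch_and_add(&v, 5);

    /* Adds 1 and returns the value v holds AFTER the addition. */
    results[1] = __sync_add_and_fetch(&v, 1);

    /* Stores 100 only if v still equals 16; returns nonzero on success. */
    results[2] = __sync_bool_compare_and_swap(&v, 16, 100);

    results[3] = v;      /* final value of v */
}
```

After the call, results holds 10, 16, 1 and 100. Newer code would probably reach for the __atomic builtins or C11 <stdatomic.h> instead, but the idea is the same.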

In this particular case, you should replace counter->value++ with something like __sync_add_and_fetch(&counter->value, 1). The generated code becomes

mov rax, QWORD PTR counter[rip]    ; find out the address of counter->value
lock add    DWORD PTR [rax], 1     ; atomically increment the pointed value

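and you should see the counter actually reach the expected 2000000 (provided the parent waits for both children before reading it; the posted code calls wait(NULL) only once, so it may print the total while one child is still counting).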
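
Putting this together, a minimal sketch of the corrected counter might look like the following. It is not a drop-in patch for the posted program: the function name count_with_atomic_add is made up, the segment holds a bare int instead of the shared_mem struct, each child leaves through _exit so it never runs the parent's cleanup code, and the parent waits for both children before reading the total.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks two children that each add `iterations` to a shared int with
 * __sync_add_and_fetch, waits for both, and returns the final value
 * (or -1 on error). The name is hypothetical, chosen for this sketch. */
int count_with_atomic_add(int iterations)
{
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid < 0)
        return -1;

    int *value = shmat(shmid, NULL, 0);
    if (value == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    *value = 0;

    pid_t pids[2];
    int forked = 0;
    for (; forked < 2; forked++) {
        pids[forked] = fork();
        if (pids[forked] < 0)
            break;                       /* fork failed: stop creating children */
        if (pids[forked] == 0) {
            for (int i = 0; i < iterations; i++)
                __sync_add_and_fetch(value, 1);   /* atomic increment */
            _exit(0);                    /* child must not fall through */
        }
    }

    /* Wait for every child that was actually created. */
    for (int i = 0; i < forked; i++)
        waitpid(pids[i], NULL, 0);

    int total = (forked == 2) ? *value : -1;

    /* Only the parent detaches and removes the segment. */
    shmdt(value);
    shmctl(shmid, IPC_RMID, NULL);
    return total;
}
```

On a system where shmget and fork succeed, count_with_atomic_add(1000000) should return 2000000. Swapping the atomic call back to (*value)++ should bring the lost updates back.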

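Notice, however, that atomic operations are quite limited: CPUs usually support them only on a small set of types (typically integers up to the native word size), and only a few primitives are available or easily assembled (e.g. atomic swap, atomic compare-and-swap, atomic increment and the like). For this reason, whenever you need to guarantee that some arbitrary block of code always executes atomically, you have to resort to mutexes and other synchronization primitives (which are usually built on top of atomic operations).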
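
To illustrate how a lock can be built from an atomic primitive and then used to protect a multi-step update, here is a rough sketch of a spinlock based on gcc's __sync_lock_test_and_set and __sync_lock_release. The account struct, the function names and the "deposit" scenario are all invented for this example. A real program would more likely use a process-shared pthread mutex or a POSIX semaphore than a hand-rolled spinlock.

```c
#include <sched.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared record: the two counters must change together. */
typedef struct {
    volatile int lock;   /* 0 = free, 1 = held */
    long deposits;
    long balance;
} account;

static void spin_lock(volatile int *lock)
{
    /* Atomically write 1 and fetch the old value; retry while it was held. */
    while (__sync_lock_test_and_set(lock, 1))
        sched_yield();   /* let the current holder run */
}

static void spin_unlock(volatile int *lock)
{
    __sync_lock_release(lock);   /* atomically write 0 */
}

/* Two children each perform `iterations` deposits of `amount`.
 * Returns 1 if the final record is consistent, 0 if not, -1 on error. */
int run_locked_deposits(int iterations, long amount)
{
    int shmid = shmget(IPC_PRIVATE, sizeof(account), IPC_CREAT | 0600);
    if (shmid < 0)
        return -1;
    account *acc = shmat(shmid, NULL, 0);
    if (acc == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    acc->lock = 0;
    acc->deposits = 0;
    acc->balance = 0;

    pid_t pids[2];
    int forked = 0;
    for (; forked < 2; forked++) {
        pids[forked] = fork();
        if (pids[forked] < 0)
            break;
        if (pids[forked] == 0) {
            for (int i = 0; i < iterations; i++) {
                spin_lock(&acc->lock);
                acc->deposits += 1;      /* both updates form one unit */
                acc->balance += amount;
                spin_unlock(&acc->lock);
            }
            _exit(0);
        }
    }
    for (int i = 0; i < forked; i++)
        waitpid(pids[i], NULL, 0);

    int ok = -1;
    if (forked == 2)
        ok = acc->deposits == 2L * iterations &&
             acc->balance == 2L * iterations * amount;
    shmdt(acc);
    shmctl(shmid, IPC_RMID, NULL);
    return ok;
}
```

The lock itself is just one int manipulated with atomic operations, but everything between spin_lock and spin_unlock behaves as a single unit as far as the other process is concerned.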

c

unix

fork

increment

parent-child