Looking for the cause of a race condition on a multicore corepack
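I'm using a simple software queue based on a write index and a read index.
Details: language: C; compiler: GCC; optimization: -O3 with extra flags; architecture: ARMv7-A; CPU: multicore, two Cortex-A15 cores; L2 cache: shared and enabled; L1 cache: per CPU, enabled. The architecture is supposed to be cache-coherent.
CPU 1 does the writing and CPU 2 does the reading. Below is heavily simplified example code. You can assume the indices start at zero.
Common: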
#define QUE_LEN 4
unsigned int my_que_write_index = 0; //memory
unsigned int my_que_read_index = 0; //memory
struct my_que_struct {
    unsigned int param1;
    unsigned int param2;
};
struct my_que_struct my_que[QUE_LEN]; //memory
CPU 1 runs:
void que_writer()
{
    unsigned int write_index_local;
    write_index_local = my_que_write_index; //my_que_write_index is in memory
    my_que[write_index_local].param1 = 16; //my_que is my queue and stored in memory also
    my_que[write_index_local].param2 = 32;
    //similar writing stuff
    ++write_index_local;
    if(write_index_local == QUE_LEN) write_index_local = 0;
    my_que_write_index = write_index_local;
}
CPU 2 runs:
void que_reader()
{
    unsigned int read_index_local, param1, param2;
    read_index_local = my_que_read_index; //also in memory
    while(read_index_local != my_que_write_index)
    {
        param1 = my_que[read_index_local].param1;
        if(param1 == 0) FATAL_ERROR;
        param2 = my_que[read_index_local].param2;
        //similar reading stuff
        my_que[read_index_local].param1 = 0;
        ++read_index_local;
        if(read_index_local == QUE_LEN) read_index_local = 0;
    }
    my_que_read_index = read_index_local;
}
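OK, under normal circumstances the fatal error should never happen, because param1 in the queue is always stored with the constant value 16. But somehow a param1 of 0 shows up in the queue and the fatal error is triggered.
Obviously this is some kind of race condition, but I can't see how it happens. Each index is updated by only one CPU.
I don't want to fill my code with memory barriers without understanding the core of the problem. Do you have any idea what is going on?
Details: this is a bare-metal system, this code runs with interrupts disabled, and there is no preemption or task switching.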
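The compiler and the CPU are allowed to rearrange stores and loads as they see fit (that is, as long as a single-threaded program cannot observe the difference). In a multithreaded program, of course, these effects are very much observable.
For example, this code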
write_index_local = my_que_write_index;
my_que[write_index_local].param1 = 16;
my_que[write_index_local].param2 = 32;
++write_index_local;
if(write_index_local == QUE_LEN) write_index_local = 0;
my_que_write_index = write_index_local;
could be reordered like this:
a = my_que_write_index;
my_que_write_index = (a == QUE_LEN - 1) ? 0 : a + 1;
my_que[a].param1 = 16;
my_que[a].param2 = 32;
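Handling this correctly requires atomics and barriers that prevent this kind of reordering. Have a look at Preshing's excellent series of blog posts on atomics; this one is probably a good place to start: http://preshing.com/20120612/an-introduction-to-lock-free-programming/ but check out the posts that follow it as well.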
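The ordering can also be attached to the index accesses themselves instead of standalone fences: the writer publishes the write index with a release store, and the reader loads it with an acquire load. Below is a sketch under the same C11 `<stdatomic.h>` assumption, using hypothetical `que_push`/`que_pop` functions. It also adds a full check, which is not in the question: the original writer never reads `my_que_read_index`, so nothing stops it from overwriting a slot the reader is still using. Here the reader publishes its index with release ordering and the writer loads it with acquire ordering. One slot is always left empty, so the capacity is `QUE_LEN - 1`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QUE_LEN 4   /* one slot is always kept empty: capacity is QUE_LEN - 1 */

struct my_que_struct {
    unsigned int param1;
    unsigned int param2;
};

static struct my_que_struct my_que[QUE_LEN];
static atomic_uint my_que_write_index = 0;  /* written only by CPU 1 */
static atomic_uint my_que_read_index = 0;   /* written only by CPU 2 */

/* CPU 1 (hypothetical API). Returns false if the queue is full. */
bool que_push(unsigned int p1, unsigned int p2)
{
    unsigned int w = atomic_load_explicit(&my_que_write_index, memory_order_relaxed);
    unsigned int next = (w + 1 == QUE_LEN) ? 0 : w + 1;

    /* Acquire pairs with the reader's release store: whatever the reader did
       to a slot before publishing its index happens before we reuse it. */
    if (next == atomic_load_explicit(&my_que_read_index, memory_order_acquire))
        return false;                       /* full */

    my_que[w].param1 = p1;
    my_que[w].param2 = p2;

    /* Release: the slot stores above are visible no later than the new index. */
    atomic_store_explicit(&my_que_write_index, next, memory_order_release);
    return true;
}

/* CPU 2 (hypothetical API). Returns false if the queue is empty. */
bool que_pop(unsigned int *p1, unsigned int *p2)
{
    unsigned int r = atomic_load_explicit(&my_que_read_index, memory_order_relaxed);

    /* Acquire pairs with the writer's release store: the slot is read only
       after the index that published it. */
    if (r == atomic_load_explicit(&my_que_write_index, memory_order_acquire))
        return false;                       /* empty */

    *p1 = my_que[r].param1;
    *p2 = my_que[r].param2;
    assert(*p1 != 0);                       /* the FATAL_ERROR check from the question */
    my_que[r].param1 = 0;

    unsigned int next = (r + 1 == QUE_LEN) ? 0 : r + 1;
    /* Release: the reads and the clear above complete before the writer
       can see the slot as free. */
    atomic_store_explicit(&my_que_read_index, next, memory_order_release);
    return true;
}
```

This is the usual single-producer/single-consumer ring layout; it relies on exactly one CPU calling `que_push` and exactly one CPU calling `que_pop`.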