Locks in OpenMP
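Hello everyone!
Not long ago I managed to parallelize a recursive algorithm that searches through the possible ways of combining certain events. At the moment the code looks like this: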
// #includes
// function declarations
// global variable declaration:
QVector<QVector<QVector<float>>> variant; // or std::vector
int main() {
    // read data from a file
    // convert and analyze the data
    // fill in variant, which holds the current best result (here, from the pre-analysis)
    #pragma omp parallel shared(variant)
    #pragma omp master
    // call the recursive algorithm that searches through all variants:
    PEREBOR(Tabl_1, a, i_a, ..., rec_depth);
    return 0;
}
void PEREBOR(QVector<QVector<uint8_t>> Tabl_1, QVector<A_struct> a, uint8_t i_a, ..., uint8_t rec_depth)
{
    // determine the bounds of the first loop
    for (int i = quantity; i < another_quantity; i++) {
        // Tabl_1 is processed and modified to determine the number of iterations of the inner loop
        for (int k = 0; k < the_quantity_just_found; k++) {
            if (rec_depth != 1) { // not at the lowest level yet: descend further
                // schedule the descent to the next recursion level as a task:
                #pragma omp task
                PEREBOR(Tabl_1_COPY, a, i_a, ..., rec_depth-1);
            }
            else { // reached the lowest level
                if (condition fulfilled) // condition check - READS the variant variable
                    variant = it_is_equal_to_that_,_to_that...;
                else
                    continue;
            }
        }
    }
}
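At the moment this works really well: on a six-core CPU it is more than 5.7 times faster than the single-core version.
As you can see, with enough threads a failure can occur because of simultaneous reading and writing of the variant variable. I understand that it needs to be protected. For now, the only way out I see is to use the lock functions, because a critical section does not fit: variant is written in only one part of the code (at the lowest recursion level), but it is read in many places.
So here is the actual question. If I apply this construct: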
omp_lock_t lock;
int main() {
    ...
    omp_init_lock(&lock);
    #pragma omp parallel shared(variant, lock)
    ...
}
...
else { // reached the lowest level
    if (condition fulfilled) { // condition check - READS the variant variable
        omp_set_lock(&lock);
        variant = it_is_equal_to_that_,_to_that...;
        omp_unset_lock(&lock);
    }
    else
        continue;
...
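will this lock also protect reads of the variable in all the other places? Or do I need to check the lock state manually and pause the thread before reading elsewhere?
I would be very grateful for the community's help!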
In the OpenMP specification (1.4.1 Structure of the OpenMP Memory Model) you can read:
The OpenMP API provides a relaxed-consistency, shared-memory model.
All OpenMP threads have access to a place to store and to retrieve
variables, called the memory. In addition, each thread is allowed to
have its own temporary view of the memory. The temporary view of
memory for each thread is not a required part of the OpenMP memory
model, but can represent any kind of intervening structure, such as
machine registers, cache, or other local storage, between the thread
and the memory. The temporary view of memory allows the thread to
cache variables and thereby to avoid going to memory for every
reference to a variable.
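In practice this means (as with any relaxed memory model) that threads are guaranteed to have the same, consistent view of a shared variable's value only at well-defined points. Between those points, the temporary views of different threads may differ.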
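As an illustration of one such well-defined point, here is a minimal sketch that relies on the implicit barrier at the end of a `single` construct (a barrier implies a flush). The function name `all_threads_see_write` and the value 42 are made up for this example and do not come from the question's code.

```cpp
#include <cassert>

// Returns true if every thread in the team observed the value that was
// written before the barrier. All names here are hypothetical.
bool all_threads_see_write() {
    int shared_value = 0;
    int threads = 0;   // number of threads that took part
    int matches = 0;   // number of threads that read the expected value

    #pragma omp parallel shared(shared_value, threads, matches)
    {
        // Exactly one thread writes. `single` (without nowait) ends with an
        // implicit barrier, which is one of the points where all threads are
        // guaranteed a consistent view of memory.
        #pragma omp single
        shared_value = 42;

        // Read after the barrier; nobody writes shared_value any more,
        // so this read does not race with anything.
        const int seen = shared_value;

        // The counters are updated by all threads, so protect them.
        #pragma omp atomic
        threads += 1;
        #pragma omp atomic
        matches += (seen == 42) ? 1 : 0;
    }

    assert(threads >= 1);
    return matches == threads;
}
```

Built without OpenMP support the pragmas are ignored and the function runs on one thread; with OpenMP enabled, the barrier is what makes the plain read safe.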
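In your code you have dealt with simultaneous writes to the same variable, but without additional measures there is no guarantee that another thread reads the correct value of the variable.
You have 3 options (note that each of them not only handles simultaneous reads and writes, but also provides a consistent view of the shared variable's value):
- If your variable is of a scalar type, the best solution is to use atomic operations. This is the fastest option, because atomic operations are usually supported by the hardware.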
#pragma omp parallel
{
...
#pragma omp atomic read
tmp=variant;
....
#pragma omp atomic write
variant=new_value;
}
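- Use the critical construct. This solution can be used if your variable has a complex type (for example, a class) and its read/write cannot be performed atomically. Note that it is much less efficient (slower) than atomic operations.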
#pragma omp parallel
{
...
#pragma omp critical
tmp=variant;
....
#pragma omp critical
variant=new_value;
}
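- Use locks for every read/write of the variable. Your code is fine for the write, but the lock must also be used for the reads. This takes the most coding, but in practice the result is the same as with the critical construct. Note that OpenMP implementations usually use locks to implement the critical construct.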
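The lock option could be sketched roughly as follows. This is only an illustration under assumptions: `best_variant` stands in for the question's `variant` (simplified to a one-dimensional `std::vector<float>`), and `score`, `LockGuard`, `read_variant`, `update_if_better`, `search` and `run_search` are hypothetical helpers. The `#else` branch is a serial stand-in so the sketch also builds without OpenMP support.

```cpp
#include <cassert>
#include <numeric>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: serial no-op stand-ins so the sketch builds without OpenMP.
typedef int omp_lock_t;
inline void omp_init_lock(omp_lock_t*) {}
inline void omp_set_lock(omp_lock_t*) {}
inline void omp_unset_lock(omp_lock_t*) {}
inline void omp_destroy_lock(omp_lock_t*) {}
#endif

static std::vector<float> best_variant;  // stands in for the question's `variant`
static omp_lock_t variant_lock;          // guards every access to best_variant

// RAII wrapper: the lock is released on every exit path.
class LockGuard {
public:
    explicit LockGuard(omp_lock_t* lock) : lock_(lock) { omp_set_lock(lock_); }
    ~LockGuard() { omp_unset_lock(lock_); }
    LockGuard(const LockGuard&) = delete;
    LockGuard& operator=(const LockGuard&) = delete;
private:
    omp_lock_t* lock_;
};

// Hypothetical quality measure: a larger sum counts as better.
static float score(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}

// READ: take a copy while holding the lock.
std::vector<float> read_variant() {
    LockGuard guard(&variant_lock);
    return best_variant;  // copied before the guard is destroyed
}

// WRITE: the check and the update happen under the same lock.
bool update_if_better(const std::vector<float>& candidate) {
    LockGuard guard(&variant_lock);
    if (score(candidate) > score(best_variant)) {
        best_variant = candidate;
        return true;
    }
    return false;
}

// Recursive search; each descent is scheduled as a task, as in the question.
void search(std::vector<float> partial, int depth) {
    assert(depth >= 0);
    if (depth == 0) {
        update_if_better(partial);
        return;
    }
    for (int k = 0; k < 3; ++k) {
        std::vector<float> next = partial;
        next.push_back(static_cast<float>(k));
        #pragma omp task firstprivate(next, depth)
        search(next, depth - 1);
    }
}

std::vector<float> run_search(int depth) {
    best_variant.clear();
    omp_init_lock(&variant_lock);
    #pragma omp parallel
    #pragma omp master
    search(std::vector<float>(), depth);
    // The barrier at the end of the parallel region waits for all tasks.
    std::vector<float> best = read_variant();
    omp_destroy_lock(&variant_lock);
    return best;
}
```

Because the comparison and the assignment sit under the same lock, a second thread cannot overwrite a better result between the check and the update; any other place that reads the shared value would go through `read_variant()`.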