Why does this Python code with threading have race conditions?
This code creates a race condition:
import threading

ITERS = 100000
x = [0]

def worker():
    for _ in range(ITERS):
        x[0] += 1  # this line creates a race condition
        # because it reads a value, increments it, and then writes it back;
        # some increments can happen at the same time and be lost

def main():
    x[0] = 0  # you may use `global x` instead of this list trick too
    t1 = threading.Thread(target=worker)
    t2 = threading.Thread(target=worker)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

for i in range(5):
    main()
    print(f'iteration {i}. expected x = {ITERS*2}, got {x[0]}')
Output:
$ python3 test.py
iteration 0. expected x = 200000, got 200000
iteration 1. expected x = 200000, got 148115
iteration 2. expected x = 200000, got 155071
iteration 3. expected x = 200000, got 200000
iteration 4. expected x = 200000, got 200000
Python 3 version:
Python 3.9.7 (default, Sep 10 2021, 14:59:43)
[GCC 11.2.0] on linux
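I thought the GIL would prevent this and would not allow two threads to run together until they do something I/O-related or call into a C library. At least, that is what you might conclude from the docs.
So what does the GIL actually do, and when do threads run in parallel?
After reading the docs more carefully, I think the answer is there: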
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.
However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.
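I don't know the internals, but my guess is that each bytecode instruction (or small block of instructions) is executed on its own while the other threads wait, which is what makes threaded code slow. However, a single line of Python can compile to several instructions, so the line as a whole is not atomic.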
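To make the "not atomic" point concrete, here is a minimal single-threaded sketch that replays one unlucky interleaving by hand. It assumes `x[0] += 1` behaves like a separate read, add, and write; the names `a_value` and `b_value` are hypothetical stand-ins for the temporary values held by two threads A and B, and are not part of the original code.

```python
x = [0]

# Thread A and thread B each want to perform `x[0] += 1`.
a_value = x[0]          # A reads 0
b_value = x[0]          # B reads 0 before A has written its result back
a_value = a_value + 1   # A computes 1
b_value = b_value + 1   # B computes 1
x[0] = a_value          # A writes 1
x[0] = b_value          # B writes 1, overwriting A's update

print(x[0])  # 1, not 2: one increment was lost
```

Each individual step is safe on its own; the update is lost only because the other thread's read landed between one thread's read and write.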
If you run dis.dis('x[0] += 1'):
0 LOAD_NAME 0 (x)
2 LOAD_CONST 0 (0)
4 DUP_TOP_TWO
6 BINARY_SUBSCR
8 LOAD_CONST 1 (1)
10 INPLACE_ADD
12 ROT_THREE
14 STORE_SUBSCR
16 LOAD_CONST 2 (None)
18 RETURN_VALUE
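The threads can be interleaved between these instructions, for example after BINARY_SUBSCR has read the value but before STORE_SUBSCR has written it back, and that produces the race condition. So the GIL only guarantees that the internals of structures like list or dict are not corrupted.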
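If the goal is to get the expected total, one common approach is to guard the whole read-modify-write with a `threading.Lock`. The following is a sketch of that idea, not the only possible fix; `counter`, `counter_lock`, `locked_worker`, and `run_locked` are hypothetical names introduced for illustration.

```python
import threading

ITERS = 100_000
counter = [0]
counter_lock = threading.Lock()  # hypothetical name, not in the original code


def locked_worker():
    for _ in range(ITERS):
        # Hold the lock across the read, the add, and the write,
        # so no other thread can interleave between them.
        with counter_lock:
            counter[0] += 1


def run_locked():
    counter[0] = 0
    threads = [threading.Thread(target=locked_worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


result = run_locked()
print(f"expected {ITERS * 2}, got {result}")
```

The lock makes the increment effectively atomic at the cost of extra acquire and release work on every iteration, so this version is expected to run slower than the unsynchronized one.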