Python and threading: Why is extracting the computation outside of the locked area speeding up the code?

Question

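I am preparing a session for my team on threading and locking in Python, but I ran into a situation I don't fully understand.

In the code below, I compute the hashes of many (1000) large strings (100k+ characters) using a thread pool in Python (a pool of 20 threads). The hashes are then stored in the dictionary digests, so I take a lock when writing to the dictionary (I think this may not actually be necessary, but let's assume it is and that we need the lock).

Version A) does the expensive hash computation inside the lock statement, while version B) does it before acquiring the lock and then only updates the dictionary with the result inside the critical section.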

import threading
import time
from multiprocessing.pool import ThreadPool
import hashlib

# A) computation is within the lock statement
lock = threading.Lock()

digests = {}

def compute_digests(x):
  s = '*'*(x + 100000)  # generate some big string
  
  with lock:
    digests[x] = hashlib.sha256(f'{s}'.encode()).hexdigest()

tic = time.time()
ThreadPool(20).map(compute_digests, range(1000))
toc = time.time()
print(f'Computation in locked area: {toc - tic}s')



# B) computation is outside of the lock statement
lock = threading.Lock()

digests = {}

def compute_digests(x):
  s = '*'*(x + 100000)  # generate some big string
  digest = hashlib.sha256(f'{s}'.encode()).hexdigest()
  
  with lock:
    digests[x] = digest

tic = time.time()
ThreadPool(20).map(compute_digests, range(1000))
toc = time.time()
print(f'Computation outside of locked area: {toc - tic}s')

The results are:

Computation in locked area: 0.41937875747680664s
Computation outside of locked area: 0.10702204704284668s

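In other words, option B) is faster. That seems intuitive, given that we moved the expensive computation outside the locked block. However, from what I have read, Python is single-threaded anyway, and ThreadPool only gives the appearance of work being done in parallel, while in reality only one computation runs at any given moment. In other words, I would expect the global interpreter lock to be the bottleneck, yet somehow version B) gives a big speedup!

So the question is: where does the speedup come from? Is it related to the implementation of sha256 (perhaps it sleeps somewhere)?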

Answer 1

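Python is not single-threaded. It uses ordinary system threads, just like any C++ or Java code. The difference is the global interpreter lock (GIL): internal C code such as hashlib can release it, whereas running pure Python code forces a single thread to execute at a time.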
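The difference can be observed with a small timing sketch. It is not from the original post: the payload size, the helper names (hash_with_hashlib, checksum_in_pure_python, timed) and the thread counts are all chosen here for illustration. It relies on the hashlib documentation's statement that the GIL is released while hashing data larger than 2047 bytes, and it assumes a standard CPython build with the GIL enabled. Timings vary by machine, so none are shown.

```python
import hashlib
import time
from multiprocessing.pool import ThreadPool

# Illustrative payload; the size is an assumption, not taken from the question.
DATA = b'*' * 5_000_000


def hash_with_hashlib(_):
    # hashlib's C implementation releases the GIL while hashing large
    # buffers, so several of these calls can run on different cores.
    return hashlib.sha256(DATA).hexdigest()


def checksum_in_pure_python(_):
    # A pure-Python loop holds the GIL on standard CPython builds,
    # so extra threads are not expected to make it faster.
    total = 0
    for byte in DATA[:500_000]:
        total = (total + byte) % 65521
    return total


def timed(func, n_threads, n_tasks=8):
    # Run n_tasks calls of func on a pool of n_threads threads and
    # return the elapsed wall-clock time together with the results.
    with ThreadPool(n_threads) as pool:
        tic = time.perf_counter()
        results = pool.map(func, range(n_tasks))
        toc = time.perf_counter()
    return toc - tic, results


for func in (hash_with_hashlib, checksum_in_pure_python):
    for n_threads in (1, 4):
        elapsed, _ = timed(func, n_threads)
        print(f'{func.__name__} with {n_threads} thread(s): {elapsed:.3f}s')
```

On a multi-core machine the hashlib variant would typically get faster with more threads, while the pure Python variant stays roughly the same.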

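In this case, the interpreter is free to run different code while a hash is being computed, but in version A) you force it not to by holding the lock around the computation.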
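Taking this one step further, the shared dictionary and the lock can be avoided entirely by letting each worker return its result and building the dictionary from what map hands back. This is only a sketch of one possible variant of version B), not something proposed in the original post; the function name compute_digest and its (key, digest) return value are assumptions made here.

```python
import hashlib
from multiprocessing.pool import ThreadPool


def compute_digest(x):
    # Build the big string and hash it; no shared state is touched,
    # so no lock is needed and the hashing can overlap across threads.
    s = '*' * (x + 100000)
    return x, hashlib.sha256(s.encode()).hexdigest()


with ThreadPool(20) as pool:
    # map collects the (key, digest) pairs in input order; the dict is
    # then built in the calling thread only.
    digests = dict(pool.map(compute_digest, range(1000)))
```

If a lock is genuinely required, version B) from the question remains the pattern to follow: do the work first and hold the lock only for the dictionary update.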


Tags: python, multithreading, locking, sha256, hashlib