Execution of code which is locked should take more time than its unlocked counterpart
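I am experimenting with Python threading and locking, so I created 2 classes. Both classes use threads to increment and decrement a class-level variable 'ref'.
In ThreadUnsafeClass, I do not take a lock before incrementing and decrementing.
In ThreadSafeClass, I take a lock before incrementing and decrementing.
My assumption was that, since locking forces some threads to wait, the ThreadSafeClass case should take more time.
The results show that ThreadSafeClass is faster.
Here is my code (Python 2.7):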
import threading
import time

class ThreadUnsafeClass(object):
    ref = 0

    def __init__(self, count_tot=10000):
        self.all_threads = []
        self.count_tot = count_tot
        ThreadUnsafeClass.ref = 0

    def inc_ref(self):
        time.sleep(0.1)
        for i in xrange(0, self.count_tot):
            ThreadUnsafeClass.ref += 1

    def dec_ref(self):
        time.sleep(0.1)
        for i in xrange(0, self.count_tot):
            ThreadUnsafeClass.ref -= 1

    def compute_ref_value(self):
        start_time = time.time()
        for i in xrange(0, 50):
            t1 = threading.Thread(target=self.inc_ref, args=())
            t2 = threading.Thread(target=self.dec_ref, args=())
            t1.start()
            t2.start()
            self.all_threads.append(t1)
            self.all_threads.append(t2)
        for t in self.all_threads:
            t.join()
        print time.time() - start_time, " -> ",
        return ThreadUnsafeClass.ref

class ThreadSafeClass(object):
    ref = 0

    def __init__(self, count_tot=10000):
        self.all_threads = []
        self.count_tot = count_tot
        ThreadUnsafeClass.ref = 0
        self.lock = threading.Lock()

    def inc_ref(self):
        time.sleep(0.1)
        self.lock.acquire()
        for i in xrange(0, self.count_tot):
            ThreadUnsafeClass.ref += 1
        self.lock.release()

    def dec_ref(self):
        time.sleep(0.1)
        self.lock.acquire()
        for i in xrange(0, self.count_tot):
            ThreadUnsafeClass.ref -= 1
        self.lock.release()

    def compute_ref_value(self):
        start_time = time.time()
        for i in xrange(0, 50):
            t1 = threading.Thread(target=self.inc_ref, args=())
            t2 = threading.Thread(target=self.dec_ref, args=())
            t1.start()
            t2.start()
            self.all_threads.append(t1)
            self.all_threads.append(t2)
        for t in self.all_threads:
            t.join()
        print time.time() - start_time, " -> ",
        return ThreadUnsafeClass.ref

thread_unsafe_class = ThreadUnsafeClass(100000)
print "Value from un-safe threading ", thread_unsafe_class.compute_ref_value()
thread_safe_class = ThreadSafeClass(100000)
print "Value from safe threading ", thread_safe_class.compute_ref_value()
These are my results:
Value from un-safe threading 3.54868483543 -> 30653
Value from safe threading 2.28372502327 -> 0
Please help me understand why the locked version is faster!
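I believe the answer is that by locking the code the way you do, you are actually avoiding thread and cache thrashing, which makes it faster, because each thread's loop can run to completion without contending for any other hardware resource. It is not really an apples-to-apples comparison, but when I moved the lock inside the loop instead of outside it: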
    def inc_ref(self):
        time.sleep(0.1)
        for i in xrange(0, self.count_tot):
            self.lock.acquire()
            ThreadUnsafeClass.ref += 1
            self.lock.release()

    def dec_ref(self):
        time.sleep(0.1)
        for i in xrange(0, self.count_tot):
            self.lock.acquire()
            ThreadUnsafeClass.ref -= 1
            self.lock.release()
I found that the execution time increased dramatically (as you would expect).
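For anyone who wants to try the two lock placements side by side, here is a minimal sketch. It is written for Python 3 rather than the 2.7 used in the question, and the names `run`, `per_iteration`, `n_threads` and `count` are my own, not from the original code. It only illustrates the comparison; the actual timings will depend on your machine and interpreter version.

```python
import threading
import time


def run(per_iteration, n_threads=8, count=20000):
    """Run n_threads workers (half add 1, half subtract 1).

    per_iteration=True  -> acquire/release the lock around every update
    per_iteration=False -> acquire the lock once around the whole loop
    Returns (elapsed_seconds, final_value).
    """
    lock = threading.Lock()
    state = {"ref": 0}  # shared counter (hypothetical stand-in for the class attribute)

    def worker(delta):
        if per_iteration:
            for _ in range(count):
                with lock:  # lock taken count times per thread
                    state["ref"] += delta
        else:
            with lock:  # lock taken once per thread
                for _ in range(count):
                    state["ref"] += delta

    threads = [
        threading.Thread(target=worker, args=(1 if i % 2 == 0 else -1,))
        for i in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, state["ref"]


if __name__ == "__main__":
    for label, flag in (("lock outside loop", False), ("lock inside loop", True)):
        elapsed, value = run(flag)
        print(label, elapsed, "->", value)
```

Both variants should end with a final value of 0 as long as `n_threads` is even, since every update happens while the lock is held; only the elapsed time should differ.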
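To test this theory further, I took your code and added some more detailed timing to capture exactly how much time is spent on the increment/decrement operations versus on locking: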
import threading
import time
import operator

class ThreadUnsafeClass(object):
    ref = 0

    def __init__(self, count_tot=10000):
        self.all_threads = []
        self.count_tot = count_tot
        ThreadUnsafeClass.ref = 0

    def inc_ref(self, ndx):
        time.sleep(0.1)
        ref_time = 0
        for i in xrange(0, self.count_tot):
            op_start = time.time()
            ThreadUnsafeClass.ref += 1
            ref_time += time.time() - op_start
        self.op_times[ndx] = ref_time

    def dec_ref(self, ndx):
        time.sleep(0.1)
        ref_time = 0
        for i in xrange(0, self.count_tot):
            op_start = time.time()
            ThreadUnsafeClass.ref -= 1
            ref_time += time.time() - op_start
        self.op_times[ndx] = ref_time

    def compute_ref_value(self):
        start_time = time.time()
        self.op_times = [0]*100
        for i in xrange(0, 50):
            t1 = threading.Thread(target=self.inc_ref, args=(i*2,))
            t2 = threading.Thread(target=self.dec_ref, args=(i*2+1,))
            t1.start()
            t2.start()
            self.all_threads.append(t1)
            self.all_threads.append(t2)
        for t in self.all_threads:
            t.join()
        op_total = reduce(operator.add, self.op_times)
        print time.time() - start_time, op_total, " -> ",
        return ThreadUnsafeClass.ref

class ThreadSafeClass(object):
    ref = 0

    def __init__(self, count_tot=10000):
        self.all_threads = []
        self.count_tot = count_tot
        ThreadUnsafeClass.ref = 0
        self.lock = threading.Lock()

    def inc_ref(self, ndx):
        time.sleep(0.1)
        lock_start = time.time()
        self.lock.acquire()
        lock_time = time.time() - lock_start
        ref_time = 0
        for i in xrange(0, self.count_tot):
            op_start = time.time()
            ThreadUnsafeClass.ref += 1
            ref_time += time.time() - op_start
        self.lock.release()
        self.op_times[ndx] = ref_time
        self.lock_times[ndx] = lock_time

    def dec_ref(self, ndx):
        time.sleep(0.1)
        lock_start = time.time()
        self.lock.acquire()
        lock_time = time.time() - lock_start
        ref_time = 0
        for i in xrange(0, self.count_tot):
            op_start = time.time()
            ThreadUnsafeClass.ref -= 1
            ref_time += time.time() - op_start
        self.lock.release()
        self.op_times[ndx] = ref_time
        self.lock_times[ndx] = lock_time

    def compute_ref_value(self):
        start_time = time.time()
        self.op_times = [0]*100
        self.lock_times = [0]*100
        for i in xrange(0, 50):
            t1 = threading.Thread(target=self.inc_ref, args=(i*2,))
            t2 = threading.Thread(target=self.dec_ref, args=(i*2+1,))
            t1.start()
            t2.start()
            self.all_threads.append(t1)
            self.all_threads.append(t2)
        for t in self.all_threads:
            t.join()
        op_total = reduce(operator.add, self.op_times)
        lock_total = reduce(operator.add, self.lock_times)
        print time.time() - start_time, op_total, lock_total, " -> ",
        return ThreadUnsafeClass.ref

thread_unsafe_class = ThreadUnsafeClass(100000)
print "Value from un-safe threading ", thread_unsafe_class.compute_ref_value()
thread_safe_class = ThreadSafeClass(100000)
print "Value from safe threading ", thread_safe_class.compute_ref_value()
The output was:
Value from un-safe threading 6.93944501877 297.449399471 -> 13057
Value from safe threading 4.08318996429 2.6313662529 197.359120131 -> 0
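This shows that in the non-locking case, the cumulative time spent on just the increments and decrements (across all threads) was almost 300 seconds, but under 3 seconds in the locking case. The locking case did spend almost 200 (cumulative) seconds acquiring the lock across all threads, but the total of lock time plus increment/decrement time is still lower in that case.
The thrashing happens because when multiple threads running on multiple CPUs (which nearly every system has these days) access shared memory, the hardware has to coordinate access to that shared memory between the CPUs. When the same memory (or memory in the same cache line) is accessed repeatedly and simultaneously from different sources, the CPUs end up spending a non-trivial amount of time waiting on each other.
When you introduce locking, you spend time waiting for the lock, but while it holds the lock each thread/CPU has exclusive access to the shared memory, so there is no extra overhead for coordinating simultaneous accesses from multiple CPUs.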
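If you want to repeat the lock-wait versus work-time measurement on a current interpreter, a compact sketch could look like the following. It assumes Python 3, and it times the whole locked loop once per thread instead of timing each increment as the listing above does, so the numbers are not directly comparable with my output. The helper name `measure` and its parameters are hypothetical.

```python
import threading
import time


def measure(n_threads=8, count=20000):
    """Return (wall_time, total_lock_wait, total_work_time, final_value)."""
    lock = threading.Lock()
    state = {"ref": 0}  # shared counter
    lock_waits = [0.0] * n_threads  # per-thread time spent waiting for the lock
    work_times = [0.0] * n_threads  # per-thread time spent in the locked loop

    def worker(ndx, delta):
        wait_start = time.perf_counter()
        with lock:
            acquired = time.perf_counter()
            for _ in range(count):
                state["ref"] += delta
            done = time.perf_counter()
        lock_waits[ndx] = acquired - wait_start
        work_times[ndx] = done - acquired

    threads = [
        threading.Thread(target=worker, args=(i, 1 if i % 2 == 0 else -1))
        for i in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - start
    return wall, sum(lock_waits), sum(work_times), state["ref"]


if __name__ == "__main__":
    wall, waits, work, value = measure()
    print(wall, work, waits, "->", value)
```

As in the output above, the cumulative lock wait can be much larger than the wall-clock time, because it is summed over threads that were all waiting at the same moment.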