Shared Memory in Threads Confusion
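I have been using Python for about a year now and am fairly familiar with it. I am quite new to threads, though, and am slightly confused about what data threads share.
I have been reading through stuff online which all seem to agree that threads share the same memory space. But in trying to demonstrate this to myself, it seems I have an incorrect understanding of how this sharing works.
I wrote a short script that just adds 1 to a local variable three times. I create two threads at a time using the same function. I would have thought that, because of the shared memory, the x variable in one thread would also be increased while that thread is sleeping, by the other thread increasing its own x, and vice versa. So after the second loop of thread one (where x = 2 while thread two is sleeping), I expected thread two to come out of sleep with x = 2 and not x = 1. But as the order of the print statements suggests, the variables are not shared between the threads.
My question is: if you have multiple threads running at once using the same function, will the variables in each thread be kept separate every time, throughout the whole run of the program (provided no globals are defined)? And what does this shared memory actually refer to, then?
Any guidance on the matter (or general threading advice) would be greatly appreciated.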
import threading
from time import sleep


def increase(x):
    for i in range(3):
        print(f"[{threading.current_thread().name}] X is {x}")
        x += 1  # rebinds the local name x of this call only
        print(f"[{threading.current_thread().name}] X is now {x} after increase")
        sleep(0.5)
        print(f"[{threading.current_thread().name}] X is now {x} after sleep")
    return x


def main():
    x = 0
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))
    first.start()
    second.start()
    first.join()
    second.join()


if __name__ == "__main__":
    main()
The result is:
[Thread One] X is 0
[Thread One] X is now 1 after increase
[Thread Two] X is 0
[Thread Two] X is now 1 after increase
[Thread Two] X is now 1 after sleep[Thread One] X is now 1 after sleep
[Thread One] X is 1
[Thread One] X is now 2 after increase
[Thread Two] X is 1
[Thread Two] X is now 2 after increase
[Thread One] X is now 2 after sleep[Thread Two] X is now 2 after sleep
[Thread Two] X is 2
[Thread Two] X is now 3 after increase
[Thread One] X is 2
[Thread One] X is now 3 after increase
[Thread One] X is now 3 after sleep[Thread Two] X is now 3 after sleep
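In your example, x reaches the function as if it were a copy rather than a shared reference: each call has its own local x, and incrementing it in one thread does not affect the other.
If you want both threads to increment the same counter, you have to wrap it in a mutable object, such as an instance of a class.
For example: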
import threading
from time import sleep


class foo:
    x = 0


def increase(foo):
    # Here the parameter foo is the shared instance (it shadows the class name).
    for i in range(3):
        print(f"[{threading.current_thread().name}] X is {foo.x}")
        foo.x += 1  # updates an attribute on the object both threads hold
        print(f"[{threading.current_thread().name}] X is now {foo.x} after increase")
        sleep(0.5)
        print(f"[{threading.current_thread().name}] X is now {foo.x} after sleep")
    return foo.x


def main():
    x = foo()
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))
    first.start()
    second.start()
    first.join()
    second.join()


if __name__ == "__main__":
    main()
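As a variation on the class example above, here is a minimal sketch in which the counter lives in an instance attribute and every increment is guarded by a threading.Lock. The Counter class, its increment method, and the iteration counts are assumptions made for illustration; they are not part of the original answer. The lock is there because += on a shared attribute is a read-modify-write sequence, so two threads could otherwise interleave and lose an update.

```python
import threading


class Counter:
    """Hypothetical shared counter; not from the original answer."""

    def __init__(self):
        self.x = 0
        self._lock = threading.Lock()

    def increment(self):
        # Hold the lock so the read, the add, and the write happen as one step.
        with self._lock:
            self.x += 1


def increase(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=increase, args=(counter, 1000))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.x)  # 2000: both threads updated the same object
```

Both threads receive a reference to the same Counter instance, which is why the total reflects the work of both.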
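Note:
Python threads have their own peculiarities. You can have a look at this video: https://www.youtube.com/watch?v=Obt-vMVdM8s
------------EDIT------------
To be more precise: in your case, x is an int. An int is immutable, so x += 1 binds the local name to a new object on every call and the caller's x is never touched, just as if it had been copied. A str or a float behaves the same way.
You get the same behavior without threads: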
def increase(x):
    for i in range(3):
        print(x)
        x += 1
    return x


x = 0
increase(x)
assert x == 0
x += 1
increase(x)
assert x == 1
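To make the contrast concrete, the following sketch calls two functions without any threads. One rebinds its parameter, and the other mutates the object it was handed. The helper names rebind and mutate are hypothetical, and the sketch assumes nothing beyond the standard behavior of int and list.

```python
def rebind(n):
    # n is a local name; += on an int binds it to a new int object.
    n += 1
    return n


def mutate(box):
    # box refers to the caller's list; item assignment changes it in place.
    box[0] += 1
    return box


x = 0
result = rebind(x)
print(x, result)  # 0 1

shared = [0]
mutate(shared)
print(shared)  # [1]
```

The caller's x is unchanged because only the local name inside rebind was rebound, while the list is visible to the caller because it is the very same object.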
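You are right that memory is shared, but it goes a bit deeper than that. What is confusing you is the difference between immutable and mutable types. You can read more about that here. I removed the for loop because it was getting cluttered: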
import threading
from time import sleep


def increase(x):
    print(f"[{threading.current_thread().name}] address of x: {hex(id(x))} ")
    print(f"[{threading.current_thread().name}] X is {x}")
    x += 1  # x now refers to a different int object
    print(f"[{threading.current_thread().name}] address of x after increment: {hex(id(x))} ")
    print(f"[{threading.current_thread().name}] X is now {x} after increase")
    sleep(0.5)
    print(f"[{threading.current_thread().name}] X is now {x} after sleep")
    print(f"[{threading.current_thread().name}] address of x after sleep: {hex(id(x))} ")
    return x


def main():
    x = 0
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))
    first.start()
    second.start()
    first.join()
    second.join()


if __name__ == "__main__":
    main()
What I did here is print the address of x inside each thread. Output:
[Thread One] address of x: 0x7ffbbebb7c20
[Thread One] X is 0
[Thread One] address of x after increment: 0x7ffbbebb7c40
[Thread One] X is now 1 after increase
[Thread Two] address of x: 0x7ffbbebb7c20
[Thread Two] X is 0
[Thread Two] address of x after increment: 0x7ffbbebb7c40
[Thread Two] X is now 1 after increase
[Thread Two] X is now 1 after sleep[Thread One] X is now 1 after sleep
[Thread One] address of x after sleep: 0x7ffbbebb7c40
[Thread Two] address of x after sleep: 0x7ffbbebb7c40
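Notice that the first address printed, when x has only been read, is 0x7ffbbebb7c20 in both threads. After the update, threads 1 and 2 both report a different address: 0x7ffbbebb7c40. The two threads end up with the same new address because Python tries to keep its memory footprint down (CPython caches small integers, so every 1 is the same object). You can find more about that here, but for our purposes: the function receives the same object to read, and as soon as you assign to or update the variable, that thread's local name is bound to a different object, so it behaves like a per-thread copy. This only happens with immutable types (int, str, float, and so on). If you pass a mutable type such as a dict instead: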
import threading
from time import sleep


def increase(test_var):
    print(f"[{threading.current_thread().name}] Address of test_var: {hex(id(test_var))}")
    print(f"[{threading.current_thread().name}] Address of test_var['key']: {hex(id(test_var['key']))}")
    print(f"[{threading.current_thread().name}] test_var['key'] is {test_var['key']}")
    test_var['key'] += 1  # mutates the dict that both threads share
    print(f"[{threading.current_thread().name}] test_var['key'] is now {test_var['key']} after increase")
    print(f"[{threading.current_thread().name}] Address of test_var after increment: {hex(id(test_var))}")
    print(f"[{threading.current_thread().name}] Address of test_var['key'] after increment: {hex(id(test_var['key']))}")
    sleep(0.5)
    print(f"[{threading.current_thread().name}] test_var['key'] is now {test_var['key']} after sleep")
    print(f"[{threading.current_thread().name}] Address of test_var after sleep: {hex(id(test_var))}")
    print(f"[{threading.current_thread().name}] Address of test_var['key'] after sleep: {hex(id(test_var['key']))}")
    return test_var


def main():
    test_var = {'key': 0}
    first = threading.Thread(name="Thread One", target=increase, args=(test_var,))
    second = threading.Thread(name="Thread Two", target=increase, args=(test_var,))
    first.start()
    second.start()
    first.join()
    second.join()


if __name__ == "__main__":
    main()
The output is what you expected:
[Thread One] Address of test_var: 0x22216509a98
[Thread One] Address of test_var['key']: 0x7ffbaf7d7c20
[Thread One] test_var['key'] is 0
[Thread One] test_var['key'] is now 1 after increase
[Thread One] Address of test_var after increment: 0x22216509a98
[Thread One] Address of test_var['key'] after increment: 0x7ffbaf7d7c40
[Thread Two] Address of test_var: 0x22216509a98
[Thread Two] Address of test_var['key']: 0x7ffbaf7d7c40
[Thread Two] test_var['key'] is 1
[Thread Two] test_var['key'] is now 2 after increase
[Thread Two] Address of test_var after increment: 0x22216509a98
[Thread Two] Address of test_var['key'] after increment: 0x7ffbaf7d7c60
[Thread Two] test_var['key'] is now 2 after sleep
[Thread Two] Address of test_var after sleep: 0x22216509a98
[Thread Two] Address of test_var['key'] after sleep: 0x7ffbaf7d7c60
[Thread One] test_var['key'] is now 2 after sleep
[Thread One] Address of test_var after sleep: 0x22216509a98
[Thread One] Address of test_var['key'] after sleep: 0x7ffbaf7d7c60
Notice how the address of test_var (0x22216509a98) does not change between threads: the dict is a mutable object, and both threads share it.
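The same observation can be checked without comparing addresses by eye. The sketch below is only an illustration; record and seen_ids are hypothetical names. Each thread stores the id() of the dict it receives, and the main thread then confirms that incrementing the value changes which int the key refers to but not the identity of the dict.

```python
import threading

shared = {"key": 0}
seen_ids = []


def record(obj):
    # list.append is safe to call from several threads in CPython.
    seen_ids.append(id(obj))


threads = [threading.Thread(target=record, args=(shared,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads were handed the very same dict object.
print(all(i == id(shared) for i in seen_ids))  # True

# Updating a value rebinds the key to another int; the dict itself stays put.
dict_id = id(shared)
value_id = id(shared["key"])
shared["key"] += 1
print(id(shared) == dict_id)  # True
print(id(shared["key"]) == value_id)  # False
```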
The answer you accepted does not directly answer this question:
if you have multiple threads running at once using the same function, will the variables in each thread be kept separate?
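"Local" means local not just to this function, but to this call of the function.
The values of a function's arguments and local variables are stored in an activation record. Every time the function is called, a new activation record is created, and when the function returns, that activation record is destroyed.
This means that the x parameter of increase(x) is a different variable in every function call. If a function calls itself recursively, its args and locals are different variables in each recursive call, and if the function is called in multiple threads, its args and locals are different variables in each thread.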
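One way to see that each call gets its own variables is a recursive function. In this sketch, descend and trace are hypothetical names: each call stores a value in its local variable, recurses, and then records what the local still holds after the inner calls have returned.

```python
def descend(n, trace):
    local = n * 10  # belongs to this particular call
    if n > 0:
        descend(n - 1, trace)  # inner calls set their own `local`
    # The inner calls did not overwrite this call's `local`.
    trace.append((n, local))


trace = []
descend(2, trace)
print(trace)  # [(0, 0), (1, 10), (2, 20)]
```

The list trace is shared by all the calls because every call receives a reference to the same object, whereas local exists once per call.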
I have been reading through stuff online which all seem to agree that threads share the same memory space.
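Exactly right, but an argument or a local variable does not have one fixed location in memory. A global does. So if you have some global g, every thread will agree that g has the same value. Likewise, a Python object occupies one definite location for as long as it exists, so every thread that references the same object will see it in the same state. A local variable, however, occupies a different memory location in each activation of the function that declares it.
Local variables and arguments are not shared. They are not shared between recursive calls of a function, and they are not shared between calls from different threads.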
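A small sketch of that distinction, using hypothetical names (g, worker, seen): both threads read the same global object, while each call of worker keeps its own local variable.

```python
import threading

g = {"value": 42}  # one global object, visible to every thread
seen = []


def worker(start):
    local = start  # a separate variable in each call
    local += 1
    seen.append((g["value"], local))


threads = [threading.Thread(target=worker, args=(s,)) for s in (0, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(seen))  # [(42, 1), (42, 101)]
```

Each thread reports the same value for g but a different value for local, which matches the description above.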