Shared Memory in Threads Confusion

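I have been using Python for about a year now and am fairly comfortable with it. However, I am new to threading and a bit confused about what data threads actually share.

I have been reading through stuff online which all seem to agree that threads share the same memory space. But when I tried to demonstrate this to myself, it turned out that I have the wrong idea of how this sharing works.

I wrote a short script that just adds 1 to a local variable three times. I create two threads at once using the same function. I expected that, because memory is shared, the x variable in one thread would also increase while that thread was sleeping, since the other thread incremented its own x, and vice versa. So after thread one's second loop (where x = 2 while thread two is asleep), I thought thread two would come out of its sleep with x = 2 rather than x = 1. However, as the order of the print statements shows, the variables are not shared between the threads.

My question is: if you have multiple threads running at once using the same function, will the variables in each thread be kept separate every time, for the whole run of the program (assuming no global variables are defined)? And in that case, what exactly does this shared memory refer to?

Any guidance on the issue (or general threading advice) would be much appreciated.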


import threading
from time import sleep


def increase(x):
    # current_thread().name replaces the deprecated currentThread().getName()
    name = threading.current_thread().name
    for i in range(3):
        print(f"[{name}] X is {x}")
        x += 1
        print(f"[{name}] X is now {x} after increase")
        sleep(0.5)
        print(f"[{name}] X is now {x} after sleep")
    return x


def main():
    x = 0
    # args must be a sequence of positional arguments; (x,) is a one-element tuple
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

The result is:


[Thread One] X is 0
[Thread One] X is now 1 after increase
[Thread Two] X is 0
[Thread Two] X is now 1 after increase
[Thread Two] X is now 1 after sleep[Thread One] X is now 1 after sleep
[Thread One] X is 1
[Thread One] X is now 2 after increase

[Thread Two] X is 1
[Thread Two] X is now 2 after increase
[Thread One] X is now 2 after sleep[Thread Two] X is now 2 after sleep
[Thread Two] X is 2
[Thread Two] X is now 3 after increase

[Thread One] X is 2
[Thread One] X is now 3 after increase
[Thread One] X is now 3 after sleep[Thread Two] X is now 3 after sleep

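In your example, x is handed to each thread as a function argument, and it behaves like a copy rather than a shared reference: an int is immutable, so x += 1 rebinds the local name inside each call instead of changing a shared object. If you want both threads to increment the same counter, wrap it in a mutable object, for example an instance of a class.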

For example:

import threading
from time import sleep


class Foo:
    """Mutable holder: both threads receive a reference to the same instance."""

    def __init__(self):
        self.x = 0


def increase(foo):
    name = threading.current_thread().name
    for i in range(3):
        print(f"[{name}] X is {foo.x}")
        foo.x += 1  # updates the shared instance, so the other thread sees it
        print(f"[{name}] X is now {foo.x} after increase")
        sleep(0.5)
        print(f"[{name}] X is now {foo.x} after sleep")
    return foo.x


def main():
    foo = Foo()
    first = threading.Thread(name="Thread One", target=increase, args=(foo,))
    second = threading.Thread(name="Thread Two", target=increase, args=(foo,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

Note: threading in Python has its own peculiarities. You may want to watch this video: https://www.youtube.com/watch?v=Obt-vMVdM8s

------------EDIT------------

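To be more precise: in your case x is an int, which is immutable, so every function call effectively works on its own copy (x += 1 binds the local name to a new int object rather than modifying the caller's value). A str or a float behaves the same way.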

You get exactly the same behaviour without threads:

def increase(x):
    for i in range(3):
        print(x)
        x += 1
    return x

x = 0

increase(x)
assert x == 0

x += 1

increase(x)
assert x == 1
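
To make the difference between rebinding and in-place mutation concrete, here is a minimal sketch without any threads. The helper names bump_int and bump_list are made up for this illustration; each one reports whether its local name ended up pointing at a different object than the one that was passed in.

```python
def bump_int(n):
    before = id(n)
    n += 1  # int is immutable: the local name n is rebound to a new object
    return id(n) != before


def bump_list(items):
    before = id(items)
    items += [1]  # list is mutable: the same object is extended in place
    return id(items) != before


x = 0
values = []

int_rebound = bump_int(x)
list_rebound = bump_list(values)

print(x, int_rebound)        # the caller's x is untouched
print(values, list_rebound)  # the caller's list has changed
```

The int case is what happens to x in the threaded script, and the list case is what happens once a mutable object is passed instead.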

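You are right that the memory is shared, but there is more to it than that. What is tripping you up is the difference between immutable and mutable types. You can learn more here. I removed the for loop because the output was getting messy: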

import threading
from time import sleep


def increase(x):
    name = threading.current_thread().name
    # id() is the object's identity; in CPython it is the memory address
    print(f"[{name}] address of x: {hex(id(x))} ")

    print(f"[{name}] X is {x}")
    x += 1
    print(f"[{name}] address of x after increment: {hex(id(x))} ")
    print(f"[{name}] X is now {x} after increase")
    sleep(0.5)
    print(f"[{name}] X is now {x} after sleep")
    print(f"[{name}] address of x after sleep: {hex(id(x))} ")
    return x


def main():
    x = 0
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

All I did here was print the address of x inside each thread. Output:

[Thread One] address of x: 0x7ffbbebb7c20 
[Thread One] X is 0
[Thread One] address of x after increment: 0x7ffbbebb7c40 
[Thread One] X is now 1 after increase
[Thread Two] address of x: 0x7ffbbebb7c20 
[Thread Two] X is 0
[Thread Two] address of x after increment: 0x7ffbbebb7c40 
[Thread Two] X is now 1 after increase
[Thread Two] X is now 1 after sleep[Thread One] X is now 1 after sleep
[Thread One] address of x after sleep: 0x7ffbbebb7c40 

[Thread Two] address of x after sleep: 0x7ffbbebb7c40 

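Notice that the first address printed, when x has only been read, is 0x7ffbbebb7c20, and that after the update threads 1 and 2 both report a different address: 0x7ffbbebb7c40. They both end up with the same new address because Python tries to keep its memory footprint down (CPython caches small integers, so every 1 is the same object). You can find more information about that here, but for our purposes: the function receives the same object to read, and as soon as you try to write to or update the variable, that thread's local name is bound to a new object, which behaves like a private copy. This only happens with immutable types (int, str, tuple, etc.). Things are different if you pass a mutable type such as a dict: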


import threading
from time import sleep


def increase(test_var):
    name = threading.current_thread().name
    print(f"[{name}] Address of test_var: {hex(id(test_var))}")
    print(f"[{name}] Address of test_var['key']: {hex(id(test_var['key']))}")
    print(f"[{name}] test_var['key'] is {test_var['key']}")
    test_var['key'] += 1  # stores a new int in the same, shared dict
    print(f"[{name}] test_var['key'] is now {test_var['key']} after increase")
    print(f"[{name}] Address of test_var after increment: {hex(id(test_var))}")
    print(f"[{name}] Address of test_var['key'] after increment: {hex(id(test_var['key']))}")
    sleep(0.5)
    print(f"[{name}] test_var['key'] is now {test_var['key']} after sleep")
    print(f"[{name}] Address of test_var after sleep: {hex(id(test_var))}")
    print(f"[{name}] Address of test_var['key'] after sleep: {hex(id(test_var['key']))}")
    return test_var


def main():
    test_var = {'key': 0}
    first = threading.Thread(name="Thread One", target=increase, args=(test_var,))
    second = threading.Thread(name="Thread Two", target=increase, args=(test_var,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

The output is what you were expecting:

[Thread One] Address of test_var: 0x22216509a98
[Thread One] Address of test_var['key']: 0x7ffbaf7d7c20
[Thread One] test_var['key'] is 0
[Thread One] test_var['key'] is now 1 after increase
[Thread One] Address of test_var after increment: 0x22216509a98
[Thread One] Address of test_var['key'] after increment: 0x7ffbaf7d7c40
[Thread Two] Address of test_var: 0x22216509a98
[Thread Two] Address of test_var['key']: 0x7ffbaf7d7c40
[Thread Two] test_var['key'] is 1
[Thread Two] test_var['key'] is now 2 after increase
[Thread Two] Address of test_var after increment: 0x22216509a98
[Thread Two] Address of test_var['key'] after increment: 0x7ffbaf7d7c60
[Thread Two] test_var['key'] is now 2 after sleep
[Thread Two] Address of test_var after sleep: 0x22216509a98
[Thread Two] Address of test_var['key'] after sleep: 0x7ffbaf7d7c60
[Thread One] test_var['key'] is now 2 after sleep
[Thread One] Address of test_var after sleep: 0x22216509a98
[Thread One] Address of test_var['key'] after sleep: 0x7ffbaf7d7c60

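Notice how the address of test_var (0x22216509a98) does not change between the threads: both threads hold a reference to the same dict, and because a dict is mutable, an update made by one thread is visible to the other.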
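
One caveat once state really is shared: test_var['key'] += 1 is a read-modify-write sequence, so two threads can interleave in the middle of it and lose an update. A minimal sketch of guarding the update with threading.Lock follows; the names add_many and run, the thread count, and the iteration count are all made up for illustration.

```python
import threading


def add_many(shared, lock, times):
    for _ in range(times):
        # Hold the lock across the whole read-modify-write step.
        with lock:
            shared["key"] += 1


def run(n_threads=4, times=10_000):
    shared = {"key": 0}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=add_many, args=(shared, lock, times))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["key"]


total = run()
print(total)
```

With the lock in place the final count is always n_threads * times; without it the result may come up short, depending on the interpreter version and timing.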

The answer you accepted does not directly answer this question:

if you have multiple threads running at once using the same function, will the variables in each thread be kept separate?

"Local" does not just mean local to this function; it means local to this particular call of the function.

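The values of a function's arguments and local variables are stored in an activation record (a frame, in CPython terms). A new activation record is created every time the function is called, and it is destroyed when the function returns.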

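That means the x parameter of increase(x) is a different variable in every call of the function. If a function calls itself recursively, the args and locals are different variables in each recursive call, and if the function is called in several threads, the args and locals are different variables in each thread.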
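
A small sketch of both cases, per-thread calls and recursive calls. The names count_up, depth, and results are hypothetical and exist only for this example; results is a shared dict used purely to collect each thread's final local value.

```python
import threading


def count_up(start, results):
    # x is a local: every call (and therefore every thread) gets its own x.
    x = start
    for _ in range(3):
        x += 1
    results[threading.current_thread().name] = x


def depth(n):
    # Each recursive call has its own `mine`; inner calls cannot overwrite it.
    mine = n
    if n > 0:
        depth(n - 1)
    return mine


results = {}
threads = [
    threading.Thread(name=f"T{i}", target=count_up, args=(0, results))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

outer = depth(3)
print(results)
print(outer)
```

Both threads finish with 3, not 6, because neither ever touched the other's x, and the outermost depth call still returns its own value after the inner calls have run.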

I have been reading through stuff online which all seem to agree that threads share the same memory space.

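Exactly right, but a parameter or a local variable is not one fixed location in memory. A global is a fixed location in memory, so if you have some global g, every thread will agree on the value of g. Likewise, a Python object occupies one fixed location for as long as it exists, so every thread that refers to the same object sees it in the same state. A local variable, however, occupies a different memory location in each activation of the function that declares it.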
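
As a rough illustration of the global case, here is a sketch in which one thread assigns a local that merely happens to be called g, and another rebinds the real module-level g. The function names set_local and set_global are invented for the example.

```python
import threading

g = 0  # module-level (global) name


def set_local(value):
    g = value  # no `global` statement: this is a new local that shadows g
    return g


def set_global(value):
    global g
    g = value  # rebinds the module-level name that every thread sees


t = threading.Thread(target=set_local, args=(5,))
t.start()
t.join()
after_local = g   # still the original value

t = threading.Thread(target=set_global, args=(7,))
t.start()
t.join()
after_global = g  # now reflects the other thread's assignment

print(after_local, after_global)
```

The main thread sees the change made by set_global but not the one made by set_local, because only the former touched a name that lives outside a single call.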

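Local variables and arguments are not shared. They are not shared between recursive calls of a function, and they are not shared between calls made from different threads.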