Shared Memory in Threads Confusion

Question

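I've been using Python for about a year now and am fairly familiar with it. I am quite new to threads, though, and am a little confused about what data threads share.

I have been reading through material online, all of which seems to agree that threads share the same memory space. In trying to demonstrate this to myself, however, it appears I had an incorrect understanding of how this sharing works.

I wrote a short script that just adds 1 to a local variable 3 times. I create two threads at once using the same function. I would have thought that, because of shared memory, the x variable in one thread would also be increased while it sleeps, because the other thread increased its own x, and vice versa. So after the second loop of thread one (where x = 2 while thread two is asleep), I thought thread two would come out of sleep with x = 2 rather than x = 1. But as the order of the print statements suggests, the variables aren't shared between threads.

My question is: if you have multiple threads running at once using the same function, will the variables in each thread be kept separate every time, throughout the program's run (provided that no globals are defined)? And what exactly does this shared memory mean, then?

Any guidance on this issue (or general threading advice) would be much appreciated.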


import threading
from time import sleep


def increase(x):
    for i in range(3):
        print(f"[{threading.current_thread().name}] X is {x}")
        x += 1
        print(f"[{threading.current_thread().name}] X is now {x} after increase")
        sleep(0.5)
        print(f"[{threading.current_thread().name}] X is now {x} after sleep")
    return x


def main():
    x = 0
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

The result is:


[Thread One] X is 0
[Thread One] X is now 1 after increase
[Thread Two] X is 0
[Thread Two] X is now 1 after increase
[Thread Two] X is now 1 after sleep[Thread One] X is now 1 after sleep
[Thread One] X is 1
[Thread One] X is now 2 after increase

[Thread Two] X is 1
[Thread Two] X is now 2 after increase
[Thread One] X is now 2 after sleep[Thread Two] X is now 2 after sleep
[Thread Two] X is 2
[Thread Two] X is now 3 after increase

[Thread One] X is 2
[Thread One] X is now 3 after increase
[Thread One] X is now 3 after sleep[Thread Two] X is now 3 after sleep

Answer 1

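In your example, x is passed to the function as an argument, and it behaves like a copy rather than a shared reference: an int is immutable, so x += 1 binds the function's local name to a new object and leaves the caller's x alone. If you want the threads to increment one shared counter, you have to wrap it in a mutable object, for instance a class instance.

For example: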

import threading
from time import sleep


class foo:
    x = 0


def increase(foo):
    # foo is a single shared instance; both threads update the same attribute.
    for i in range(3):
        print(f"[{threading.current_thread().name}] X is {foo.x}")
        foo.x += 1
        print(f"[{threading.current_thread().name}] X is now {foo.x} after increase")
        sleep(0.5)
        print(f"[{threading.current_thread().name}] X is now {foo.x} after sleep")
    return foo.x


def main():
    x = foo()
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()
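
One caveat when several threads update a shared counter like this: an update such as foo.x += 1 is a read, an add, and a write, so two threads can in principle interleave and lose an increment. A minimal sketch of guarding the update with threading.Lock follows. The Counter class, work, and run_counter_demo are hypothetical names used only for illustration; they are not part of the code above.

```python
import threading


class Counter:
    """Hypothetical shared counter; the lock guards the read-modify-write."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute this block.
        with self._lock:
            self.value += 1


def work(counter, times):
    for _ in range(times):
        counter.increment()


def run_counter_demo(n_threads=4, times=10_000):
    counter = Counter()
    threads = [
        threading.Thread(target=work, args=(counter, times))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_counter_demo())  # 40000: every increment is counted
```

All threads receive the same Counter object, so they share both the value and the lock that protects it.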

Note: Python threads have their own peculiarities. You can have a look at this video: https://www.youtube.com/watch?v=Obt-vMVdM8s

------------ EDIT ------------

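To be more precise: in your case x is an int, which is immutable, so each function call effectively works on its own value. x += 1 binds the local name to a new int object and the caller's x is untouched. A string or a float behaves the same way.

You get the same behavior without threads: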

def increase(x):
    for i in range(3):
        print(x)
        x += 1
    return x

x = 0

increase(x)
assert x == 0

x += 1

increase(x)
assert x == 1
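
Since increase returns the new value, a caller that wants the result can rebind its own name to it. Here is a small sketch of that, still without threads. Note that threading.Thread discards the return value of its target, so this pattern does not carry over to threads as is.

```python
def increase(x):
    for _ in range(3):
        x += 1  # rebinds the local name only
    return x


x = 0
increase(x)      # return value ignored
assert x == 0    # the caller's x is unchanged

x = increase(x)  # rebind the caller's name to the result
assert x == 3
```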

Answer 2

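You are right that memory is shared, but there is more to it than that. What is confusing you is immutable vs. mutable types. You can learn more here. I removed the for loop because it was getting cluttered: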

import threading
from time import sleep


def increase(x):
    print(f"[{threading.current_thread().name}] address of x: {hex(id(x))} ")

    print(f"[{threading.current_thread().name}] X is {x}")
    x += 1
    print(f"[{threading.current_thread().name}] address of x after increment: {hex(id(x))} ")
    print(f"[{threading.current_thread().name}] X is now {x} after increase")
    sleep(0.5)
    print(f"[{threading.current_thread().name}] X is now {x} after sleep")
    print(f"[{threading.current_thread().name}] address of x after sleep: {hex(id(x))} ")
    return x

def main():
    x = 0
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

What I did here is print the address of x inside the threads. Output:

[Thread One] address of x: 0x7ffbbebb7c20 
[Thread One] X is 0
[Thread One] address of x after increment: 0x7ffbbebb7c40 
[Thread One] X is now 1 after increase
[Thread Two] address of x: 0x7ffbbebb7c20 
[Thread Two] X is 0
[Thread Two] address of x after increment: 0x7ffbbebb7c40 
[Thread Two] X is now 1 after increase
[Thread Two] X is now 1 after sleep[Thread One] X is now 1 after sleep
[Thread One] address of x after sleep: 0x7ffbbebb7c40 

[Thread Two] address of x after sleep: 0x7ffbbebb7c40

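Notice that the first address printed, when x has only been read, is 0x7ffbbebb7c20 in both threads, and that after the update both threads report a different address: 0x7ffbbebb7c40. They both end up at the same new address because Python tries to keep its memory footprint down (CPython caches small integers, so every 1 is the same object). You can find more about that here, but for our purposes: the function receives the same object to read, and as soon as you try to write to or update the variable, that thread's local x is bound to a different object instead of the shared one being modified. This only happens with immutable types (int, str, float, and so on). If you pass a mutable type such as a dict instead: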



import threading
from time import sleep


def increase(test_var):
    print(f"[{threading.current_thread().name}] Address of test_var: {hex(id(test_var))}")
    print(f"[{threading.current_thread().name}] Address of test_var['key']: {hex(id(test_var['key']))}")
    print(f"[{threading.current_thread().name}] test_var['key'] is {test_var['key']}")
    test_var['key'] += 1
    print(f"[{threading.current_thread().name}] test_var['key'] is now {test_var['key']} after increase")
    print(f"[{threading.current_thread().name}] Address of test_var after increment: {hex(id(test_var))}")
    print(f"[{threading.current_thread().name}] Address of test_var['key'] after increment: {hex(id(test_var['key']))}")
    sleep(0.5)
    print(f"[{threading.current_thread().name}] test_var['key'] is now {test_var['key']} after sleep")
    print(f"[{threading.current_thread().name}] Address of test_var after sleep: {hex(id(test_var))}")
    print(f"[{threading.current_thread().name}] Address of test_var['key'] after sleep: {hex(id(test_var['key']))}")
    return test_var

def main():
    test_var = {'key': 0}
    first = threading.Thread(name="Thread One", target=increase, args=(test_var,))
    second = threading.Thread(name="Thread Two", target=increase, args=(test_var,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

The output is what you would expect:

[Thread One] Address of test_var: 0x22216509a98
[Thread One] Address of test_var['key']: 0x7ffbaf7d7c20
[Thread One] test_var['key'] is 0
[Thread One] test_var['key'] is now 1 after increase
[Thread One] Address of test_var after increment: 0x22216509a98
[Thread One] Address of test_var['key'] after increment: 0x7ffbaf7d7c40
[Thread Two] Address of test_var: 0x22216509a98
[Thread Two] Address of test_var['key']: 0x7ffbaf7d7c40
[Thread Two] test_var['key'] is 1
[Thread Two] test_var['key'] is now 2 after increase
[Thread Two] Address of test_var after increment: 0x22216509a98
[Thread Two] Address of test_var['key'] after increment: 0x7ffbaf7d7c60
[Thread Two] test_var['key'] is now 2 after sleep
[Thread Two] Address of test_var after sleep: 0x22216509a98
[Thread Two] Address of test_var['key'] after sleep: 0x7ffbaf7d7c60
[Thread One] test_var['key'] is now 2 after sleep
[Thread One] Address of test_var after sleep: 0x22216509a98
[Thread One] Address of test_var['key'] after sleep: 0x7ffbaf7d7c60

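Notice how the address of test_var (0x22216509a98) does not change between threads: it is mutable, both threads hold a reference to the same dict, and an update made by one thread is visible to the other.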
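
Put differently, what matters is whether the function mutates the object it received or rebinds its parameter to a new one. A minimal sketch of the difference, using hypothetical functions mutate, rebind, and run_in_thread that are not part of the code above:

```python
import threading


def mutate(shared):
    # In-place update: every holder of this dict sees the change.
    shared["key"] += 1


def rebind(shared):
    # Assignment binds the local name to a brand-new dict;
    # the caller's dict is untouched.
    shared = {"key": 100}


def run_in_thread(func, arg):
    t = threading.Thread(target=func, args=(arg,))
    t.start()
    t.join()


test_var = {"key": 0}
run_in_thread(mutate, test_var)
after_mutate = test_var["key"]   # 1
run_in_thread(rebind, test_var)
after_rebind = test_var["key"]   # still 1
print(after_mutate, after_rebind)
```

So even a mutable argument is only "shared" in the useful sense while the threads keep operating on the same object.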

Answer 3

The answer that you accepted doesn't directly answer this question:

if you have multiple threads running at once using the same function, will the variables in each thread be kept separate?

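"Local" doesn't only mean local to this function; it also means local to this call of the function.

The values of a function's arguments and local variables are stored in an activation record. Every time the function is called, a new activation record is created, and when the function returns, that activation record is destroyed.

That means the x argument of your increase(x) function is a different variable in each call. If a function calls itself recursively, the args and locals are different variables in each recursive call, and if the function is called in multiple threads, the args and locals are different variables in each thread.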
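
The recursive case can be sketched without any threads. In this illustration (descend and log are hypothetical names), every call assigns its own local x, and the inner calls never disturb the outer call's x, while the list passed along is one object that all calls append to:

```python
def descend(depth, log):
    x = depth * 10               # local to this particular call
    if depth > 0:
        descend(depth - 1, log)  # the inner call assigns its own x
    log.append((depth, x))       # this call's x is still depth * 10


log = []
descend(2, log)
print(log)  # [(0, 0), (1, 10), (2, 20)]
```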

I have been reading through stuff online which all seem to agree that threads share the same memory space.

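That is exactly right, but an argument or a local variable is not a definite location in memory. A global is a definite location in memory. So, if you have some global g, every thread will agree that g has the same value. Also, a Python object, for as long as it exists, occupies a definite location, so every thread that refers to the same object sees it in the same state. A local variable, however, occupies a different memory location in each activation of the function that declares it.

Local variables and arguments are not shared. They are not shared between recursive calls of a function, and they are not shared by calls from different threads.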
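
A small sketch of the global case, assuming a module-level name g and a hypothetical function set_global: the worker thread assigns the global, and the main thread sees the new value after joining, whereas the worker's local variable is gone once the call returns.

```python
import threading

g = 0  # module-level global: one binding shared by all threads


def set_global(value):
    global g
    local_copy = value * 2  # local: exists only during this call
    g = local_copy


t = threading.Thread(target=set_global, args=(21,))
t.start()
t.join()
print(g)  # 42: the main thread sees the worker's assignment
```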

python

multithreading