Shared variable among various threads in Python

I need to count the total number of POST requests sent to the server. My script uses one thread per JSON file that contains the POST data. Below is a rough snippet of the code.

import json
import multiprocessing as mp
from functools import partial
from os import listdir
from os.path import isfile, join

import requests
from requests.auth import HTTPBasicAuth

statistics = 0

def load_from_file(some_arguments, filename):
    data_list = json.loads(open(filename).read())
    url = address + getUrl(filename, config)
    for data in data_list.get("results"):
        statistics += 1 
        r = requests.post(url, data=json.dumps(data), headers=headers,
                          auth=HTTPBasicAuth(username, password))

def load_from_directory(some_arguments, directory):
    pool = mp.Pool(mp.cpu_count() * 2)
    func = partial(load_from_file, some_arguments)
    file_list = [f for f in listdir(directory) if isfile(join(directory, f))]
    pool.map(func, [join(directory, f) for f in file_list ])
    pool.close() 
    pool.join() 

    print "total post requests", statistics

I want to print the total number of POST requests made by this script. Is this the right approach?

Sharing memory between processes is not that simple. I don't see a need to use the multiprocessing module here rather than threading; multiprocessing is mainly useful as a workaround for the Global Interpreter Lock (GIL).

In your example you are doing I/O-bound work, which will likely never use the full CPU time anyway. If you insist on multiprocessing rather than threading, I suggest you take a look at exchanging-objects-between-processes.
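
For completeness, if you did stay with multiprocessing.Pool, one common way to share a counter between processes is a multiprocessing.Value handed to the workers through the pool initializer. A minimal sketch, assuming the per-file work is factored into a post_file function (the names init_worker, post_file and run are illustrative, not from your code):

import multiprocessing as mp
from functools import partial

counter = None

def init_worker(shared_counter):
    # Runs once in each worker process and stores the shared counter globally
    global counter
    counter = shared_counter

def post_file(some_arguments, filename):
    # ... send the POST requests for this file ...
    # bump the shared counter (in your real code, once per POST inside the loop)
    with counter.get_lock():
        counter.value += 1

def run(some_arguments, files):
    shared_counter = mp.Value('i', 0)  # process-safe integer
    pool = mp.Pool(mp.cpu_count() * 2, initializer=init_worker,
                   initargs=(shared_counter,))
    pool.map(partial(post_file, some_arguments), files)
    pool.close()
    pool.join()
    print("total post requests: %d" % shared_counter.value)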

Otherwise, using threading you can share the global statistics variable between threads:

import json
import threading
from functools import partial
from os import listdir
from os.path import isfile, join

import requests
from requests.auth import HTTPBasicAuth

statistics = 0

def load_from_file(some_arguments, filename):
    global statistics
    data_list = json.loads(open(filename).read())
    url = address + getUrl(filename, config)
    for data in data_list.get("results"):
        statistics += 1
        r = requests.post(url, data=json.dumps(data), headers=headers,
                        auth=HTTPBasicAuth(username, password))

def load_from_directory(some_arguments, directory):
    threads = []
    func = partial(load_from_file, some_arguments)

    file_list = [f for f in listdir(directory) if isfile(join(directory, f))]

    for f in file_list:
        # args must be a tuple, hence the trailing comma
        t = threading.Thread(target=func, args=(join(directory, f),))
        t.start()
        threads.append(t)

    #Wait for threads to finish
    for thread in threads:
        thread.join()

    print "total post requests", statistics
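
Note that statistics += 1 is not an atomic operation, even under the GIL, so concurrent threads can occasionally lose an update. If the count has to be exact, guard the increment with a threading.Lock. A small self-contained sketch (the worker function and the numbers are only for illustration):

import threading

statistics = 0
statistics_lock = threading.Lock()

def count_posts(n):
    # Serialize the increment so concurrent threads cannot lose updates
    global statistics
    for _ in range(n):
        with statistics_lock:
            statistics += 1

threads = [threading.Thread(target=count_posts, args=(1000,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("total post requests: %d" % statistics)  # always 10000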

Note: this currently spawns threads all at once, one per file in the directory. You may want to implement some kind of throttling for best performance; one option is sketched below.
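
One way to throttle is to keep the Pool-style code from your original script but back it with threads via multiprocessing.dummy, which mirrors the multiprocessing API using threads, so the pool size caps how many files are processed at once. This sketch reuses load_from_file from the snippet above, and the pool size of 8 is just an example:

import multiprocessing.dummy as mp_dummy  # same API as multiprocessing, but thread-backed
from functools import partial
from os import listdir
from os.path import isfile, join

def load_from_directory(some_arguments, directory):
    func = partial(load_from_file, some_arguments)
    file_list = [join(directory, f) for f in listdir(directory)
                 if isfile(join(directory, f))]

    # At most 8 files are processed concurrently; the threads still share statistics
    pool = mp_dummy.Pool(8)
    pool.map(func, file_list)
    pool.close()
    pool.join()

    print("total post requests: %d" % statistics)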