Multiprocessing scope: list not updating using 'multiprocessing.Process', worked using 'threading.Thread'

Question

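I have a situation in multiprocessing where the list I use to collect the results from my function is not being updated by the process. I have two code examples, one which updates the list (correction: that code updates properly using 'Thread', but fails when using 'Process') and one which does not. I cannot detect any kind of error. I think this might be a subtlety of scope that I do not understand.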

Here is the working example (correction: this example does not work either; it does, however, work with threading.Thread):

def run_knn_result_wrapper(dataset,k_value,metric,results_list,index):
    results_list[index] = knn_result(dataset,k_value,metric)

results = [None] * (k_upper-k_lower)
threads = [None] * (k_upper-k_lower)
joined = [0] * (k_upper-k_lower)

for i in range(len(threads)):
    threads[i] = Process(target=run_knn_result_wrapper,args=(dataset,k_lower+i,metric,results,i))
    threads[i].start()
    if batch_size == 1:
        threads[i].join()
        joined[i]=1
    else:

        if i % batch_size == batch_size-1 and i > 0:
            for j in range(max(0,i - 2),i):
                if joined[j] == 0:
                    threads[j].join()
                    joined[j] = 1
for i in range(len(threads)):
    if joined[i] == 0:
        threads[i].join()


Ignoring the "threads" variable name (this started on threading, but then I found out about the GIL), the `results` list updates perfectly.

Here is the code that does not update the results list:

def prediction_on_batch_wrapper(batchX,results_list,index):
    results_list[index] = prediction_on_batch(batchX)

batches_of_X = np.array_split(X,10)

overall_predicted_classes_list = []
for i in range(len(batches_of_X)):
    batches_of_X_subsets = np.array_split(batches_of_X[i],10)
    processes = [None]*len(batches_of_X_subsets)
    results_list = [None]*len(batches_of_X_subsets)
    for j in range(len(batches_of_X_subsets)):
        processes[j] = Process(target=prediction_on_batch_wrapper,args=(batches_of_X_subsets[j],results_list,j))
    for j in processes:
        j.start()
    for j in processes:
        j.join()
    if len(results_list) > 1:
        results_array = np.concatenate(tuple(results_list))
    else:
        results_array = results_list[0]

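I do not see why, under Python's scoping rules, the results_list list is not updated by the prediction_on_batch_wrapper function.

A debugging session shows that the results_list values inside the prediction_on_batch_wrapper function do in fact get updated... but somehow its scope is local in this second Python file, and global in the first...

What is going on here?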

Answer 1

This is because you are spawning another process. Separate processes do not share resources by default, and that includes memory.

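Each process is an independent running program, usually visible in Task Manager or ps. When you use Process to launch an additional process, you should see a second instance of Python start up at the moment you spawn it.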
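One way to see this without opening Task Manager or ps is to compare process IDs. The following is a minimal sketch; the helper names report_pid and child_pid are made up for illustration and are not from the question.

```python
import os
from multiprocessing import Process, Queue


def report_pid(queue):
    # Runs in the child: send this process's ID back to the parent.
    queue.put(os.getpid())


def child_pid():
    """Start one child process and return the PID it reported."""
    queue = Queue()
    child = Process(target=report_pid, args=(queue,))
    child.start()
    pid = queue.get()  # read the value before joining the child
    child.join()
    return pid


if __name__ == "__main__":
    print("parent PID:", os.getpid())
    print("child PID: ", child_pid())
```

The two numbers differ, because the child is a separate operating-system process rather than a part of the parent.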

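A thread is another point of execution inside the main process, and it shares all of the main process's resources, even across multiple cores. All threads within a process can see any part of that process, though how much they can use depends on the code you write for the thread and on the restrictions of the language you write it in.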

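Using Process is like running two instances of your program; you can pass arguments to the new process, but those are copies that are no longer shared once passed. For example, if you modify the data in the main process, the new process will not see the change, because the two processes have completely separate copies of the data. The same holds in the other direction, which is your case: the child fills in its own copy of results_list, and the list in the parent stays untouched.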
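The difference can be reproduced in a few lines. This is a small sketch rather than the question's actual code: store_result is a hypothetical stand-in for the wrapper functions, and run_with is a made-up helper that runs it once as a Thread and once as a Process.

```python
from multiprocessing import Process
from threading import Thread


def store_result(results_list, index):
    # Stand-in for the question's wrapper functions: write into one slot.
    results_list[index] = "filled"


def run_with(worker_cls):
    """Run store_result in one worker of the given class; return the parent's list."""
    results = [None]
    worker = worker_cls(target=store_result, args=(results, 0))
    worker.start()
    worker.join()
    return results


if __name__ == "__main__":
    print("Thread: ", run_with(Thread))   # the thread wrote into the parent's list
    print("Process:", run_with(Process))  # the child wrote into its own copy
```

With Thread the parent sees ['filled']; with Process it still sees [None], and no error is raised, which matches what the debugger showed.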

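If you want to share data, you really should use threads instead of processes. For most such needs, threads are preferable to processes, except in the few cases where strict separation is required.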
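If threads are not an option, for instance because the GIL mentioned in the question is the concern, the standard library's multiprocessing.Manager can hold a list that several processes write to. The sketch below keeps the shape of the question's second example under that assumption; square_wrapper and collect_squares are hypothetical names, and squaring a number stands in for prediction_on_batch.

```python
from multiprocessing import Manager, Process


def square_wrapper(value, results_list, index):
    # Hypothetical stand-in for prediction_on_batch_wrapper.
    results_list[index] = value * value


def collect_squares(values):
    """Run one process per value and gather results through a managed list."""
    with Manager() as manager:
        # The managed list lives in a server process; item assignments made
        # through the proxy in any process are forwarded to that server.
        results_list = manager.list([None] * len(values))
        processes = [
            Process(target=square_wrapper, args=(value, results_list, index))
            for index, value in enumerate(values)
        ]
        for process in processes:
            process.start()
        for process in processes:
            process.join()
        return list(results_list)  # copy out before the manager shuts down


if __name__ == "__main__":
    print(collect_squares([1, 2, 3]))
```

Every assignment goes through the manager's server process, so this costs more than writing to a plain list. multiprocessing.Queue and Pool.map are other standard-library ways to get results back to the parent.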

Tags: python, scope, python-multiprocessing