
Python multiprocessing Pool.apply_async with shared variables (Value)

For my college project, I am trying to develop a Python-based traffic generator. I have created two CentOS machines on VMware and I am using one as my client and one as my server machine. I have used the IP aliasing technique to increase the number of clients and servers using just a single client/server machine. So far I have created 50 IP aliases on my client machine and 10 IP aliases on my server machine. I am also using the multiprocessing module to generate traffic concurrently from all 50 clients to all 10 servers. I have also created a few profiles (1kb, 10kb, 50kb, 100kb, 500kb, 1mb) on my server (in the /var/www/html directory, since I am using the Apache server), and I am using urllib2 to send requests to these profiles from my client machine. The script first binds to one of the source alias IPs and then sends requests from that IP using urllib2. Here I am also trying to use multiprocessing.Pool.apply_async, but when running my script I get the error 'RuntimeError: Synchronized objects should only be shared between processes through inheritance'. After some debugging I found that this error is caused by the use of multiprocessing.Value. But I want to share some variables between my processes, and I also want to increase my number of TCP connections. What other module (besides multiprocessing.Value) can be used here to share some common variables? Or is there another solution to this problem?
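For reference, socbindtry itself is not shown below; presumably it builds urllib2 handlers that bind the outgoing socket to a specific alias IP, roughly along the lines of this sketch. The name make_bindable_handler and the example address are purely illustrative and not part of the actual module:

import httplib
import urllib2

def make_bindable_handler(source_ip):
    #HTTPConnection subclass that binds its outgoing socket to the given alias ip
    #(the source_address argument needs Python 2.7+)
    class BoundHTTPConnection(httplib.HTTPConnection):
        def __init__(self, *args, **kwargs):
            kwargs['source_address']=(source_ip, 0)    #0 = any free local port
            httplib.HTTPConnection.__init__(self, *args, **kwargs)
    #handler that makes urllib2 open its connections through the bound class
    class BindableHTTPHandler(urllib2.HTTPHandler):
        def http_open(self, req):
            return self.do_open(BoundHTTPConnection, req)
    return BindableHTTPHandler

#example (illustrative ip): opener=urllib2.build_opener(make_bindable_handler("192.168.1.101"))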

'''
Traffic Generator Script:

 Here I have used IP Aliasing to create multiple clients on single vm machine. 
 Same I have done on server side to create multiple servers. I have around 50 clients and 10 servers
'''
import multiprocessing
import urllib2
import random
import myurllist    #list of all destination urls for all 10 servers
import time
import socbindtry   #script that binds various virtual/aliased client ips to the script
m=multiprocessing.Manager()
response_time=m.list()    #some shared variables
error_count=multiprocessing.Value('i',0)
def send_request3():    #function to send requests from alias client ip 1
    opener=urllib2.build_opener(socbindtry.BindableHTTPHandler3)    #bind to alias client ip1
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        error_count.value=error_count.value+1
def send_request4():    #function to send requests from alias client ip 2
    opener=urllib2.build_opener(socbindtry.BindableHTTPHandler4)    #bind to alias client ip2
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        error_count.value=error_count.value+1
#50 such functions are defined here for 50 clients
def func():
    pool=multiprocessing.Pool(processes=750)
    for i in range(5):
        pool.apply_async(send_request3)
        pool.apply_async(send_request4)
        pool.apply_async(send_request5)
#append 50 functions here
    pool.close()
    pool.join()
    print"All work Done..!!"
    return
start=float(time.time())
func()
end=float(time.time())-start
print end

This may be because of the differences in how Python multiprocessing works on Windows versus Linux (I don't really know how multiprocessing behaves inside a VM, which is the case here).

This might work:

import multiprocessing
import urllib2
import random
import myurllist    #list of all destination urls for all 10 servers
import time
import socbindtry   #script that binds various virtual/aliased client ips to the script

def send_request3(response_time, error_count):    #function to send requests from alias client ip 1
    opener=urllib2.build_opener(socbindtry.BindableHTTPHandler3)    #bind to alias client ip1
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        error_count.value=error_count.value+1
def send_request4(response_time, error_count):    #function to send requests from alias client ip 2
    opener=urllib2.build_opener(socbindtry.BindableHTTPHandler4)    #bind to alias client ip2
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        error_count.value=error_count.value+1
#50 such functions are defined here for 50 clients
def func():
    m=multiprocessing.Manager()
    response_time=m.list()    #some shared variables
    error_count=m.Value('i',0)    #a Manager Value, unlike multiprocessing.Value, can be passed to pool workers

    pool=multiprocessing.Pool(processes=750)
    for i in range(5):
        pool.apply_async(send_request3, [response_time, error_count])
        pool.apply_async(send_request4, [response_time, error_count])
        # pool.apply_async(send_request5)
#append 50 functions here
    pool.close()
    pool.join()
    print"All work Done..!!"
    return


start=float(time.time())
func()
end=float(time.time())-start
print end

As the error message states, you can't pass a multiprocessing.Value via pickle. However, you can use multiprocessing.Manager().Value:

import multiprocessing
import urllib2
import random
import myurllist    #list of all destination urls for all 10 servers
import time
import socbindtry   #script that binds various virtual/aliased client ips to the script

def send_request3(response_time, error_count, error_lock):    #function to send requests from alias client ip 1
    opener=urllib2.build_opener(socbindtry.BindableHTTPHandler3)    #bind to alias client ip1
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        with error_lock:    #a Manager Value has no get_lock(), so use a shared Manager lock
            error_count.value += 1

def send_request4(response_time, error_count, error_lock):    #function to send requests from alias client ip 2
    opener=urllib2.build_opener(socbindtry.BindableHTTPHandler4)    #bind to alias client ip2
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        with error_lock:    #protect the shared counter with the Manager lock
            error_count.value += 1

#50 such functions are defined here for 50 clients

def func(response_time, error_count, error_lock):
    pool=multiprocessing.Pool(processes=2*multiprocessing.cpu_count())
    args = (response_time, error_count, error_lock)
    for i in range(5):
        pool.apply_async(send_request3, args=args)
        pool.apply_async(send_request4, args=args)
#append 50 functions here
    pool.close()
    pool.join()
    print"All work Done..!!"
    return

if __name__ == "__main__":
    m=multiprocessing.Manager()
    response_time=m.list()    #some shared variables
    error_count=m.Value('i',0)
    error_lock=m.Lock()    #Manager lock; its proxy can be passed to pool workers

    start=float(time.time())
    func(response_time, error_count, error_lock)
    end=float(time.time())-start
    print end

A few other notes here:

  1. Using a Pool of 750 processes is not a good idea. Unless you're on a server with hundreds of CPU cores, that will overwhelm your machine. It would be faster, and put less strain on your machine, to use far fewer processes, more like 2 * multiprocessing.cpu_count().
  2. As a best practice, you should explicitly pass all the shared arguments your child processes need instead of relying on global variables. This also increases the chance that the code will run on Windows.
  3. It looks like all of your send_request* functions do almost exactly the same thing. Why not create just one function and use a variable to decide which socbindtry.BindableHTTPHandler to use? That would avoid a ton of code duplication; see the sketch after this list.
  4. The way you're incrementing error_count is not process/thread-safe and is susceptible to race conditions. You need to protect the increment with a lock (as done in the example code above).
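
For example, notes 3 and 4 might be combined into a single parameterized worker along these lines. This is only a sketch: it assumes socbindtry exposes handlers named BindableHTTPHandler3 through BindableHTTPHandler52 for the 50 aliased client IPs (as the numbering in your script suggests), and it uses a Manager lock for the shared counter:

import multiprocessing
import urllib2
import random
import time
import myurllist    #list of all destination urls for all 10 servers
import socbindtry   #script that binds various virtual/aliased client ips to the script

def send_request(handler_index, response_time, error_count, error_lock):
    #pick the handler for this aliased client ip by index instead of duplicating 50 functions
    handler=getattr(socbindtry, "BindableHTTPHandler%d" % handler_index)
    opener=urllib2.build_opener(handler)
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        with error_lock:    #protect the shared counter against races
            error_count.value+=1

if __name__ == "__main__":
    m=multiprocessing.Manager()
    response_time=m.list()    #shared list of response times
    error_count=m.Value('i',0)
    error_lock=m.Lock()
    pool=multiprocessing.Pool(processes=2*multiprocessing.cpu_count())
    start=time.time()
    for rep in range(5):
        for idx in range(3,53):    #assumed handler numbers 3..52 for the 50 client ips
            pool.apply_async(send_request, args=(idx, response_time, error_count, error_lock))
    pool.close()
    pool.join()
    print "All work Done..!!"
    print time.time()-start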