return counter object in multiprocessing / map function
I have a Python script running which starts the same function in multiple threads. The function creates and processes two counters (c1 and c2). The results of all the c1 counters coming from the forked processes should be merged together, and the same goes for all the c2 counters returned by the different forks.
My (pseudo) code looks like this:
from collections import Counter
import multiprocessing

def countIt(cfg):
    c1 = Counter()
    c2 = Counter()
    # do some things and fill the counters by counting words in a text, like
    # c1 = Counter({'apple': 3, 'banana': 0})
    # c2 = Counter({'blue': 3, 'green': 0})
    return c1, c2

if __name__ == '__main__':
    cP1 = Counter()
    cP2 = Counter()
    cfg = "myConfig"
    p = multiprocessing.Pool(4)  # creating 4 forks
    c1, c2 = p.map(countIt, cfg)[:2]
    # 1.) This will only work with [:2], which seems to be no good idea
    # 2.) at this point c1 and c2 are tuples, not Counters anymore,
    #     so the following will not work:
    cP1 + c1
    cP2 + c2
Following the example above, I need a result like this:
cP1 = Counter({'apple': 25, 'banana': 247, 'orange': 24})
cP2 = Counter({'red': 11, 'blue': 56, 'green': 3})
So my question is: how can I count things in forked processes so that I can aggregate each counter (all the c1 and all the c2) in the parent process?
You need to "unzip" your results, for example with a for-each loop. You will receive a list of tuples, where each tuple is a pair of counters: (c1, c2).
With your current solution you actually mix them up: you assign [(c1a, c2a), (c1b, c2b)] to c1, c2, which means c1 ends up holding (c1a, c2a) and c2 ends up holding (c1b, c2b).
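To make that concrete, here is a minimal sketch of the misassignment, using made-up Counter values in place of what p.map would actually return:

from collections import Counter

# Hypothetical result list as p.map would return it: one (c1, c2) pair per call.
results = [(Counter({'apple': 3}), Counter({'blue': 3})),
           (Counter({'apple': 2}), Counter({'green': 1}))]

c1, c2 = results[:2]
print(c1)  # (Counter({'apple': 3}), Counter({'blue': 3})) -- a tuple, not a Counter
print(c2)  # (Counter({'apple': 2}), Counter({'green': 1}))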
Try this:
if __name__ == '__main__':
    from contextlib import closing
    cP1 = Counter()
    cP2 = Counter()
    # I hope you have an actual list of configs here, otherwise map will
    # call `countIt` with the single characters of the string 'myConfig'
    cfg = "myConfig"
    # `contextlib.closing` makes sure the pool is closed after we're done.
    # In python3, Pool is itself a context manager and you don't need to
    # surround it with `closing` in order to be able to use it in the `with`
    # construct.
    # This approach, however, is compatible with both python2 and python3.
    with closing(multiprocessing.Pool(4)) as p:
        # Just counting, no need to order the results.
        # This might actually be a bit faster.
        for c1, c2 in p.imap_unordered(countIt, cfg):
            cP1 += c1
            cP2 += c2
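If you would rather collect everything first and then merge, a zip-based variant of the same "unzip" idea could look roughly like this (a sketch only: it assumes `countIt` from the question and a hypothetical list of real config values named `configs`):

from collections import Counter
import multiprocessing

if __name__ == '__main__':
    configs = ["cfgA", "cfgB", "cfgC"]  # hypothetical list of configs
    # python3 only: Pool can be used directly as a context manager
    with multiprocessing.Pool(4) as p:
        results = p.map(countIt, configs)  # [(c1a, c2a), (c1b, c2b), ...]
    # "unzip" the list of pairs into two sequences of Counters ...
    all_c1, all_c2 = zip(*results)
    # ... and merge each sequence; sum() needs an empty Counter as a start value
    cP1 = sum(all_c1, Counter())
    cP2 = sum(all_c2, Counter())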