manager.dict() "skipping" to update some values in multiprocessing ~ Python
In multiprocessing, I want to update a manager.dict(). It is being updated, but some data gets skipped during the update. What can be done?
It's something like this:
from multiprocessing import Process, Manager

manager = Manager()
a = manager.dict()
a['url_info'] = manager.list()

def parse_link(link):
    # parse the link; link_parser returns a dict
    pared_info = link_parser(link)
    a['url_info'].append(pared_info)

# links contains a lot of URLs that need to be parsed.
links = ["https://url.com/1", "https://url.com/2", "https://url.com/3"]
processes = []
for link in links:
    # args must be a tuple, hence the trailing comma
    p = Process(target=parse_link, args=(link,))
    p.start()
    processes.append(p)
for process in processes:
    process.join()
link_parser() is a function that returns a dictionary containing information about the scraped/parsed web page.
> print(list(a['url_info']))
> ['#info_1', '#info_3']
Here the multiprocessing program skips #info_2 when updating the list (aka array). Please help me.
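Here is some code that demonstrates an improved structure for what you're trying to do.
Obviously it doesn't have the detail of your link_parser(), but you'll get the idea.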
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager
from functools import partial

LINKS = ['abc', 'def', 'ghi']
KEY = 'url_info'

def parse_link(a, link):
    # a is the managed dict, passed in explicitly rather than used as a global
    a[KEY].append(link)

def main():
    with Manager() as manager:
        a = manager.dict()
        a[KEY] = manager.list()
        with ProcessPoolExecutor() as executor:
            # partial binds the shared dict as the first argument
            executor.map(partial(parse_link, a), LINKS)
        print(a[KEY])

if __name__ == '__main__':
    main()
Output:
['abc', 'def', 'ghi']
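If the shared list is only used to collect results, one alternative worth considering is to have each worker return its parsed dict and let the parent build the list. The sketch below is not the code above; it is a minimal variation under some assumptions: the link_parser() here is a hypothetical stand-in for your real parser, and the ValueError it raises is only there to illustrate a failing link. Iterating over the result of executor.map() re-raises any exception that occurred in a worker, so a link that fails to parse shows up as an error instead of a missing entry.

```python
from concurrent.futures import ProcessPoolExecutor


def link_parser(link):
    """Hypothetical stand-in for the real parser: returns one dict per link."""
    if not link.startswith("https://"):
        raise ValueError(f"unsupported link: {link!r}")
    return {"url": link, "id": link.rsplit("/", 1)[-1]}


def parse_link(link):
    # Return the parsed dict to the parent instead of mutating shared state.
    return link_parser(link)


def main():
    links = ["https://url.com/1", "https://url.com/2", "https://url.com/3"]
    with ProcessPoolExecutor() as executor:
        # Consuming the iterator re-raises any exception raised in a worker,
        # so a failed link cannot go missing silently.
        url_info = list(executor.map(parse_link, links))
    print(url_info)


if __name__ == "__main__":
    main()
```

executor.map() yields results in the same order as the input links, so url_info lines up with links without any extra bookkeeping, and no Manager is needed for this pattern.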