Multiprocessing BeautifulSoup bs4.element.Tag
I am trying to use multiprocessing together with BeautifulSoup, but I am running into a maximum recursion depth exceeded error:
    def process_card(card):
        result = card.find("p")
        # Do some more parsing with beautifulsoup
        return result

    articles = []
    pool = multiprocessing.Pool(processes=4)
    # 'url' must already hold the page's HTML markup; BeautifulSoup does not fetch URLs
    soup = BeautifulSoup(url, 'html.parser')
    cards = soup.findAll("li")
    for card in cards:
        result = pool.apply_async(process_card, [card])
        article = result.get()
        if article is not None:
            print(article)
            articles.append(article)
    pool.close()
    pool.join()
As far as I can tell, card is of type <class 'bs4.element.Tag'>, and the problem is probably related to pickling this object (multiprocessing pickles the arguments it sends to worker processes). It is not clear to me how I have to modify my code to get around this.
It was pointed out in the comments that card can simply be converted to a plain string (unicode(card) in Python 2, str(card) in Python 3). However, doing that makes the process_card function fail with slice indices must be integers or None or have an __index__ method. It turns out this error comes from the fact that card is no longer a bs4 object and therefore no longer has the bs4 methods; instead, card is just a string, and the error is a string-related one. So card first has to be turned back into a soup inside the worker, and the parsing can continue from there. This works!
    def process_card(card_html):
        # Re-parse the serialized tag so the bs4 methods are available again
        card = BeautifulSoup(card_html, 'html.parser')
        result = card.find("p")
        # Do some more parsing with beautifulsoup
        return result

    articles = []
    pool = multiprocessing.Pool(processes=4)
    soup = BeautifulSoup(url, 'html.parser')
    cards = soup.findAll("li")
    for card in cards:
        # str(card) serializes the tag to HTML, which pickles without trouble
        result = pool.apply_async(process_card, [str(card)])
        article = result.get()
        if article is not None:
            print(article)
            articles.append(article)
    pool.close()
    pool.join()
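As an aside, calling result.get() right after each apply_async blocks until that task finishes before the next one is even submitted, so the cards are effectively processed one at a time. A minimal Python 3 sketch of the same string round-trip using Pool.map, which does run the cards in parallel (the sample HTML and the get_text extraction are illustrative, not from the original post):

```python
import multiprocessing
from bs4 import BeautifulSoup

def process_card(card_html):
    # Re-parse the serialized tag inside the worker process
    card = BeautifulSoup(card_html, "html.parser")
    p = card.find("p")
    # Return a plain string, which also pickles cleanly on the way back
    return p.get_text() if p is not None else None

if __name__ == "__main__":
    html = "<ul><li><p>one</p></li><li><p>two</p></li></ul>"
    soup = BeautifulSoup(html, "html.parser")
    # str(tag) serializes each tag to HTML before it is sent to a worker
    card_strings = [str(card) for card in soup.find_all("li")]
    with multiprocessing.Pool(processes=4) as pool:
        articles = [a for a in pool.map(process_card, card_strings) if a is not None]
    print(articles)
```

pool.map submits all tasks at once and preserves input order in its results, so this keeps the original card order while actually using the worker processes.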