Unable to print results from a function while using concurrent.futures in some customized way
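I've written a script that uses the `concurrent.futures` library to print the results of the `fetch_links` function. When I put a `print` statement inside the function, I get the expected results. What I want to do now is print the results of that function while it uses a `yield` statement instead.

Is there any way to modify the code under the `if __name__ == '__main__':` block so that it prints the results of `fetch_links`, while leaving the function as it is, i.e. keeping the `yield` statement?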
```python
import requests
from bs4 import BeautifulSoup
import concurrent.futures as cf

links = [
    "https://whosebug.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50",
    "https://whosebug.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50",
    "https://whosebug.com/questions/tagged/web-scraping?tab=newest&page=4&pagesize=50"
]

base = 'https://whosebug.com{}'

def fetch_links(s,link):
    r = s.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    for item in soup.select(".summary .question-hyperlink"):
        # print(base.format(item.get("href")))
        yield base.format(item.get("href"))

if __name__ == '__main__':
    with requests.Session() as s:
        with cf.ThreadPoolExecutor(max_workers=5) as exe:
            future_to_url = {exe.submit(fetch_links,s,url): url for url in links}
            cf.as_completed(future_to_url)
```
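To see why the `__main__` block above prints nothing, here is a minimal offline sketch. It assumes a hypothetical `fetch_links_stub` that yields fake URLs instead of calling `requests` and BeautifulSoup, so it runs without network access, and placeholder strings in place of the real links. It illustrates two points: `cf.as_completed()` only returns an iterator, so calling it and discarding the result collects nothing, and each future's `result()` is the generator object returned by the generator function, not the links themselves.

```python
import concurrent.futures as cf
import inspect


def fetch_links_stub(link):
    # Hypothetical offline stand-in for fetch_links: instead of downloading
    # and parsing a page, it yields two fake URLs derived from `link`.
    for i in range(2):
        yield f"{link}#q{i}"


links = ["page2", "page3", "page4"]  # placeholders for the real URLs
kinds = {}    # url -> whether future.result() is a generator object
results = {}  # url -> the values obtained by iterating that generator

with cf.ThreadPoolExecutor(max_workers=5) as exe:
    future_to_url = {exe.submit(fetch_links_stub, url): url for url in links}

    # Same as the question's last line: as_completed() returns an iterator,
    # so calling it and throwing the iterator away prints and collects nothing.
    cf.as_completed(future_to_url)

    for future in cf.as_completed(future_to_url):
        url = future_to_url[future]
        gen = future.result()
        # The future's result is the generator object, not the links...
        kinds[url] = inspect.isgenerator(gen)
        # ...so the links only appear once that generator is iterated.
        results[url] = list(gen)

for url in links:
    print(url, kinds[url], results[url])
```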
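Your `fetch_links` is a generator function, so `future.result()` gives you a generator object. You also have to loop over that generator to get the results: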
```python
import requests
from bs4 import BeautifulSoup
import concurrent.futures as cf

links = [
    "https://whosebug.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50",
    "https://whosebug.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50",
    "https://whosebug.com/questions/tagged/web-scraping?tab=newest&page=4&pagesize=50"
]

base = 'https://whosebug.com{}'

def fetch_links(s, link):
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    for item in soup.select(".summary .question-hyperlink"):
        yield base.format(item.get("href"))

if __name__ == '__main__':
    with requests.Session() as s:
        with cf.ThreadPoolExecutor(max_workers=5) as exe:
            future_to_url = {exe.submit(fetch_links, s, url): url for url in links}
            for future in cf.as_completed(future_to_url):
                # future.result() is the generator returned by fetch_links;
                # loop over it to get the individual links
                for result in future.result():
                    print(result)
```
Output:
and so on ...
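One caveat about the code above: because `fetch_links` is a generator function, `exe.submit(fetch_links, s, url)` only makes the worker thread create the generator object. None of its body runs until it is iterated, so the `s.get()` requests and the parsing actually happen one after another in the main thread, inside the `for result in future.result()` loop, and the thread pool adds no concurrency. The following offline sketch shows this. It assumes a hypothetical `fetch_links_stub` that yields the name of the thread executing its body, and a hypothetical `collect` wrapper that drains the generator inside the worker; the link strings are placeholders.

```python
import concurrent.futures as cf
import threading


def fetch_links_stub(link):
    # Hypothetical offline stand-in for fetch_links: yields the name of the
    # thread that actually executes the generator body.
    yield (link, threading.current_thread().name)


def collect(link):
    # Hypothetical wrapper: drains the generator inside the worker thread,
    # so the body (in the real script: s.get() plus parsing) runs there.
    return list(fetch_links_stub(link))


links = ["page2", "page3", "page4"]  # placeholders for the real URLs
main_name = threading.current_thread().name

with cf.ThreadPoolExecutor(max_workers=5, thread_name_prefix="worker") as exe:
    # 1) As in the answer: the worker only creates the generator; its body
    #    runs later, in this thread, when future.result() is iterated.
    lazy = [exe.submit(fetch_links_stub, url) for url in links]
    lazy_threads = [name for f in cf.as_completed(lazy) for _, name in f.result()]

    # 2) Submitting the draining wrapper: the body runs in a pool thread and
    #    future.result() is already a plain list.
    eager = [exe.submit(collect, url) for url in links]
    eager_threads = [name for f in cf.as_completed(eager) for _, name in f.result()]

print("lazy: ", sorted(set(lazy_threads)))
print("eager:", sorted(set(eager_threads)))
```

Applied to the original script, a sketch of the same idea that leaves `fetch_links` and its `yield` untouched is to change only the `__main__` block so that it submits `exe.submit(lambda u: list(fetch_links(s, u)), url)` instead of `exe.submit(fetch_links, s, url)`; `future.result()` is then a list of links that can be printed directly. Like the original script, this shares one `requests.Session` across the worker threads.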