Creating a loop to open links in sequence
This site:
https://int.soccerway.com/international/europe/european-championships/c25/
EUROPE
European Championship
2020
Group Stage
Final Stages
EC Qualification
WC Qualification Europe
UEFA Nations League
Baltic Cup
In the left-hand menu, Group Stage and Final Stages expand into two links:
https://int.soccerway.com/international/europe/european-championships/2020/group-stage/r38188/
https://int.soccerway.com/international/europe/european-championships/2020/s13030/final-stages/
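I managed to collect the links, but when I try to open the pages one after another, it stops at the first link and never opens the second one. What do I need to change?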
url = "https://int.soccerway.com/international/europe/european-championships/c25/"
driver.get(url)
links_level_2 = driver.find_elements_by_xpath("//ul[contains(@class,'level-2')]/li/a")
for link_level_2 in links_level_2:
    level_2 = link_level_2.get_attribute("href")
    driver.get(level_2)
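A likely cause, judging only from the snippet above: the elements in links_level_2 belong to the first page, so once driver.get() navigates away they go stale and the next get_attribute() call fails (Selenium normally raises StaleElementReferenceException here). A minimal sketch of one way around it is to read every href into a plain list of strings before navigating anywhere. The function names below are hypothetical, and the sketch assumes a Selenium 4 style driver, where find_elements takes a locator strategy and a value:

```python
def collect_hrefs(driver, xpath):
    """Return the href of every element matching xpath on the current page."""
    # "xpath" is the string value of selenium's By.XPATH constant; it is passed
    # as a literal so this sketch has no import of its own.
    elements = driver.find_elements("xpath", xpath)
    # Read the attribute now, while the elements are still attached to the page.
    return [element.get_attribute("href") for element in elements]


def open_links_in_sequence(driver, start_url, xpath):
    """Open start_url, collect the matching hrefs, then visit each in order."""
    driver.get(start_url)
    hrefs = collect_hrefs(driver, xpath)  # plain strings, cannot go stale
    for href in hrefs:
        driver.get(href)
        # Scrape the page here, before the loop moves on to the next link.
    return hrefs


# Hypothetical usage with a real browser (requires selenium and a driver binary):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   open_links_in_sequence(
#       driver,
#       "https://int.soccerway.com/international/europe/european-championships/c25/",
#       "//ul[contains(@class,'level-2')]/li/a",
#   )
```

The only real change from the original loop is that get_attribute("href") is called on every element before the first driver.get() inside the loop, so nothing in the loop touches an element from a page that is no longer loaded.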
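You can do this easily with BeautifulSoup.
Since you didn't say exactly which links you mean, I'm assuming from your code that you want to extract the links under the ul.
Here is the code using BeautifulSoup. It fetches each of the two pages mentioned and prints the links under its <ul>.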
import bs4 as bs
import requests

urls = ['https://int.soccerway.com/international/europe/european-championships/2020/group-stage/r38188/', 'https://int.soccerway.com/international/europe/european-championships/2020/s13030/final-stages/']
headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"}

# Code to get the ULs
for url in urls:
    resp = requests.get(url, headers=headers)
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    # All <li> items of the first <ul> whose class is level-2
    ls = soup.find('ul', class_='level-2').find_all('li')
    for i in ls:
        print(i.find('a')['href'])
    print('\n')
Output:
/international/europe/european-championships/2020/group-stage/r38188/
/international/europe/european-championships/2020/group-stage/group-a/g10136/
/international/europe/european-championships/2020/group-stage/group-b/g10137/
/international/europe/european-championships/2020/group-stage/group-c/g10138/
/international/europe/european-championships/2020/group-stage/group-d/g10139/
/international/europe/european-championships/2020/group-stage/group-e/g10140/
/international/europe/european-championships/2020/group-stage/group-f/g10141/
/international/europe/european-championships/2020/s13030/final-stages/

/international/europe/european-championships/2020/group-stage/r38188/
/international/europe/european-championships/2020/s13030/final-stages/
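Note that the hrefs printed here are site-relative paths, so they can't be passed to requests.get() or driver.get() as they are. A small sketch of turning them into full URLs with the standard library's urljoin, assuming the base is the site root from the question (BASE_URL and to_absolute are hypothetical names used only for illustration):

```python
from urllib.parse import urljoin

# Assumption: every relative href is resolved against the site root.
BASE_URL = "https://int.soccerway.com"


def to_absolute(hrefs, base=BASE_URL):
    """Resolve each site-relative href against base and return full URLs."""
    return [urljoin(base, href) for href in hrefs]


relative = [
    "/international/europe/european-championships/2020/group-stage/r38188/",
    "/international/europe/european-championships/2020/s13030/final-stages/",
]

for full_url in to_absolute(relative):
    print(full_url)
```

For these two paths the result is the same pair of URLs given in the question, which can then be opened one after another.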