Why can't I scrape all the data?
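With this script I am trying to scrape all the data from a particular website. The problem is with the output: instead of the list of all home teams, I only get the home team of the first match. What should I do to get all the data from the site?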
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.get('https://www.diretta.it')
html = driver.page_source
soup = BeautifulSoup(html,'lxml')
games = soup.find_all('div', class_ = 'event__match event__match--live event__match--last event__match--twoLine')
for game in games:
    home = soup.find('div', class_ = 'event__participant event__participant--home').text
    away = soup.find('div', class_ = 'event__participant event__participant--away').text
    time = soup.find('div', class_ = 'event__time').text
    print(home)
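You are iterating over games, but inside the loop you never use the loop variable game as the object to search. Because you call soup.find, every iteration searches the whole document and returns the first match again. Search within game instead: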
home = game.find('div', class_ = 'event__participant event__participant--home').text
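To make the difference concrete, here is a minimal sketch of the scoped search run against a small inline HTML string. The markup and team names are hypothetical, only loosely modeled on the class names in the question, and parse_games is a helper name made up for this example. It also matches on the single shared class event__match rather than the full class string, on the assumption that not every row carries the --live and --last modifiers.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page may be structured differently.
HTML = """
<div class="event__match event__match--twoLine">
  <div class="event__time">18:00</div>
  <div class="event__participant event__participant--home">Team A</div>
  <div class="event__participant event__participant--away">Team B</div>
</div>
<div class="event__match event__match--twoLine event__match--last">
  <div class="event__time">20:45</div>
  <div class="event__participant event__participant--home">Team C</div>
  <div class="event__participant event__participant--away">Team D</div>
</div>
"""


def parse_games(html):
    """Return one (time, home, away) tuple per match row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # class_ with a single name matches any element that has that class
    # among its classes, so both rows above are found.
    for game in soup.find_all("div", class_="event__match"):
        # Search inside `game`, not `soup`, so each row yields its own teams.
        home = game.find("div", class_="event__participant--home").get_text(strip=True)
        away = game.find("div", class_="event__participant--away").get_text(strip=True)
        kickoff = game.find("div", class_="event__time").get_text(strip=True)
        rows.append((kickoff, home, away))
    return rows


if __name__ == "__main__":
    for row in parse_games(HTML):
        print(row)
```

If you swap game.find back to soup.find in this sketch, every row comes out as Team A against Team B, which is the symptom described in the question.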
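First of all, when you are using Selenium you do not need Beautiful Soup, because you can use the find_element_by_* methods to find a single tag and the find_elements_by_* methods (elements, plural, with an s) to get a list of all tags that match.
Your code would be: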
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.get('https://www.diretta.it')
# find_elements (plural) returns a list with every matching match row
games = driver.find_elements_by_css_selector('div[class = "event__match event__match--live event__match--last event__match--twoLine"]')
for game in games:
    # search inside the current row, not the whole page
    home = game.find_element_by_css_selector('div[class = "event__participant event__participant--home"]').text
    away = game.find_element_by_css_selector('div[class = "event__participant event__participant--away"]').text
    time = game.find_element_by_css_selector('div[class = "event__time"]').text
    print(home)