Making changes with Selenium in a web page, but the driver's response returns the same value (Scrapy, Python)
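On a web page with a "Show more" button, I click the button in a loop until it no longer exists (I can then see the whole page). Now I need to extract some data, but what I get is the same data as before I clicked "Show more".
Here is the code that does this: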
bodyBefore = response.xpath('/body').get()
# Click the Show More button until it is no longer there
showmore_btn = self.driver.find_elements_by_xpath(
    "//a[@class='event__more event__more--static']")
while len(showmore_btn) > 0:
    showmore_btn[0].send_keys(Keys.ENTER)
    # Add more time if the previous command doesn't work (bad internet connection)
    time.sleep(5)
    showmore_btn = self.driver.find_elements_by_xpath(
        "//a[@class='event__more event__more--static']")
bodyAfter = response.xpath('/body').get()
I can't get the new HTML in order to scrape it. (Comparing bodyBefore and bodyAfter makes this easy to verify.)
Does anyone know how to do this?
The URL I am scraping is:
https://www.flashscore.com/football/england/premier-league-2018-2019/results/
In this case, I want to scrape the URL of every match that appears after clicking "Show more".
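A likely reason bodyBefore and bodyAfter match, offered as an assumption since the full spider is not shown: `response` holds the HTML that Scrapy downloaded before Selenium clicked anything, so it never changes. The clicks only alter the DOM inside the browser, which Selenium exposes as `self.driver.page_source`. (Separately, `/body` only matches a root element named `body`; `//body` is probably what was intended.) The sketch below is one way to check this by counting result rows in two HTML strings. `RowCounter` and `count_rows` are hypothetical helper names, the `event__match` class token is borrowed from the row lookup further down, and the standard-library `html.parser` stands in for Scrapy selectors so the example runs on its own.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class token."""

    def __init__(self, class_token):
        super().__init__()
        self.class_token = class_token
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_token in classes:
            self.count += 1


def count_rows(html, class_token="event__match"):
    """Return how many elements in `html` carry `class_token`."""
    counter = RowCounter(class_token)
    counter.feed(html)
    counter.close()
    return counter.count


# Stand-in strings; in the spider these would be
#   response.text            (HTML as Scrapy downloaded it)
#   self.driver.page_source  (HTML after the "Show more" clicks)
downloaded = '<div class="event__match">row 1</div>'
expanded = downloaded + '<div class="event__match">row 2</div>'

print(count_rows(downloaded), count_rows(expanded))  # prints: 1 2
```

If the second count is larger in the real spider, building `scrapy.Selector(text=self.driver.page_source)` gives the same `.xpath()` interface as `response`, but over the expanded page.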
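First you need to find the main table, and then all the <div> tags that contain the data rows.
Next, you can iterate over the elements in each row to get the text data. I added a progress string to the loop, hope you like it :)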
# Written against the Selenium 3 API; Selenium 4 replaced the
# find_element(s)_by_* helpers with find_element(s)(By.<strategy>, value).
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import time
import sys

driver = webdriver.Chrome(executable_path=os.path.abspath(os.getcwd()) + "/chromedriver")
driver.get("https://www.flashscore.com/football/england/premier-league-2018-2019/results/")

# extend table: keep clicking "Show more" until the button is gone
show_more_buttons = driver.find_elements_by_xpath("//a[@class='event__more event__more--static']")
while len(show_more_buttons) > 0:
    show_more_buttons[0].send_keys(Keys.ENTER)
    time.sleep(2)  # give the new rows time to load
    show_more_buttons = driver.find_elements_by_xpath("//a[@class='event__more event__more--static']")

# get table and events
table = driver.find_element_by_xpath('//*[@id="live-table"]/div[1]/div/div')
events = table.find_elements_by_class_name('event__match.event__match--static.event__match--oneLine')

# loop over events and collect data
count = 1
data = []
for item in events:
    # named event_time so the imported time module is not shadowed
    event_time = item.find_element_by_class_name('event__time').text
    participant_home = item.find_element_by_class_name('event__participant.event__participant--home').text
    event_scores = item.find_element_by_class_name('event__scores.fontBold').text
    participant_away = item.find_element_by_class_name('event__participant.event__participant--away').text
    event_part = item.find_element_by_class_name('event__part').text
    data.append([event_time, participant_home, event_scores.replace('\n', ''), participant_away, event_part])
    # overwrite the same console line with the current progress
    sys.stdout.write('\r')
    sys.stdout.write("progress: %.2f %%" % ((count/len(events))*100))
    sys.stdout.flush()
    count += 1

for item in data:
    print(item)
Output:
['12.05. 16:00', 'Brighton', '1 - 4', 'Manchester City', '(1 - 2)']
['12.05. 16:00', 'Burnley', '1 - 3', 'Arsenal', '(0 - 0)']
..
..
..
['11.08. 16:00', 'Watford', '2 - 0', 'Brighton', '(1 - 0)']
['11.08. 13:30', 'Newcastle', '1 - 2', 'Tottenham', '(1 - 2)']
['10.08. 21:00', 'Manchester Utd', '2 - 1', 'Leicester', '(1 - 0)']
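The code above collects the text of each row, while the question asked for match URLs. Below is a minimal sketch of one way to get them from the expanded page, under two assumptions about the site's markup that nothing in this thread confirms: each row's `id` attribute has the form `g_1_<code>`, and the match page lives at `https://www.flashscore.com/match/<code>/`. Both should be checked in the browser's developer tools before relying on them. `MatchIdCollector` and `match_urls` are hypothetical names, and the standard-library `html.parser` is used instead of Selenium or Scrapy lookups so the block runs on its own.

```python
from html.parser import HTMLParser

ROW_CLASS = "event__match"  # class token used by the row lookup above
ID_PREFIX = "g_1_"          # ASSUMPTION: row ids look like "g_1_<match code>"
MATCH_URL = "https://www.flashscore.com/match/{code}/"  # ASSUMPTION: URL pattern


class MatchIdCollector(HTMLParser):
    """Collects match codes from the id attribute of result rows."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        row_id = attrs.get("id") or ""
        # keep only result rows whose id carries the assumed prefix
        if ROW_CLASS in classes and row_id.startswith(ID_PREFIX):
            self.codes.append(row_id[len(ID_PREFIX):])


def match_urls(html):
    """Return one URL per result row found in `html`, in document order."""
    collector = MatchIdCollector()
    collector.feed(html)
    collector.close()
    return [MATCH_URL.format(code=code) for code in collector.codes]


# Made-up sample markup; the code "AAAA1111" is a placeholder, not a real match.
sample = '<div id="g_1_AAAA1111" class="event__match event__match--static"></div>'
print(match_urls(sample))  # prints: ['https://www.flashscore.com/match/AAAA1111/']
```

After the click loop has finished, `match_urls(driver.page_source)` would return one URL per loaded row, provided the two assumptions hold.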