Scraping after selecting "All" in a scrolling menu

Question

I am trying to retrieve information from the table at this link: https://ski-resort-stats.com/ski-resorts-in-europe/

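The page has a scrolling (drop-down) menu that I first have to operate so that all the entries are displayed on the page and I can select them. However, when I retrieve the information I am looking for, it is not retrieved for the whole table... I tried adding a sleep between the two actions in case that was the problem, but it did not change anything. Can someone help me? Here is my code: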

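import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(service=Service("path/chromedriver"))
driver.get("https://ski-resort-stats.com/ski-resorts-in-europe/")

# The page source is captured here, before "All" is selected
content = driver.page_source
soup = BeautifulSoup(content, "html.parser")

# Select "All" in the drop-down menu to show all the ski resorts
menu = driver.find_element(By.ID, "table_1_length")
for option in menu.find_elements(By.TAG_NAME, "option"):
    if option.text == "All":
        option.click()
        break

time.sleep(10)

mydivs = soup.find_all("td", {"class": "column-resort-name"})
print(mydivs)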

So the last element printed from mydivs is not the last element of the table...
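In the question's code, `content = driver.page_source` is read and `soup` is built before "All" is clicked, so `soup` is a snapshot of the table as it was first rendered. BeautifulSoup never sees the later DOM change, which is likely why the sleep makes no difference. Below is a minimal sketch of the reordering: select "All", wait, and only then read `page_source`. `page_source_after_selecting_all` is a hypothetical helper name. It assumes a Selenium 4 driver, the same `table_1_length` element, and an option whose text is exactly "All". The plain strings "id" and "tag name" are the values of `By.ID` and `By.TAG_NAME`.

```python
import time


def page_source_after_selecting_all(driver, menu_id="table_1_length", wait_seconds=10):
    """Select "All" in the table's length menu, then return the updated page source.

    Assumes `driver` is a Selenium 4 WebDriver; "id" and "tag name" are the
    string values of By.ID and By.TAG_NAME.
    """
    menu = driver.find_element("id", menu_id)
    for option in menu.find_elements("tag name", "option"):
        if option.text == "All":
            option.click()
            break
    else:
        # No option labelled "All": fail loudly instead of parsing a partial table
        raise ValueError(f'no "All" option found in #{menu_id}')

    # Crude fixed wait for the table to redraw; an explicit WebDriverWait would be more robust
    time.sleep(wait_seconds)

    # Read page_source only now, after the click, so it contains the redrawn table
    return driver.page_source
```

For example, after `driver.get(...)` you would build the soup with `soup = BeautifulSoup(page_source_after_selecting_all(driver), "html.parser")` and then run the same `soup.find_all("td", {"class": "column-resort-name"})` call.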

Answer 1

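All the data is already present in the page's <table> HTML, so you can fetch it with requests and parse it with BeautifulSoup, without Selenium: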

import requests
from bs4 import BeautifulSoup

url = "https://ski-resort-stats.com/ski-resorts-in-europe/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# print the second cell (the resort name) of each row
for row in soup.select("#table_1 tbody tr"):
    r = [td.get_text(strip=True) for td in row.select("td")]
    print(r[1])

Prints:

Hemsedal
Geilosiden Geilo
Golm
Hafjell
Voss
Hochschwarzeck
Rossfeld - Berchtesgaden - Oberau

...

Puigmal
Kranzberg-Mittenwald
Wetterstein lifts-Wettersteinbahnen-– Ehrwald
Stuhleck-Spital am Semmering
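If installing requests and bs4 is not an option, the same row extraction can be sketched with only the standard library (`urllib.request` and `html.parser`). This swaps in a different technique from the answer's. Like the answer, it assumes the rows are in the initial HTML under the table with id `table_1`. It also assumes every `<td>` and `<tr>` has an explicit closing tag, because `html.parser` does not infer omitted ones. `TableRowParser` and `fetch_rows` are hypothetical names, and the browser-like User-Agent header is an assumption in case the site rejects urllib's default one.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableRowParser(HTMLParser):
    """Collect the cell texts of every <tbody> row of the <table> with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # >0 while inside the target table (counts nested tables)
        self.in_tbody = False
        self.row = None       # cells of the row being read, or None outside a row
        self.cell = None      # text fragments of the cell being read, or None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth == 1:  # only look at the target table's own rows
            if tag == "tbody":
                self.in_tbody = True
            elif tag == "tr" and self.in_tbody:
                self.row = []
            elif tag == "td" and self.row is not None:
                self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1:
            if tag == "td" and self.cell is not None:
                # Join text fragments (e.g. text inside <a>) and strip surrounding whitespace
                self.row.append("".join(self.cell).strip())
                self.cell = None
            elif tag == "tr" and self.row is not None:
                self.rows.append(self.row)
                self.row = None
            elif tag == "tbody":
                self.in_tbody = False

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def fetch_rows(url, table_id="table_1"):
    """Download `url` and return the <tbody> rows of table `table_id` as lists of strings."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        html = resp.read().decode(resp.headers.get_content_charset() or "utf-8")
    parser = TableRowParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows
```

Usage mirrors the answer's loop: `for row in fetch_rows("https://ski-resort-stats.com/ski-resorts-in-europe/"): print(row[1])`.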


python

webdriver

beautifulsoup

web-scraping