Clicking two consecutive buttons while scraping a website with Selenium in Python
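I am trying to scrape the country information from the website below:
https://www.morningstar.com/etfs/xnas/vnqi/portfolio
This requires clicking the 'Country' selection in the Exposure section and then using the arrows at the bottom of that section to move through pages 1, 2, 3, and so on. Nothing I have tried seems to work. Is there a way to do this with Selenium in Python?
Thank you very much!
Here is the code I used: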
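from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# NOTE: `options` is created earlier in the script (not shown here).
urlpage = 'https://www.morningstar.com/etfs/xnas/vnqi/portfolio'
driver = webdriver.Chrome(options=options, executable_path=r'D:\Python\Python38\chromedriver.exe')
driver.get(urlpage)
elements = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a[text()='Country']")))
for elem in elements:
    elem.click()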
Here is the error message:
TimeoutException                          Traceback (most recent call last)
<ipython-input-3-bf16ea3f65c0> in <module>
     23 driver = webdriver.Chrome(options=options, executable_path='D:\Python\Python38\chromedriver.exe')
     24 driver.get(urlpage)
---> 25 elements=WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a[text()='Country']")))
     26 for elem in elements:
     27     elem.click()

D:\Anaconda\lib\site-packages\selenium\webdriver\support\wait.py in until(self, method, message)
     78             if time.time() > end_time:
     79                 break
---> 80         raise TimeoutException(message, screen, stacktrace)
     81
     82     def until_not(self, method, message=''):

TimeoutException: Message:
Sorry, I am not sure how to format the error message better. Thanks again.
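It seems you did not check what is really in the HTML, so you skipped the most important step.
There is no <a> with the text Country on this page.
There is an <input> with value="Country".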
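One way to see this kind of mismatch without opening a browser is to try both locators against a small piece of markup. The sketch below uses Python's standard-library ElementTree on a made-up, simplified snippet. The markup is an assumption for illustration and is not the real Morningstar HTML. ElementTree supports only a subset of XPath, so `[.='Country']` stands in for `[text()='Country']`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (NOT copied from the real page): the
# "Country" toggle is an <input>, and no <a> element carries that text.
SNIPPET = """
<div class="exposure">
  <input type="radio" value="Region" />
  <input type="radio" value="Country" />
  <a aria-label="Go to Next Page">&gt;</a>
</div>
"""

root = ET.fromstring(SNIPPET)

# ElementTree implements a limited XPath dialect; [.='Country'] plays the
# role of [text()='Country'] from the locator used in the question.
links = root.findall(".//a[.='Country']")
inputs = root.findall(".//input[@value='Country']")

print(len(links))   # 0: a wait for such an <a> can only time out
print(len(inputs))  # 1: this is the element to click
```

With the real page, the same check is usually done by inspecting the element in the browser's developer tools before writing the locator.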
This code works for me:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.morningstar.com/etfs/xnas/vnqi/portfolio'

driver = webdriver.Chrome()
driver.get(url)
time.sleep(2)  # crude wait for the page to render

# "Country" is an <input>, not an <a>
country = driver.find_element(By.XPATH, '//input[@value="Country"]')
country.click()
time.sleep(1)

next_page = driver.find_element(By.XPATH, '//a[@aria-label="Go to Next Page"]')

while True:
    # get data
    table_rows = driver.find_elements(By.XPATH, '//table[@class="sal-country-exposure__country-table"]//tr')
    for row in table_rows[1:]:  # skip header
        elements = row.find_elements(By.XPATH, './/span')  # relative xpath with `.//`
        print(elements[0].text, elements[1].text, elements[2].text)

    # check if there is next page
    disabled = next_page.get_attribute('aria-disabled')
    #print('disabled:', disabled)
    if disabled:
        break

    # go to next page
    next_page.click()
    time.sleep(1)
Result:
Japan 22.08 13.47
China 10.76 1.45
Australia 9.75 6.05
Hong Kong 9.52 6.04
Germany 8.84 5.77
Singapore 6.46 4.33
United Kingdom 6.22 5.77
Sweden 3.48 2.00
France 3.18 2.58
Canada 2.28 2.92
Switzerland 1.78 0.69
Belgium 1.63 1.31
Philippines 1.53 0.15
Israel 1.47 0.16
Thailand 0.98 0.09
India 0.87 0.11
South Africa 0.87 0.21
Taiwan 0.83 0.08
Mexico 0.80 0.33
Spain 0.62 0.84
Malaysia 0.54 0.08
Brazil 0.52 0.06
Austria 0.51 0.16
New Zealand 0.41 0.21
Indonesia 0.37 0.02
Norway 0.37 0.29
United States 0.29 44.09
Netherlands 0.24 0.19
Chile 0.21 0.01
Ireland 0.16 0.19
South Korea 0.15 0.00
Turkey 0.08 0.02
Russia 0.08 0.00
Finland 0.06 0.16
Poland 0.05 0.00
Greece 0.05 0.00
Italy 0.02 0.05
Argentina 0.00 0.00
Colombia 0.00 0.00
Czech Republic 0.00 0.00
Denmark 0.00 0.00
Estonia 0.00 0.00
Hungary 0.00 0.00
Latvia 0.00 0.00
Lithuania 0.00 0.00
Pakistan 0.00 0.00
Peru 0.00 0.00
Portugal 0.00 0.00
Slovakia 0.00 0.00
Venezuela 0.00 0.00
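If the printed lines need further processing, they can be turned into typed records. The following is a minimal sketch, assuming each line has the shape shown above (a country name that may contain spaces, followed by two numbers). The helper name `parse_row` and the neutral field names are hypothetical, since the output does not label its two numeric columns.

```python
from typing import Tuple


def parse_row(line: str) -> Tuple[str, float, float]:
    """Split a line such as 'United Kingdom 6.22 5.77' into (name, first, second)."""
    # Split from the right so that spaces inside the country name survive.
    name, first, second = line.rsplit(maxsplit=2)
    return name, float(first), float(second)


# A few lines taken from the output above.
sample = [
    "Japan 22.08 13.47",
    "United Kingdom 6.22 5.77",
    "Czech Republic 0.00 0.00",
]

rows = [parse_row(line) for line in sample]
print(rows[1])  # ('United Kingdom', 6.22, 5.77)
```

Splitting from the right keeps multi-word names such as "United Kingdom" or "Czech Republic" intact, which a plain `split()` would break apart.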