Clicking two consecutive buttons while scraping a website with selenium in python

Question

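I am trying to scrape country information from the website below, https://www.morningstar.com/etfs/xnas/vnqi/portfolio. This requires clicking the 'Country' option in the Exposure section and then using the arrows at the bottom of that section to move through pages 1, 2, 3, etc. Nothing I have tried seems to work. Is there a way to do this with Selenium in Python?

Many thanks!

Here is the code I am using: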

    urlpage   = 'https://www.morningstar.com/etfs/xnas/vnqi/portfolio'
    driver = webdriver.Chrome(options=options, executable_path='D:\Python\Python38\chromedriver.exe')
    driver.get(urlpage)
    elements=WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a[text()='Country']")))
    for elem in elements:
        elem.click()

Here is the error message:

TimeoutException                          

Traceback (most recent call last)  
<ipython-input-3-bf16ea3f65c0> in <module>  
    23 driver = webdriver.Chrome(options=options, executable_path='D:\Python\Python38\chromedriver.exe')  
     24 driver.get(urlpage)  
---> 25 elements=WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a[text()='Country']")))  
     26 for elem in elements:  
     27      elem.click()  
D:\Anaconda\lib\site-packages\selenium\webdriver\support\wait.py in until(self, method, message)  
     78             if time.time() > end_time:  
     79                 break  
---> 80         raise TimeoutException(message, screen, stacktrace)  
     81   
     82     def until_not(self, method, message=''):  
TimeoutException: Message:

Sorry, I'm not sure how to format the error message better. Thanks again.

Answer 1

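It seems you didn't check what is actually in the page's HTML, so you skipped the most important step.

There is no <a> with the text Country on this page.

There is an <input> with value="Country".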
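To make the difference between the two locators concrete, here is a minimal sketch that runs without a browser. The markup is a hypothetical, simplified stand-in and is not copied from the real page, and the standard library's `xml.etree.ElementTree` supports only a subset of XPath, so the link query is written as `.//a[.='Country']` instead of `//a[text()='Country']`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (an assumption, not the real page source).
SNIPPET = """
<div>
  <input type="radio" value="Other" />
  <input type="radio" value="Country" />
  <a aria-label="Go to Next Page">next</a>
</div>
"""

root = ET.fromstring(SNIPPET)

# Looking for a link whose text is "Country" finds nothing ...
links = root.findall(".//a[.='Country']")

# ... while looking for an input whose value attribute is "Country" finds one.
inputs = root.findall(".//input[@value='Country']")

print(len(links), len(inputs))
```

With Selenium, a locator that matches nothing is what makes `WebDriverWait(...).until(...)` give up with a `TimeoutException`, as in the question.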

This code works for me:

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    url = 'https://www.morningstar.com/etfs/xnas/vnqi/portfolio'

    driver = webdriver.Chrome()
    driver.get(url)

    time.sleep(2)  # give the page time to render

    # "Country" is an <input>, not an <a>
    country = driver.find_element(By.XPATH, '//input[@value="Country"]')
    country.click()

    time.sleep(1)  # give the country table time to load
    next_page = driver.find_element(By.XPATH, '//a[@aria-label="Go to Next Page"]')

    while True:

        # get data
        table_rows = driver.find_elements(By.XPATH, '//table[@class="sal-country-exposure__country-table"]//tr')
        for row in table_rows[1:]:  # skip header
            elements = row.find_elements(By.XPATH, './/span')  # relative xpath with `.//`
            print(elements[0].text, elements[1].text, elements[2].text)

        # check if there is next page
        disabled = next_page.get_attribute('aria-disabled')
        #print('disabled:', disabled)
        if str(disabled).lower() == 'true':  # None (attribute absent) means there are more pages
            break

        # go to next page
        next_page.click()

        time.sleep(1)  # give the next page of the table time to load
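One detail of the stop condition is worth spelling out. `get_attribute` may return `None` when the attribute is missing, and otherwise typically a string, so the text "false" would count as truthy in a plain `if`. Below is a small sketch of an explicit check; the helper name `is_disabled` is hypothetical, and it assumes the page marks the last page with `aria-disabled="true"`.

```python
def is_disabled(attr_value):
    """Interpret the value of an aria-disabled attribute.

    Handles the shapes the value may take: None (attribute absent),
    a boolean, or a string such as "true" / "false".
    """
    if attr_value is None:
        return False
    if isinstance(attr_value, bool):
        return attr_value
    return str(attr_value).strip().lower() == "true"


# The string "false" is truthy in a plain `if`, but not disabled here.
print(bool("false"), is_disabled("false"))
```

In the loop this would be used as `if is_disabled(next_page.get_attribute('aria-disabled')): break`.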

Result

Japan 22.08 13.47
China 10.76 1.45
Australia 9.75 6.05
Hong Kong 9.52 6.04
Germany 8.84 5.77
Singapore 6.46 4.33
United Kingdom 6.22 5.77
Sweden 3.48 2.00
France 3.18 2.58
Canada 2.28 2.92
Switzerland 1.78 0.69
Belgium 1.63 1.31
Philippines 1.53 0.15
Israel 1.47 0.16
Thailand 0.98 0.09
India 0.87 0.11
South Africa 0.87 0.21
Taiwan 0.83 0.08
Mexico 0.80 0.33
Spain 0.62 0.84
Malaysia 0.54 0.08
Brazil 0.52 0.06
Austria 0.51 0.16
New Zealand 0.41 0.21
Indonesia 0.37 0.02
Norway 0.37 0.29
United States 0.29 44.09
Netherlands 0.24 0.19
Chile 0.21 0.01
Ireland 0.16 0.19
South Korea 0.15 0.00
Turkey 0.08 0.02
Russia 0.08 0.00
Finland 0.06 0.16
Poland 0.05 0.00
Greece 0.05 0.00
Italy 0.02 0.05
Argentina 0.00 0.00
Colombia 0.00 0.00
Czech Republic 0.00 0.00
Denmark 0.00 0.00
Estonia 0.00 0.00
Hungary 0.00 0.00
Latvia 0.00 0.00
Lithuania 0.00 0.00
Pakistan 0.00 0.00
Peru 0.00 0.00
Portugal 0.00 0.00
Slovakia 0.00 0.00
Venezuela 0.00 0.00
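If the printed rows need further processing, they can be turned into structured records. This is only a sketch, assuming every line is a country name followed by exactly two numbers; the meaning of the two numeric columns is not stated here, so they are left unnamed, and `parse_row` is a hypothetical helper.

```python
def parse_row(line):
    """Split 'Hong Kong 9.52 6.04' into ('Hong Kong', 9.52, 6.04)."""
    # Split from the right so that spaces inside the country name survive.
    name, first, second = line.rsplit(None, 2)
    return name, float(first), float(second)


sample = [
    "Japan 22.08 13.47",
    "Hong Kong 9.52 6.04",
    "United Kingdom 6.22 5.77",
]

records = [parse_row(line) for line in sample]
for record in records:
    print(record)
```

Inside the scraping loop the three cell texts are already separate, so they could also be appended to a list directly instead of being printed and parsed again.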
