Getting a number from pubchem site with Selenium

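I am using the following code to run a search on the pubchem website. I need to get the "Compound CID:" number from the search results shown on the screen, but I can't manage to retrieve it. I need help with this.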

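from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Chrome()
url = "https://pubchem.ncbi.nlm.nih.gov/"
driver.get(url)
driver.maximize_window()
# find_element_by_xpath() was removed in Selenium 4.3; use find_element(By.XPATH, ...) instead
searchInput = driver.find_element(By.XPATH, "/html/body/div[1]/div/div/main/div[1]/div/div[2]/div/div[2]/form/div/div[1]/input")
searchInput.click()
searchInput.send_keys("75-05-8")
searchInput.send_keys(Keys.ENTER)
time.sleep(2)
driver.close()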

To print the text 6342 you can use either of the following locator strategies:

  • Using css_selector and get_attribute("innerHTML"):

    print(driver.find_element(By.CSS_SELECTOR, "a[data-label^='Featured Compound Result Secondary Link; Position:1; Page:1'] > span.breakword > span").get_attribute("innerHTML"))
    
  • Using xpath and the text attribute:

    print(driver.find_element(By.XPATH, "//a[starts-with(@data-label, 'Featured Compound Result Secondary Link; Position:1; Page:1')]/span[@class='breakword']/span").text)
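Both locators above express the same rule: take the <a> whose data-label starts with the given prefix, then descend to span.breakword > span. As a rough offline sketch, the code below replays that rule with the standard library's xml.etree.ElementTree against a small hypothetical fragment. The markup, the second link, and the helper name find_cid_text are assumptions for illustration only, not PubChem's real HTML.

```python
import xml.etree.ElementTree as ET

PREFIX = "Featured Compound Result Secondary Link; Position:1; Page:1"

# Hypothetical, simplified markup modeled on the locators above;
# the real PubChem page is more complex and may change at any time.
SAMPLE = """
<div>
  <a data-label="Featured Compound Result Secondary Link; Position:1; Page:1">
    <span class="breakword"><span>6342</span></span>
  </a>
  <a data-label="Some Other Link; Position:2; Page:1">
    <span class="breakword"><span>9999</span></span>
  </a>
</div>
"""


def find_cid_text(markup, prefix=PREFIX):
    """Hypothetical helper mimicking
    CSS:   a[data-label^=prefix] > span.breakword > span
    XPath: //a[starts-with(@data-label, prefix)]/span[@class='breakword']/span
    """
    root = ET.fromstring(markup)
    for link in root.iter("a"):
        # ^= in CSS and starts-with() in XPath are both prefix matches.
        if link.get("data-label", "").startswith(prefix):
            # ElementTree's limited XPath: direct child span whose class is exactly 'breakword'.
            inner = link.find("span[@class='breakword']/span")
            if inner is not None:
                return inner.text
    return None


print(find_cid_text(SAMPLE))  # 6342
```

Note that the CSS form span.breakword matches any span whose class list contains breakword, while the XPath form @class='breakword' requires the class attribute to be exactly that string; on markup like the fragment above the two agree.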
    

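Ideally, you need to induce WebDriverWait for visibility_of_element_located(), and then you can use either of the following locator strategies: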

  • Using CSS_SELECTOR and the text attribute:

    driver.get("https://pubchem.ncbi.nlm.nih.gov/")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[type='text'][id^='search']"))).send_keys("75-05-8" + Keys.RETURN)
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a[data-label^='Featured Compound Result Secondary Link; Position:1; Page:1'] > span.breakword > span"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get("https://pubchem.ncbi.nlm.nih.gov/")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@type='text'][starts-with(@id, 'search')]"))).send_keys("75-05-8" + Keys.RETURN)
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@data-label, 'Featured Compound Result Secondary Link; Position:1; Page:1')]/span[@class='breakword']/span"))).get_attribute("innerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    6342
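The reason to prefer WebDriverWait over the fixed time.sleep(2) in the question is that it polls: it re-evaluates the expected condition until it returns something truthy or the timeout expires (Selenium's default poll interval is 0.5 s, NoSuchElementException is ignored between polls by default, and a TimeoutException is raised at the end). The hypothetical wait_until helper below is a simplified, Selenium-free sketch of that loop, not Selenium's actual implementation; it raises the built-in TimeoutError and uses LookupError as a stand-in for NoSuchElementException.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5, ignored=(LookupError,)):
    """Hypothetical helper: call condition() until it returns a truthy value.

    Simplified model of WebDriverWait(driver, timeout).until(condition);
    `ignored` plays the role of Selenium's ignored exceptions
    (LookupError stands in for NoSuchElementException here).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # Like until(): hand back the truthy result.
        except ignored:
            pass  # Element not there yet; keep polling.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulate an element whose text only "appears" on the third poll.
calls = {"n": 0}


def fake_visible_text():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("not rendered yet")
    return "6342"


print(wait_until(fake_visible_text, timeout=5, poll=0.01))  # 6342
```

Unlike a fixed sleep, this returns as soon as the condition holds and fails loudly if it never does, which is why the answer wraps both the search box and the result span in a wait.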
    

You can find a relevant discussion in


References

Links to useful documentation:

  • get_attribute() method: Gets the given attribute or property of the element.
  • text attribute: Returns the text of the element.
  • Difference between text and innerHTML using Selenium
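As a rough offline illustration of that difference: innerHTML is the element's inner markup as a string, while text is the rendered text of the element and its descendants. The sketch below approximates both with xml.etree.ElementTree on a hypothetical copy of the span pair targeted above; the helper names inner_html and text_content are made up for this example, and it is only an approximation, since Selenium's text also applies the browser's visibility rules and whitespace normalization.

```python
import xml.etree.ElementTree as ET


def inner_html(el):
    """Approximate innerHTML: leading text plus serialized children (with their tails)."""
    return (el.text or "") + "".join(ET.tostring(child, encoding="unicode") for child in el)


def text_content(el):
    """Approximate .text: concatenated text of the element and all descendants."""
    return "".join(el.itertext())


# Hypothetical copy of the span.breakword > span pair targeted by the locators.
outer = ET.fromstring('<span class="breakword"><span>6342</span></span>')
print(inner_html(outer))    # <span>6342</span>
print(text_content(outer))  # 6342

inner = outer.find("span")
print(inner_html(inner))    # 6342
```

On the innermost span, which holds only text, both approaches yield 6342, which is why either text or get_attribute("innerHTML") works for this locator; on the outer span, innerHTML would include the nested tag.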