How to access specific elements using Python Selenium?

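I have a simple Selenium Python application in which I am trying to web scrape the categories, i.e. the links. The problem I am having is using XPath to get the links in the left pane as a list. In addition, I would like to capture the line class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context, but I am not sure where to start, because it does not show up in the HTML or in Chrome dev tools.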

I am extracting the data from the following website:

https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm

My current code is below; the uncommented lines are the active code:

from selenium import webdriver
import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.proxy import Proxy, ProxyType
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

#service = Service('C:\Program Files\Chrome Driver\chromedriver.exe')
URL = "https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm"
driver = webdriver.Chrome('C:\Program Files\Chrome Driver\chromedriver.exe')
driver.get(URL)


category = driver.find_elements_by_class_name(By.XPATH, "//div[@class='service drug_class']//a")
print(category)


#WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'tr.dbsearch')))
#pd.read_html(driver.page_source)[1].iloc[:,:-1].to_csv('table.csv',index=False)
#time.sleep(8)
#driver.quit()

In addition, I have been trying to get the following text, which is displayed on the page:

class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context

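How do I access that text? Everything I have tried gives "no such element" or "no such class name" as the error. The main issue is that I am not sure how to find the names of these elements or classes in the JavaScript if they are not present in the HTML or in Chrome dev tools.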

The error message I receive when using the following is:

category = driver.find_elements_by_class_name(By.XPATH, "//div[@class='service drug_class']//a")
print(category)

TypeError: find_elements_by_class_name() takes 2 positional arguments but 3 were given

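It seems you are looking for strong tags, whereas all the links on the left-hand side are a (anchor) elements. That means you will not find them by searching for strong.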

Basically, this is the XPath you are looking for to get any link:

//div[@class='service drug_class']//a[text()='Any link text here']

Replace 'Any link text here' with the exact link text.
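As a minimal sketch of how that XPath could be used from Python: the TypeError in the question comes from passing a By strategy plus an expression to find_elements_by_class_name(), which accepts only a class name, whereas the generic find_elements() takes the strategy and the value as two arguments. The find_elements_by_* helpers have also been removed in recent Selenium 4 releases. The helper names below (xpath_literal, link_xpath, find_links) are hypothetical and not part of Selenium, and the div class is taken from the XPath above.

```python
def xpath_literal(text: str) -> str:
    """Quote arbitrary text as an XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Text contains both quote types: join the pieces with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def link_xpath(link_text: str) -> str:
    """Build the XPath for a left-pane link with exactly this text."""
    return (
        "//div[@class='service drug_class']"
        f"//a[text()={xpath_literal(link_text)}]"
    )


def find_links(driver, link_text: str):
    """Return matching <a> elements; an empty list if nothing matches."""
    # Imported here so the XPath helpers can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    # find_elements() takes the locator strategy and the value as two arguments.
    return driver.find_elements(By.XPATH, link_xpath(link_text))
```

With a driver that has already loaded the page, find_links(driver, "VITAMINS (23)") would be one way to call it. Because the page fills the left pane dynamically, an explicit wait may still be needed before the elements exist.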

To print the text of the links on the left-hand side of the page, you have to induce WebDriverWait for visibility_of_all_elements_located(), for example with the following locator strategy:

  • Using CSS_SELECTOR:

    driver.get("https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm")
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.drug_class img +a")))])
    
  • Console output:

    ['Anatomical Therapeutic Chemical (ATC1-4)', 'ALIMENTARY TRACT AND METABOLISM (397)', 'ANABOLIC AGENTS FOR SYSTEMIC USE (9)', 'ANTIDIARRHEALS, INTESTINAL ANTIINFLAMMATORY/ANTIINFECTIVE AGENTS (44)', 'ANTIEMETICS AND ANTINAUSEANTS (13)', 'ANTIOBESITY PREPARATIONS, EXCL. DIET PRODUCTS (12)', 'BILE AND LIVER THERAPY (13)', 'DIGESTIVES, INCL. ENZYMES (7)', 'DRUGS FOR ACID RELATED DISORDERS (35)', 'DRUGS FOR CONSTIPATION (39)', 'DRUGS FOR FUNCTIONAL GASTROINTESTINAL DISORDERS (47)', 'DRUGS USED IN DIABETES (69)', 'MINERAL SUPPLEMENTS (30)', 'OTHER ALIMENTARY TRACT AND METABOLISM PRODUCTS (41)', 'STOMATOLOGICAL PREPARATIONS (31)', 'TONICS (0)', 'VITAMINS (23)', 'BLOOD AND BLOOD FORMING ORGANS (158)', 'CARDIOVASCULAR SYSTEM (326)', 'DERMATOLOGICALS (242)', 'GENITO URINARY SYSTEM AND SEX HORMONES (160)', 'SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (66)', 'ANTIINFECTIVES FOR SYSTEMIC USE (334)', 'ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS (324)', 'MUSCULO-SKELETAL SYSTEM (130)', 'NERVOUS SYSTEM (433)', 'ANTIPARASITIC PRODUCTS, INSECTICIDES AND REPELLENTS (77)', 'RESPIRATORY SYSTEM (213)', 'SENSORY ORGANS (174)', 'VARIOUS (137)', 'Established Pharmacologic Classes (EPC) [from DailyMed]', 'MeSH Pharmacologic Actions (MESHPA)', 'Diseases, Life Phases, Behavior Mechanisms and Physiologic States', 'Substances and Cells (CHEM) [from DailyMed]', 'Mechanism of Action (MoA) [from DailyMed]', 'Physiologic Effect (PE) [from DailyMed]', 'Pharmacokinetics (PK)', 'VA Classes (VA)', 'Therapeutic Categories (TC)', 'Disposition (DISPOS) [from SNOMEDCT]', 'Structure (STRUCT) [from SNOMEDCT]', 'CSA Schedule (SCHEDULE)']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
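Putting the pieces above together, a complete script might look roughly like the sketch below. It reuses the CSS selector and the 20-second wait from the answer; the split_label() helper, which separates a label such as 'VITAMINS (23)' into a name and a count, is a hypothetical addition for illustration, and webdriver.Chrome() with no arguments assumes that a matching chromedriver can be found automatically on your machine.

```python
import re

URL = "https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm"

# Matches labels that end in a purely numeric count, e.g. "VITAMINS (23)".
COUNT_SUFFIX = re.compile(r"^(?P<name>.*\S)\s*\((?P<count>\d+)\)$")


def split_label(text):
    """Return (name, count), or (text, None) when there is no numeric count."""
    text = text.strip()
    match = COUNT_SUFFIX.match(text)
    if match is None:
        return text, None
    return match.group("name"), int(match.group("count"))


def visible_link_texts(driver, timeout=20):
    """Wait until the left-pane links are visible and return their text."""
    # Imported here so split_label() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    links = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located(
            (By.CSS_SELECTOR, "div.drug_class img + a")
        )
    )
    return [link.text for link in links]


if __name__ == "__main__":
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: chromedriver is discoverable
    try:
        driver.get(URL)
        for label in visible_link_texts(driver):
            print(split_label(label))
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Labels whose parentheses do not hold a plain number, such as 'Anatomical Therapeutic Chemical (ATC1-4)', are returned unchanged with a count of None.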