How to create a list with the values from entity-name column which is visible in "inspect" but not visible in page source using Selenium and Python

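I am trying to scrape a list from EDGAR.

The information I need (e.g. "entity-name") is in "td" elements with that class. However, the code I currently have does not return anything useful. I would appreciate any help. Thanks in advance!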

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException  # needed by the except clause below
from bs4 import BeautifulSoup

s = Service('/PATH/chromedriver')
driver = webdriver.Chrome(service=s)
driver.get("https://www.sec.gov/edgar/search/#/q=%2522cyber%2520insurance%2522&dateRange=custom&category=form-cat1&startdt=2011-01-01&enddt=2022-03-12&filter_forms=10-K")
try:
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'entity-name')))
except TimeoutException:
    print('Page timed out after 10 secs.')

page = BeautifulSoup(driver.page_source,'html.parser')
print(page)

To extract the text from the entity-name column, waiting for the presence of a single element is not enough: you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following:
  • Using CSS_SELECTOR and the text attribute:

    driver.get('https://www.sec.gov/edgar/search/#/q=%2522cyber%2520insurance%2522&dateRange=custom&category=form-cat1&startdt=2011-01-01&enddt=2022-03-12&filter_forms=10-K')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.entity-name")))])
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get('https://www.sec.gov/edgar/search/#/q=%2522cyber%2520insurance%2522&dateRange=custom&category=form-cat1&startdt=2011-01-01&enddt=2022-03-12&filter_forms=10-K')
    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@class='entity-name']")))])
    
  • Console output:

    ['Excel Corp ', 'PROGRESSIVE CORP/OH/  (PGR) ', 'Electromed, Inc.  (ELMD) ', 'HOOKER FURNITURE CORP  (HOFT) ', 'HOOKER FURNITURE CORP  (HOFT) ', 'SOUTHERN CO  (SO, SOJA, SOJB, SOJC, SOJD, SOLN) <br> ALABAMA POWER CO  (ALPVN, APRCP, APRDM, APRDN, APRDO, APRDP, ALP-PQ) <br> GEORGIA POWER CO  (GPJA) <br> MISSISSIPPI POWER CO <br> SOUTHERN Co GAS <br> SOUTHERN POWER CO ', 'HOOKER FURNITURE CORP  (HOFT) ', 'SOUTHERN CO  (SO, SOJA, SOJB, SOJC, SOJD, SOLN) <br> ALABAMA POWER CO  (ALPVN, APRCP, APRDM, APRDN, APRDO, APRDP, ALP-PQ) <br> GEORGIA POWER CO  (GPJA) <br> MISSISSIPPI POWER CO <br> SOUTHERN Co GAS <br> SOUTHERN POWER CO ', 'BENCHMARK ELECTRONICS INC  (BHE) ', 'MARRIOTT INTERNATIONAL INC /MD/  (MAR) ', 'Sprouts Farmers Market, Inc.  (SFM) ', 'CF BANKSHARES INC.  (CFBK) ', 'Repay Holdings Corp  (RPAY) ', 'Sprouts Farmers Market, Inc.  (SFM) ', 'MARRIOTT INTERNATIONAL INC /MD/  (MAR) ', 'Sprouts Farmers Market, Inc.  (SFM) ', 'Albertsons Companies, Inc.  (ACI) ', 'MARRIOTT INTERNATIONAL INC /MD/  (MAR) ', 'MARRIOTT INTERNATIONAL INC /MD/  (MAR) ', 'HENNESSY ADVISORS INC  (HNNA) ', 'Repay Holdings Corp  (RPAY, RPAYW) ', 'Repay Holdings Corp  (RPAY, RPAYW, TBRGU) ', 'Arlo Technologies, Inc.  (ARLO) ', 'Repay Holdings Corp  (RPAY, RPAYW) ', 'NATIONAL HEALTH INVESTORS INC  (NHI) ', 'MOTORCAR PARTS AMERICA INC  (MPAA) ', 'RGC RESOURCES INC  (RGCO) ', 'Arlo Technologies, Inc.  (ARLO) ', 'CRYOLIFE INC  (CRY) ', 'Mimecast Ltd  (MIME) ', 'RGC RESOURCES INC  (RGCO) ', 'MOTORCAR PARTS AMERICA INC  (MPAA) ', 'NOODLES &amp; Co  (NDLS) ', 'PAPA JOHNS INTERNATIONAL INC  (PZZA) ', 'MOTORCAR PARTS AMERICA INC  (MPAA) ', 'MOTORCAR PARTS AMERICA INC  (MPAA) ', 'PAPA JOHNS INTERNATIONAL INC  (PZZA) ', 'MOTORCAR PARTS AMERICA INC  (MPAA) ', 'Sprouts Farmers Market, Inc.  (SFM) ', 'MOTORCAR PARTS AMERICA INC  (MPAA) ', 'GARMIN LTD  (GRMN) ', 'Sprouts Farmers Market, Inc.  (SFM) ', 'nDivision Inc.  (NDVN) ', 'nDivision Inc.  (NDVN) ', 'nDivision Inc.  
(NDVN) ', 'WEYCO GROUP INC  (WEYS) ', 'DiamondRock Hospitality Co  (DRH) ', 'Pebblebrook Hotel Trust  (PEB, PEB-PC, PEB-PD, PEB-PE, PEB-PF) ', 'Sprouts Farmers Market, Inc.  (SFM) ', 'MYR GROUP INC.  (MYRG) ', 'Chatham Lodging Trust  (CLDT, CLDT-PA) ', 'WEYCO GROUP INC  (WEYS) ', 'INFINITE GROUP INC  (IMCI) ', 'DiamondRock Hospitality Co  (DRH) ', 'Pebblebrook Hotel Trust  (PEB, PEB-PC, PEB-PD, PEB-PE, PEB-PF) ', 'DiamondRock Hospitality Co  (DRH, DRH-PA) ', 'Pebblebrook Hotel Trust  (PEB, PEB-PC, PEB-PD, PEB-PE, PEB-PF) ', 'DLH Holdings Corp.  (DLHC) ', 'Summit Hotel Properties, Inc.  (INN) ', 'BOYD GAMING CORP  (BYD) ', 'Summit Hotel Properties, Inc.  (INN) ', 'DiamondRock Hospitality Co  (DRH, DRH-PA) ', 'CINCINNATI FINANCIAL CORP  (CINF) ', 'Summit Hotel Properties, Inc.  (INN) ', 'Pebblebrook Hotel Trust  (PEB, PEB-PC, PEB-PD, PEB-PE, PEB-PF) ', 'ARTIVION, INC.  (AORT) ', 'STAR GROUP, L.P.  (SGU) ', 'Pebblebrook Hotel Trust  (PEB, PEB-PE, PEB-PF, PEB-PG, PEB-PH) ', 'RGC RESOURCES INC  (RGCO) ', 'INFINITE GROUP INC  (IMCI) ', 'LEGGETT &amp; PLATT INC  (LEG) ', 'RGC RESOURCES INC  (RGCO) ', 'COSTCO WHOLESALE CORP /NEW  (COST) ', 'DLH Holdings Corp.  (DLHC) ', 'CANTERBURY PARK HOLDING CORP ', 'WEYCO GROUP INC  (WEYS) ', 'DLH Holdings Corp.  (DLHC) ', 'WEYCO GROUP INC  (WEYS) ', 'Canterbury Park Holding Corp  (CPHC) ', 'RGC RESOURCES INC  (RGCO) ', 'IEC ELECTRONICS CORP  (IEC) ', 'INFINITE GROUP INC  (IMCI) ', 'Canterbury Park Holding Corp  (CPHC) ', 'WEYCO GROUP INC  (WEYS) ', 'Canterbury Park Holding Corp  (CPHC) ', 'AMERICAN STATES WATER CO  (AWR) <br> Golden State Water CO ', 'LEGGETT &amp; PLATT INC  (LEG) ', 'Vy Global Growth  (VYGG, VYGG-UN, VYGG-WT) ', 'Summit Hotel Properties, Inc.  (INN) ', 'Vy Global Growth  (VYGG, VYGG-UN, VYGG-WT) ', 'Sunstone Hotel Investors, Inc.  (SHO, SHO-PE, SHO-PF) ', 'CRYOLIFE INC  (CRY) ', 'BOYD GAMING CORP  (BYD) ', 'Sunstone Hotel Investors, Inc.  (SHO, SHO-PE, SHO-PF) ', 'Summit Hotel Properties, Inc.  
(INN, INN-PE, INN-PF) ', 'Green Bancorp, Inc.  (GNBC) ', 'TELKONET INC  (TKOI) ', 'COHEN &amp; STEERS INC  (CNS) ', 'Sunstone Hotel Investors, Inc.  (SHO, SHO-PE, SHO-PF) ', 'Green Bancorp, Inc.  (GNBC) ']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
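
The innerHTML values in the console output above still contain `<br>` separators (when one filing lists several entities), HTML entities such as `&amp;`, and stray whitespace. As a rough sketch of one way to tidy them into a flat list of names, the snippet below uses only the standard library. The helper `split_entity_cell` is a hypothetical name introduced here for illustration, not part of the original answer, and the sample strings are copied from the output shown above.

```python
import html
import re

# Hypothetical helper (not from the original answer): turns one raw
# innerHTML string from a td.entity-name cell into a list of clean names.
_BR = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)  # <br>, <br/>, <BR />
_WS = re.compile(r"\s+")                             # runs of whitespace


def split_entity_cell(inner_html):
    names = []
    for part in _BR.split(inner_html):
        # Decode entities such as &amp;, then collapse and trim whitespace.
        text = _WS.sub(" ", html.unescape(part)).strip()
        if text:
            names.append(text)
    return names


# Sample values copied from the console output above.
raw_cells = [
    'Excel Corp ',
    'NOODLES &amp; Co  (NDLS) ',
    'AMERICAN STATES WATER CO  (AWR) <br> Golden State Water CO ',
]

# Flatten: a cell with several entities contributes several names.
entities = [name for cell in raw_cells for name in split_entity_cell(cell)]
print(entities)
```

In a real run, `raw_cells` would presumably be the list produced by the get_attribute("innerHTML") variant; with the `.text` variant the entities are already decoded, so only the whitespace cleanup would matter.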