Why is Selenium only fetching the text of the first ToolTip on the page?

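As part of a larger web scraper built with Python, Selenium, and BeautifulSoup, I'm trying to get the text of all the tooltips on this page: https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth

My current code successfully gets all the links and hovers over each one: when I run it, I see each tooltip pop up in succession. However, it only outputs the text of the first tooltip. I have no idea why! I thought I might just need a longer wait between mouse-overs, but going up to 20 seconds didn't fix the problem.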


 bill_links = soup.find_all('a', {'id': re.compile('Bill')})
 summaries = []
 bill_numbers = [link.text.strip() for link in bill_links]

 for link in bill_links:
   billid = link.get('id')
   action = ActionChains(driver)
   action.move_to_element(driver.find_element_by_id(billid)).perform()
   summary = driver.find_element_by_class_name("ToolTip-BillSummary-ShortTitle").text
   print(summary)
   summaries = summaries + [summary]

Again, the first print(summary) call successfully returns the text of the first tooltip ("An Act amending the act of January 17, 1968...") but every subsequent print(summary) call just returns a blank.



summary = driver.find_element_by_class_name("ToolTip-BillSummary-ShortTitle").text

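The condition you use to locate the element is restricted only by the element's class name. That single condition can match a whole list of elements, but you never specify which of them you want the text from, and find_element_by_class_name returns only the first match.

To fix this, use an XPath expression instead (you will need an index variable to target each element):

summary = driver.find_element_by_xpath("//*[@id='qtip-" + str(index) + "-content']/div/div[3]").text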
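
As a minimal sketch of the indexed lookup above, the XPath can be built by a small helper. The name qtip_content_xpath is hypothetical, and the helper assumes the tooltip containers really carry ids of the form qtip-<index>-content, as the expression above implies; which index belongs to which link would need to be checked in the browser.

```python
def qtip_content_xpath(index):
    """Build the XPath for the short-title div of the tooltip with this index.

    Assumes (not verified here) that tooltip ids follow 'qtip-<index>-content'.
    """
    return f"//*[@id='qtip-{index}-content']/div/div[3]"


# Possible use inside the hover loop (needs a live Selenium driver, so it is
# left as a comment):
#   summary = driver.find_element_by_xpath(qtip_content_xpath(index)).text
print(qtip_content_xpath(0))
```

Single quotes inside the XPath avoid clashing with the double quotes that delimit the Python string.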


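Selenium isn't needed. If it really is the displayed tooltip you want (not the full text), you can use bs4 and replicate the JavaScript function the page uses. The arguments to the function call are in the script tag next to the a tag of each bill listing. I extract them from the relevant string with a regex and pass them to a user-defined function that replicates the jQuery function.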



import requests
from bs4 import BeautifulSoup as bs
import re

def add_bill_summary_tooltip(s, session_year, session_ind, bill_body, bill_type, bill_no):
    url = g_server_url + '/cfdocs/cfc/GenAsm.cfc?returnformat=plain'
    data = { 'method' : 'GetBillSummaryTooltip',
            'SessionYear' : session_year,
            'SessionInd' : session_ind,
            'BillBody' : bill_body,
            'BillType' : bill_type,
            'BillNo' : bill_no,
            'IsAjaxRequest' : '1'
           }

    r = s.get(url, params = data)
    soup = bs(r.content, 'lxml')
    tooltip = soup.select_one('.ToolTip-BillSummary-ShortTitle')
    if tooltip is not None:
        tooltip = tooltip.text.strip()
    return tooltip

g_server_url = "https://www.legis.state.pa.us"

with requests.Session() as s:
    r = s.get('https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth')
    soup = bs(r.content, 'lxml')
    tooltips = {item.select_one('a').text:item.select_one('script').text[:-1] for item in soup.select('.DataTable td:has(a)')}
    p = re.compile(r"'(.*?)',(.*),(.*),'(.*)','(.*)','(.*)'")
    for bill in tooltips:
        arg1,arg2,arg3,arg4,arg5,arg6 = p.findall(tooltips[bill])[0]
        tooltips[bill] = add_bill_summary_tooltip(s, arg2, arg3,arg4,arg5,arg6)
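
The regular expression used above can be checked offline. The sketch below applies the same pattern to a sample call string; the sample (selector '#Bill_1', bill number '0012') is made up for illustration and only assumes that the script tag contains a call shaped like the function signature, with no spaces after the commas. The helper name parse_tooltip_call is hypothetical.

```python
import re

# Same pattern as in the answer: one quoted selector, two unquoted values,
# then three quoted values.
CALL_ARGS = re.compile(r"'(.*?)',(.*),(.*),'(.*)','(.*)','(.*)'")


def parse_tooltip_call(script_text):
    """Return the call arguments as a dict, or None if the text does not match."""
    match = CALL_ARGS.search(script_text)
    if match is None:
        return None
    keys = ("element", "session_year", "session_ind",
            "bill_body", "bill_type", "bill_no")
    return dict(zip(keys, match.groups()))


# Hypothetical script text; the real page content may differ.
sample = "AddBillSummaryTooltip('#Bill_1',2019,0,'S','B','0012')"
print(parse_tooltip_call(sample))
```

Because the unquoted groups use a greedy `.*`, whitespace around the commas would end up inside the captured values, so a strip() on each value may be worth adding if the page formats its calls differently.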




To get the full summary text rather than the tooltip:

import requests
from bs4 import BeautifulSoup as bs

def add_bill_summary_full(s, url): 
    r = s.get(url)
    soup = bs(r.content, 'lxml')
    summary = soup.select_one('.BillInfo-Section-Data div')
    if summary is not None:
        summary = summary.text
    return summary

g_server_url = "https://www.legis.state.pa.us"

with requests.Session() as s:
    r = s.get('https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth')
    soup = bs(r.content, 'lxml')
    full_text = {item.text:g_server_url + item['href'] for item in soup.select('.DataTable a')}
    for k,v in full_text.items():
        full_text[k] = add_bill_summary_full(s, v)
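
The block above builds each bill URL by concatenating the server root with the link's href, which works when every href starts with a slash. As a hedged alternative, urllib.parse.urljoin from the standard library also handles relative and already-absolute hrefs. The hrefs below are invented examples, not values taken from the page, and absolute_link is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Page the links were scraped from (query string shortened for the example).
PAGE_URL = "https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190"


def absolute_link(href, page_url=PAGE_URL):
    """Resolve an href found on page_url into an absolute URL."""
    return urljoin(page_url, href)


# Hypothetical hrefs covering the three common cases.
print(absolute_link("/cfdocs/billInfo/billInfo.cfm?sYear=2019"))  # root-relative
print(absolute_link("other.cfm"))                                 # page-relative
print(absolute_link("https://example.com/x"))                     # already absolute
```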




The JavaScript function the page uses (excerpt; omitted lines are marked with "..."):

   function AddBillSummaryTooltip(element,SessionYear,SessionInd,BillBody,BillType,BillNo) {
       // ...
       content: {
           text: function(event, api) {
               // ...
               url: g_ServerURL + '/cfdocs/cfc/GenAsm.cfc?returnformat=plain',
               data: {
                   method: 'GetBillSummaryTooltip',
                   SessionYear: SessionYear,
                   SessionInd: SessionInd,
                   BillBody: BillBody,
                   BillType: BillType,
                   BillNo: BillNo,
                   IsAjaxRequest: 1
               // ...



If you are using Selenium, you don't have to use BeautifulSoup. To extract the text of all the tooltips on the page https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth you can use the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.action_chains import ActionChains

    chrome_options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    # Load the page before waiting for its elements
    driver.get("https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth")
    for elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='DataTable']/tbody//tr/td/a"))):
        # Hover over the link so that its tooltip is rendered
        ActionChains(driver).move_to_element(elem).perform()
        senete_bill_shorten_number = elem.get_attribute("innerHTML").split()[1]
        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='ToolTip-BillSummary']/div[@class='ToolTip-BillSummary-Title' and contains(., '" + senete_bill_shorten_number + "')]//following::div[2]"))).get_attribute("innerHTML"))
  • Console Output:

                        An Act amending the act of January 17, 1968 (P.L.11, No.5), known as The Minimum Wage Act of 1968,  further providing for definitions and for minimum wages; providing for gratuities; further providing for enforcement and rules and regulations, for pe ...
                        An Act providing for mandatory Statewide employer-paid sick leave for employees and for civil penalties and remedies.
                        An Act amending Title 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in judicial boards and commissions, providing for adoption of guidelines for administrative probation violations; and, in sentencing, further provi ...
                        An Act amending the act of May 22, 1951 (P.L.317, No.69), known as The Professional Nursing Law,  further providing for title, for definitions, for State Board of Nursing, for dietitian-nutritionist license required, for unauthorized practices and ac ...
                        An Act amending the act of March 4, 1971 (P.L.6, No.2), known as the Tax Reform Code of 1971, providing for Pennsylvania Housing Tax Credit.
                        An Act amending the act of December 3, 1959 (P.L.1688, No.621), known as the Housing Finance Agency Law, in Pennsylvania Housing Affordability and Rehabilitation Enhancement Program, further providing for fund.
                        An Act amending the act of March 10, 1949 (P.L.30, No.14), known as the Public School Code of 1949, in charter schools, further providing for funding for charter schools.
                        An Act amending the act of June 13, 1967 (P.L.31, No.21), known as the Human Services Code,  in departmental powers and duties as to supervision, providing for lead testing in children's institutions; and, in departmental powers and duties as to lice ...
                        An Act providing for the protection of water supplies.
                        An Act amending Title 35 (Health and Safety) of the Pennsylvania Consolidated Statutes, providing for emergency addiction treatment; and imposing powers and duties on the Department of Drug and Alcohol Programs.
                        An Act amending Title 18 (Crimes and Offenses) of the Pennsylvania Consolidated Statutes, providing for transfer and sale of animals.
                        An Act amending Title 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in particular rights and immunities, providing for civil immunity of person rescuing minor from motor vehicle.
                        An Act providing for health care insurance coverage protections, for duties of the Insurance Department and the Insurance Commissioner, for regulations, for enforcement and for penalties.
                        An Act amending the act of May 17, 1921 (P.L.682, No.284), known as The Insurance Company Law of 1921, in casualty insurance, providing coverage for essential health benefits.
                        An Act amending the act of October 27, 1955 (P.L.744, No.222), known as the Pennsylvania Human Relations Act, further providing for definitions and for unlawful discriminatory practices.
                        An Act amending Titles 18 (Crimes and Offenses) and 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in human trafficking, further providing for the offense of trafficking in individuals and for the offense of patroniz ...
                        An Act amending Title 75 (Vehicles) of the Pennsylvania Consolidated Statutes, in registration of vehicles, further providing for veteran plates and placard.
                        An Act providing for health insurance coverage requirements for stage four, advanced metastatic cancer.
                        An Act authorizing the Commonwealth of Pennsylvania to join the Psychology Interjurisdictional Compact; providing for the form of the compact; imposing additional powers and duties on the Governor, the Secretary of the Commonwealth and the Compact.
                        An Act amending Titles 42 (Judiciary and Judicial Procedure) and 75 (Vehicles) of the Pennsylvania Consolidated Statutes, in sentencing, further providing for payment of court costs, restitution and fines, for fine and for failure to pay fine; in lic ...
                        An Act amending the act of January 17, 1968 (P.L.11, No.5), known as The Minimum Wage Act of 1968,  further providing for definitions and for rate of minimum wages; and providing for reporting by the Department of Labor and Industry.
                        An Act amending Title 23 (Domestic Relations) of the Pennsylvania Consolidated Statutes, in marriage license, further providing for restrictions on issuance of license.
                        An Act amending the act of March 4, 1971 (P.L.6, No.2), known as the Tax Reform Code of 1971, in sales and use tax, further providing for exclusions from tax.
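
The XPath assembled inside the loop of the code block above can be factored into a small function, which makes it easier to test without a browser. This is only a sketch: tooltip_short_title_xpath is a hypothetical name, and the sample link text "SB 12" is an assumption about what the link's innerHTML looks like (the original code only implies that it has at least two whitespace-separated parts).

```python
def tooltip_short_title_xpath(link_inner_html):
    """Build the XPath that targets the tooltip text for one bill link.

    Takes the second whitespace-separated token of the link text as the
    bill number, mirroring the split()[1] call in the code block.
    """
    bill_number = link_inner_html.split()[1]
    return (
        "//div[@class='ToolTip-BillSummary']"
        f"/div[@class='ToolTip-BillSummary-Title' and contains(., '{bill_number}')]"
        "//following::div[2]"
    )


# Hypothetical link text for illustration.
print(tooltip_short_title_xpath("SB 12"))
```

Note that contains() is a substring test, so a bill number such as 12 would also match a title containing 112; if several tooltips stay in the DOM at once, a stricter comparison may be needed.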