Using following-sibling to access divs within following-sibling
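I'm trying to get information from this URL:
I want to extract the text "Hot 8 Brass Band are a Grammy-nominated New Orleans based brass band, whose sound... "
etc.
My approach: I want to extract the information without using explicit div names (since these tend to change). So I use a variable to identify "About Hot 8 Brass Band", and then I want to access the following siblings, their child divs, etc.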
Code:
# Exceptions caught below (the import is not shown in the original snippet)
from selenium.common.exceptions import ElementNotVisibleException, NoSuchElementException, TimeoutException

url = "https://www.bandsintown.com/e/1024477910-hot-8-brass-band-at-the-howlin'-wolf?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event"
driver.get(url)
#Get artist
try:
    artist = driver.find_elements_by_css_selector('a[href^="https://www.bandsintown.com/a/"] h1')
    artist = artist[0].text
    print(artist)
except (ElementNotVisibleException, NoSuchElementException, TimeoutException):
    print("artist doesn't exist")
#Get Bio Info
try:
    readMoreBio = driver.find_element_by_xpath("//div[text()='Read More']").click()
    print("Read More Bio Clicked")
except (ElementNotVisibleException, NoSuchElementException, TimeoutException):
    pass
#Once read more clicked, get full bio info
try:
    artistBioDiv = driver.find_elements_by_xpath("(//div[text()='About " + artist + "'])[0]/following-sibling/following-sibling::div")
    print("artistBioDiv is: ", artistBioDiv)
except (ElementNotVisibleException, NoSuchElementException, TimeoutException):
    print("artist bio doesn't exist")
This seems to return nothing, i.e. it doesn't find the bio paragraph.
Here is the HTML structure:
I think the problem is the XPath you are using to find the bio.
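A minimal sketch of what goes wrong in the question's XPath, assuming the goal is the div(s) that follow the "About <artist>" heading. XPath positions start at 1, so (...)[0] never matches anything, and /following-sibling without :: is read as a child step for an element literally named following-sibling rather than as the sibling axis. The helper names xpath_literal and bio_xpath are hypothetical; xpath_literal also guards against artist names containing quotes (e.g. an apostrophe), which would break a string built as 'About " + artist + "'.

```python
# Sketch: a corrected XPath for the bio, with safe quoting of the artist name.
# Problems in the question's expression:
#   1. XPath positions start at 1, so (...)[0] never selects anything.
#   2. "/following-sibling" without "::" is a child step for an element
#      literally named <following-sibling>, not the sibling axis.


def xpath_literal(value: str) -> str:
    """Return `value` as an XPath 1.0 string literal, whatever quotes it contains."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: XPath 1.0 has no escape syntax, so split on
    # single quotes and rebuild the string with concat().
    parts = value.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if part:
            pieces.append(f"'{part}'")
        if i < len(parts) - 1:
            pieces.append('"\'"')  # a literal single quote, wrapped in double quotes
    return "concat(" + ", ".join(pieces) + ")"


def bio_xpath(artist: str) -> str:
    """Hypothetical helper: divs that follow the 'About <artist>' heading."""
    heading = f"//div[normalize-space()={xpath_literal('About ' + artist)}]"
    return heading + "/following-sibling::div"


print(bio_xpath("Hot 8 Brass Band"))
# //div[normalize-space()='About Hot 8 Brass Band']/following-sibling::div
print(bio_xpath("Guns N' Roses"))
# //div[normalize-space()="About Guns N' Roses"]/following-sibling::div
```

With Selenium 4 the resulting string would be passed as driver.find_elements(By.XPATH, bio_xpath(artist)); any further steps after the sibling (such as picking a particular child div) depend on the page's actual markup.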
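Some things you could consider for future projects:
- Use driver.find_element(By.CSS_SELECTOR, 'CSS_SELECTOR_GOES_HERE') or driver.find_element(By.XPATH, 'XPATH_GOES_HERE') (and driver.find_elements(...) when you need a list), because find_elements_by_xpath and find_elements_by_css_selector are deprecated in Selenium 4 and were removed in Selenium 4.3
- Use WebDriverWait to give elements enough time to load
- When matching text in an XPath, you can also use normalize-space(), which strips leading and trailing whitespace and collapses internal runs of whitespace into a single space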
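A small sketch of what normalize-space() does, following XPath 1.0, where "whitespace" means exactly space, tab, carriage return and line feed; normalize_space is a hypothetical helper name. Two details are worth knowing: a non-breaking space (&nbsp;) is not XPath whitespace and is kept, and normalize-space() with no argument works on the element's whole string value (all descendant text), whereas text() only looks at the element's own text nodes.

```python
import re

# XPath 1.0 treats exactly these four characters as whitespace.
_XPATH_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(text: str) -> str:
    """Mimic XPath normalize-space(): collapse whitespace runs to one space, then trim."""
    return _XPATH_WS.sub(" ", text).strip(" ")


raw = "\n    Read More\n  "  # text as it often sits in formatted HTML
print(raw == "Read More")                   # False: an exact comparison fails
print(normalize_space(raw) == "Read More")  # True: the normalized value matches
# A non-breaking space (&nbsp;, U+00A0) is not XPath whitespace, so it is kept:
print(normalize_space("Read\u00a0More") == "Read More")  # False
```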
This code should work for you:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("--disable-notifications")
# Selenium 4 takes the chromedriver path through a Service object (executable_path= is deprecated)
driver = webdriver.Chrome(service=Service('D://chromedriver/100/chromedriver.exe'), options=options)
wait = WebDriverWait(driver, 20)

url = "https://www.bandsintown.com/e/1024477910-hot-8-brass-band-at-the-howlin'-wolf?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event"
driver.get(url)
try:
    # with xpath
    # artist = wait.until(EC.presence_of_element_located((By.XPATH, '//h1[contains(@href, "https://www.bandsintown.com/a")]'))).text
    artist = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'h1[href^="https://www.bandsintown.com/a/"]'))).text
    # read more: wait until the button is clickable, not merely present in the DOM
    wait.until(EC.element_to_be_clickable((By.XPATH, '//div[normalize-space()="Read More"]'))).click()
    # bio: the div after the "About <artist>" heading, then its second child div
    bio = wait.until(EC.presence_of_element_located((By.XPATH, f'//div[normalize-space()="About {artist}"]/following-sibling::div/div[2]/div'))).text
    print(f'Artist: {artist}\nBio:\n{bio}')
except Exception as ex:
    print(f"Error: {ex}")
finally:
    driver.quit()
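wait.until(...) in the code above repeatedly re-evaluates the expected condition until it returns something truthy or the timeout expires. Below is a simplified, stdlib-only sketch of that polling loop, assuming WebDriverWait's default behaviour of polling every 0.5 s and treating "element not found" as "try again". The names wait_until and WaitTimeout, and the use of LookupError as a stand-in for NoSuchElementException, are hypothetical and not Selenium's API.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 20.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (LookupError,),
) -> T:
    """Call condition() every `poll` seconds until it returns a truthy value."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            # Treat "not there yet" as another poll, the way WebDriverWait
            # ignores NoSuchElementException by default.
            pass
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the bio only "appears" on the third check.
attempts = {"n": 0}


def find_bio() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("bio not rendered yet")
    return "Hot 8 Brass Band are a Grammy-nominated ..."


print(wait_until(find_bio, timeout=5, poll=0.01))  # prints the bio text
print(attempts["n"])  # 3
```

The real WebDriverWait hands the driver to the condition (EC.presence_of_element_located(...) returns such a callable); this sketch drops that argument to stay self-contained.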
To extract the text "Hot 8 Brass Band are a Grammy-nominated New Orleans based brass band, whose sound..." you can use the following:
Using XPATH and the text attribute:
driver.get("https://www.bandsintown.com/e/1024477910-hot-8-brass-band-at-the-howlin'-wolf?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event")
print(driver.find_element(By.XPATH, "//div[@id='main']//div[text()='About Hot 8 Brass Band']//following-sibling::div[1]//div/div[contains(., 'Hot 8 Brass Band')]").text)
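The following-sibling::div[1] step picks the first div that comes after the heading under the same parent. ElementTree (stdlib) has no sibling axes, so this sketch emulates that step to make the navigation concrete. SAMPLE is hypothetical, simplified markup that only mirrors the relationships the XPath relies on, not the page's real HTML, and following_sibling_div is an illustrative helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified markup that only mirrors the relationships the
# XPath relies on; it is not the real page's HTML.
SAMPLE = """
<div id="main">
  <div>
    <div>About Hot 8 Brass Band</div>
    <div>
      <div>Genre</div>
      <div><div>Hot 8 Brass Band are a Grammy-nominated New Orleans based brass band</div></div>
    </div>
  </div>
</div>
"""


def following_sibling_div(root: ET.Element, heading: str) -> Optional[ET.Element]:
    """Emulate //div[normalize-space()=heading]/following-sibling::div[1]."""
    # ElementTree has no sibling axes, so visit every parent and scan its
    # children in document order.
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            # Rough normalize-space() of the element's full text content.
            text = " ".join("".join(child.itertext()).split())
            if child.tag == "div" and text == heading:
                # The first later sibling that is a <div>: following-sibling::div[1].
                for sibling in children[i + 1:]:
                    if sibling.tag == "div":
                        return sibling
                return None
    return None


root = ET.fromstring(SAMPLE.strip())
block = following_sibling_div(root, "About Hot 8 Brass Band")
# The trailing div[2]/div step; ElementTree positions are 1-based, like XPath.
bio = block.find("div[2]/div") if block is not None else None
print(bio.text if bio is not None else "bio not found")
```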
Ideally, you should induce WebDriverWait for visibility_of_element_located(), and you can use the following:
Using XPATH and get_attribute("innerHTML"):
driver.get("https://www.bandsintown.com/e/1024477910-hot-8-brass-band-at-the-howlin'-wolf?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='main']//div[text()='About Hot 8 Brass Band']//following-sibling::div[1]//div/div[contains(., 'Hot 8 Brass Band')]"))).get_attribute("innerHTML"))
Console output:
Hot 8 Brass Band are a Grammy-nominated New Orleans based brass band, whose sound draws on the traditional jazz heritage of New Orleans, alongside more modern styles incl...
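The output above happens to be plain text, but get_attribute("innerHTML") returns the element's inner markup, so child tags and entities such as &amp; come back raw, whereas .text returns the rendered text. If you take the innerHTML route, a stdlib-only sketch for turning that markup into plain text could look like this; inner_html_to_text and the sample input are hypothetical, and it only handles <br> and entities, not CSS visibility.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects text nodes and turns <br> into a newline."""

    def __init__(self) -> None:
        # convert_charrefs=True (the default) decodes entities such as &amp;
        super().__init__()
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br/>, via HTMLParser's default
        # handle_startendtag.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(inner_html: str) -> str:
    """Hypothetical helper: plain text from an innerHTML string."""
    parser = _TextCollector()
    parser.feed(inner_html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts).strip()


print(inner_html_to_text("<span>Brass &amp; jazz</span><br>New Orleans"))
```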
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in