Is there a way to bypass <span class=""> and get this data when web scraping in Selenium?
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get('https://soundcloud.com/jujubucks')
print(driver.title)
search = driver.find_element_by_tag_name('span class=""')
print(search.text)
driver.quit()
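I tried to find this element by its class and by its tag name. Both attempts just return an error. Is it possible to scrape the data in this class?
This is the error it returns: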
Traceback (most recent call last):
  File "C:\Users\houst\PycharmProjects\The Machine App\Commercial Profile.py", line 16, in <module>
    search = driver.find_element_by_tag_name('span class=""')
  File "C:\Users\houst\PycharmProjects\The Machine App\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 530, in find_element_by_tag_name
    return self.find_element(by=By.TAG_NAME, value=name)
  File "C:\Users\houst\PycharmProjects\The Machine App\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:\Users\houst\PycharmProjects\The Machine App\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\houst\PycharmProjects\The Machine App\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=92.0.4515.131)
Process finished with exit code 1
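You are using the wrong locator.
If you want to locate elements whose tag name is span and whose class name is "", i.e. an empty class name, you should use the CSS selector
span[class='']
or the XPath
//span[@class='']
The find_element_by_tag_name method takes an element tag name, such as span, div, or a, but there is no tag named span class="", which is why Selenium rejects it with InvalidSelectorException.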
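To see what the empty-class selector does and does not match, here is a small sketch that uses Python's standard-library xml.etree.ElementTree instead of a browser. The markup and variable names are made up for illustration and are not taken from the SoundCloud page. ElementTree supports only a subset of XPath, but the [@class=''] predicate should behave the same way there: it matches a span whose class attribute is present and empty, and skips spans with a non-empty class or with no class attribute at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only (not the real SoundCloud page).
SNIPPET = """
<div>
  <span class="">empty class</span>
  <span class="title">named class</span>
  <span>no class attribute</span>
</div>
"""

root = ET.fromstring(SNIPPET)

# ElementTree's counterpart of //span[@class=''] (it needs the leading dot).
matches = [el.text for el in root.findall(".//span[@class='']")]
print(matches)
```

Only the first span is selected, so a page that has many spans with an empty class will return many elements, and you may still need to narrow the locator with a parent element.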
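So my guess is that your code should look something like this:
wait = WebDriverWait(driver, 20)  # the 20-second timeout is an assumed value
wait.until(EC.visibility_of_element_located((By.XPATH, "//span[@class='']")))
search = driver.find_elements_by_xpath("//span[@class='']")
for el in search:
    print(el.text)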
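The find_element_by_* and find_elements_by_* helpers used above belong to the Selenium 3 API and were removed in later Selenium 4 releases. As a rough sketch of the same approach with the find_elements(By.XPATH, ...) form, which exists in both major versions, something like the following might work. The names scrape_empty_class_spans and non_blank, the 20-second timeout, and calling webdriver.Chrome() without a driver path are all assumptions for illustration, not part of the original code.

```python
from typing import Iterable, List

# XPath for <span> elements whose class attribute is the empty string.
EMPTY_CLASS_SPAN_XPATH = "//span[@class='']"


def non_blank(texts: Iterable[str]) -> List[str]:
    """Strip surrounding whitespace and drop entries that end up empty."""
    return [t.strip() for t in texts if t and t.strip()]


def scrape_empty_class_spans(url: str, timeout: float = 20) -> List[str]:
    """Open url, wait for a visible empty-class span, return the span texts."""
    # Imported here so non_blank() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: Selenium can find a Chrome driver on its own
    # (on older setups, pass the chromedriver location explicitly).
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until at least one matching span is visible on the page.
        WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.XPATH, EMPTY_CLASS_SPAN_XPATH))
        )
        elements = driver.find_elements(By.XPATH, EMPTY_CLASS_SPAN_XPATH)
        return non_blank(el.text for el in elements)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling scrape_empty_class_spans('https://soundcloud.com/jujubucks') would then return the visible text of every matching span. The try/finally is there so the browser is closed even when the wait raises a TimeoutException.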