python requests only returning empty sets when scraping
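This is my first attempt at programming. I'm trying to scrape some words using bs4, selenium, etc.
The site I'm using is 'http://oulim.kr'.
How do I scrape the content inside a frameset?
Here's what I tried: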
import urllib
from bs4 import BeautifulSoup
from selenium import webdriver
url = 'http://oulim.kr/'
driver = webdriver.Chrome('./driver/chromedriver')
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html)
a = soup.select("#divAlba > table:nth-child(3) > tbody > tr:nth-child(2) > td:nth-child(5) > a > font > b")
print(a)
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('http://oulim.kr')
r.html.find('.tbody')
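Selenium treats each frame as a separate page (because it has to be loaded separately), so it does not search inside frames, and page_source does not return the HTML from a frame.
You have to find the <frame> and switch to the correct one with switch_to.frame(..) before you can work with it.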
from selenium.webdriver.common.by import By
frames = driver.find_elements(By.TAG_NAME, 'frame')
driver.switch_to.frame(frames[0])
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = 'http://oulim.kr/'
# Selenium 4 style: pass the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./driver/chromedriver'))
driver.get(url)

# --- switch frame ---
# find_element and page_source only see the current document, so enter the frame first
frames = driver.find_elements(By.TAG_NAME, 'frame')
driver.switch_to.frame(frames[0])

selector = "#divAlba > table:nth-child(3) > tbody > tr:nth-child(2) > td:nth-child(5) > a > font > b"

# --- CSS without BeautifulSoup ---
a = driver.find_element(By.CSS_SELECTOR, selector)
print(a.text)

# --- CSS with BeautifulSoup ---
# after switching, page_source returns the frame's HTML, not the frameset's
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
a = soup.select(selector)
print(a[0].text)

driver.quit()
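As for requests in the title: requests and requests_html only download the top-level document, and on a frameset page that document contains `<frame src="...">` tags rather than the content itself, so selectors run on it come back empty. One hedged alternative to Selenium is to read each frame's `src`, resolve it against the page URL, and request that URL directly. This is a minimal sketch using only the standard library (`html.parser.HTMLParser`, `urllib.parse.urljoin`); the names `FrameSrcParser` and `frame_urls` and the sample markup are hypothetical, not taken from oulim.kr.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:  # skip frames without a src
                self.srcs.append(src)


def frame_urls(html, base_url):
    """Return absolute URLs of all frames found in a (frameset) HTML page."""
    parser = FrameSrcParser()
    parser.feed(html)
    # relative src values are resolved against the page URL
    return [urljoin(base_url, src) for src in parser.srcs]


# Hypothetical sample markup, not the real oulim.kr page
sample = """
<html><frameset rows="0,*">
  <frame name="top" src="top.html">
  <frame name="main" src="/main/index.asp">
</frameset></html>
"""
print(frame_urls(sample, "http://example.com/"))
# ['http://example.com/top.html', 'http://example.com/main/index.asp']
```

Each returned URL can then be passed to `requests.get()` and parsed as before. This assumes the frame content is served as static HTML; if it is built by JavaScript, a browser-driven approach like the Selenium one above is still needed.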