BeautifulSoup won't get the page source using Selenium

Question

I'm trying to scrape a web page, but I can't get the site's HTML text using Selenium.

Here is my code so far:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import urlparse

search_term = raw_input("What is your search term?: ")
url = "https://www.google.co.uk/search?client=ubuntu&channel=fs&q="
googurl = url+search_term
driver = webdriver.Firefox()

htmltext = driver.get(googurl)
soup = BeautifulSoup(htmltext.page_source)

Running it, I get this traceback:

What is your search term?: hi
Traceback (most recent call last):
  File "google page click.py", line 15, in <module>
    soup = BeautifulSoup(htmltext.page_source)
AttributeError: 'NoneType' object has no attribute 'page_source'

Answer 1

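driver.get() only navigates the browser to the URL and returns None. That is why htmltext is None and htmltext.page_source raises the AttributeError. Read the page source from the driver object itself: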

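driver.get(googurl)  # loads the page in the browser; the return value is None
soup = BeautifulSoup(driver.page_source, "html.parser")  # parse the HTML the driver now holds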
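Here is a minimal Python 3 sketch of the same fix, wrapped in a small helper. The name fetch_soup is hypothetical and is used only for illustration. The helper accepts any already-created WebDriver, such as webdriver.Firefox(). Passing "html.parser" explicitly is a choice made here, so that BeautifulSoup does not have to guess a parser.

```python
from bs4 import BeautifulSoup


def fetch_soup(driver, url):
    """Load `url` in an existing Selenium WebDriver and parse the resulting HTML.

    `fetch_soup` is a hypothetical helper name, not part of Selenium or bs4.
    """
    # WebDriver.get() navigates the browser and returns None,
    # so its return value must not be treated as the page.
    driver.get(url)
    # The HTML of the currently loaded page is exposed on the driver itself.
    return BeautifulSoup(driver.page_source, "html.parser")
```

With the question's variables, the call becomes soup = fetch_soup(driver, googurl). Call driver.quit() when you are done so that the browser process exits.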


Tags: python, selenium, beautifulsoup