Cloudflare scraping, finding elements

Question

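I have been playing with the cfscrape module, which lets you bypass Cloudflare's captcha protection on websites. I can access the page's content, but I can't get my code to work: the entire HTML is printed instead. I just want to look for keywords inside <span class="availability">.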

import urllib2
import cfscrape
from bs4 import BeautifulSoup
import requests
from lxml import etree
import smtplib
import urllib2, sys
scraper = cfscrape.CloudflareScraper()
url = "http://www.sneakersnstuff.com/en/product/25698/adidas-stan-smith-gtx"
req = scraper.get(url).content


try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print("hi")
    content = e.fp.read() 


soup = BeautifulSoup(content, "lxml")
result = soup.find_all("span", {"class":"availability"})

I have omitted some unrelated code.

Answer 1

try:
    page = urllib2.urlopen(req)
    # Read the body inside the try block so that content is set on success
    content = page.read()
except urllib2.HTTPError, e:
    print("hi")
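You should read from the object returned by urlopen, which contains the HTML.

You should assign the content variable before the except clause, that is, inside the try block.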
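In the question's code, `req` already holds the response body returned by `scraper.get(url).content`, so another option may be to hand those bytes to BeautifulSoup directly and skip `urlopen`. The following is a minimal sketch of that idea in Python 3, not a tested fix for the site in question. `SAMPLE_HTML` is invented markup standing in for the downloaded page, and `availability_texts` and `contains_keyword` are hypothetical helper names. It uses the built-in `html.parser` backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Stand-in for `scraper.get(url).content`; this markup is invented for illustration.
SAMPLE_HTML = b"""
<html><body>
  <span class="availability">In stock</span>
  <span class="price">95 EUR</span>
  <span class="availability">Sold out</span>
</body></html>
"""


def availability_texts(html):
    """Return the stripped text of every <span class="availability">."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", {"class": "availability"})
    return [span.get_text(strip=True) for span in spans]


def contains_keyword(html, keyword):
    """Return True if any availability span mentions keyword (case-insensitive)."""
    keyword = keyword.lower()
    return any(keyword in text.lower() for text in availability_texts(html))


if __name__ == "__main__":
    print(availability_texts(SAMPLE_HTML))
    print(contains_keyword(SAMPLE_HTML, "sold out"))
```

With a live page, the same helpers would presumably be called as `availability_texts(scraper.get(url).content)`, where `scraper` is the `cfscrape.CloudflareScraper()` instance from the question. Printing only the returned list, rather than `soup` or `content`, avoids dumping the whole HTML.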

html

python

beautifulsoup

web-scraping

cloudflare