Can't parse coin gecko page from today with BeautifulSoup because of Cloudflare

Question

from bs4 import BeautifulSoup as bs
import requests

def get_btc_price(br):
    # Fetch the Bitcoin page and parse the returned HTML
    data = requests.get('https://www.coingecko.com/en/coins/bitcoin')
    soup = bs(data.text, 'html.parser')

    # The price sits in the first <span> of the first <td> of this table
    table = soup.find('table', {'class': 'table b-b'})
    cell = table.find('td')
    span = cell.find('span')

    # Drop the leading currency symbol, then scale by br and round
    price = span.text.strip()
    x = float(price[1:])
    z = round(x * br, 2)
    print(z)

    return z

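This worked for months, and this morning it stopped. The response I get now says "Checking your browser before continuing....", then "check your antivirus or consult your administrator to gain access...", followed by some Cloudflare gibberish.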

I tried:

import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
print(scraper.get("https://www.coingecko.com/en/coins/bitcoin").text)

It still blocks me. What should I do? Is there another way to get around this, or am I doing something wrong?

Answer 1

This does not seem to be a problem with the scraper, but with how the server handles the negotiation of the connection.

Add a user agent; otherwise requests sends its default one (python-requests/<version>):

user_agent = #
response = requests.get(url, headers={ "user-agent": user_agent})
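The suggestion above can be sketched as follows. This is only a minimal sketch under stated assumptions: the `USER_AGENT` value is an illustrative browser-like string, not one known to pass Cloudflare, and `build_headers` and `fetch_html` are hypothetical helper names. The `session` argument is assumed to be a requests-style session, such as `requests.Session()` or the object returned by `cloudscraper.create_scraper()`. Cloudflare may still answer with a challenge page even when a browser-like user agent is sent.

```python
# Illustrative browser-like value (an assumption); substitute your own browser's string.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_headers(user_agent=USER_AGENT):
    """Return request headers that carry an explicit User-Agent."""
    return {"user-agent": user_agent}


def fetch_html(session, url, timeout=10):
    """GET `url` through a requests-style session with an explicit User-Agent.

    `session` is any object exposing get(url, headers=..., timeout=...),
    for example requests.Session().
    """
    response = session.get(url, headers=build_headers(), timeout=timeout)
    # Raises on 4xx/5xx status codes, which is how a blocked request usually shows up
    response.raise_for_status()
    return response.text
```

A call would then look like `fetch_html(requests.Session(), "https://www.coingecko.com/en/coins/bitcoin")`. Passing the session in keeps the helper usable with either plain requests or cloudscraper.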

To see the "requirements", print the headers the server sends back:

url = #
response = requests.get(url)
for key, value in response.headers.items():
  print(key, ":", value)
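When reading that header dump, two entries usually indicate that Cloudflare handled the response: a `server` header whose value is `cloudflare`, and a `cf-ray` header. A small sketch that picks them out, assuming a hypothetical helper named `cloudflare_hints` that takes any mapping of header names to values (for instance `response.headers`):

```python
def cloudflare_hints(headers):
    """Return the headers that suggest Cloudflare handled the response."""
    # Header names are case-insensitive, so normalise them before looking them up
    lowered = {key.lower(): value for key, value in headers.items()}

    hints = {}
    if "cloudflare" in lowered.get("server", "").lower():
        hints["server"] = lowered["server"]
    if "cf-ray" in lowered:
        hints["cf-ray"] = lowered["cf-ray"]
    return hints
```

An empty result does not prove that no protection is in place; it only means these two headers were absent.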


python

beautifulsoup

cloudflare