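Web scraping with BeautifulSoup not finding anything
I'm trying to scrape coinmarketcap.com just to get price updates for a specific currency, and also to learn how web scraping works. I'm still a beginner and can't figure out what is going wrong: whenever I run this, it just prints None, even though I know that line exists in the page. Thanks for your help!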
import requests
from bs4 import BeautifulSoup
url = 'https://coinmarketcap.com/currencies/electroneum/'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')
price = soup.find('data-currency-price data-usd=')
print (price)
You can get the value like this:
import requests
from bs4 import BeautifulSoup
url = 'https://coinmarketcap.com/currencies/electroneum/'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')
price = soup.find("span", id="quote_price").get('data-usd')
print (price)
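You should be more specific about how you want to find the item.
You are currently using soup.find('')
I'm not sure what you meant to put in there when you wrote data-currency-price data-usd=
Is that an ID or a class name?
Why not try finding the item by its ID?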
soup.find(id="link3")
Or find it by tag
soup.find("relevant tag name like div or a")
Or something like this
find_this = soup.find("a", id="ID HERE")
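To compare these lookups side by side, here is a small sketch that runs against an inline snippet instead of the live site. The HTML is made up for illustration (an assumption about the markup, not a copy of the real page). It also shows why the original call printed None: the first positional argument of find is treated as a tag name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = (
    '<div><span id="quote_price" data-currency-price '
    'data-usd="0.006778">0.006778</span></div>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The first positional argument is a tag name, so this searches for a tag
# literally named "data-currency-price data-usd=" and finds nothing.
by_bad_name = soup.find("data-currency-price data-usd=")

# Look the element up by id, by tag name, or by both.
by_id = soup.find(id="quote_price")
by_tag = soup.find("span")
by_tag_and_id = soup.find("span", id="quote_price")

print(by_bad_name)                 # None
print(by_id.get("data-usd"))       # 0.006778
print(by_tag.text)                 # 0.006778
print(by_tag_and_id["data-usd"])   # 0.006778
```

Combining the tag name with the id is the most specific of the three, which helps if the page reuses the same tag many times.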
import requests
from bs4 import BeautifulSoup
url = 'https://coinmarketcap.com/currencies/electroneum/'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')
x = soup.find(id="quote_price").text
print (x)
Searching by ID is the better option; alternatively, you could search with soup.find_all(text="data-currency-price data-usd")[1].text
You can use the class attribute to get the value.
import requests
from bs4 import BeautifulSoup
url = 'https://coinmarketcap.com/currencies/electroneum/'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')
price = soup.find('span' ,attrs={"class" : "h2 text-semi-bold details-panel-item--price__value"})
print (price.text)
Output:
0.006778
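One caveat with the class lookup: passing the whole class string only matches when the attribute is written exactly that way, in that order. A sketch of two less brittle alternatives follows, again on made-up inline HTML (the class names are taken from the answer above; whether the live page still uses them is an assumption).

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the class names quoted in the answer.
SAMPLE_HTML = (
    '<span class="h2 text-semi-bold details-panel-item--price__value">'
    "0.006778</span>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact match on the full class string (order-sensitive).
exact = soup.find(
    "span",
    attrs={"class": "h2 text-semi-bold details-panel-item--price__value"},
)

# Match on a single class, regardless of what other classes are present.
single = soup.find("span", class_="details-panel-item--price__value")

# The same lookup written as a CSS selector.
css = soup.select_one("span.details-panel-item--price__value")

print(exact.text, single.text, css.text)
```

Matching on the one class that actually identifies the price tends to survive small styling changes better than matching the full string.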
If you plan to do a lot of this, consider using the official API and getting all the prices, then extracting what you want. The following is from the site, with an amendment by me to show the desired value for electroneum. The API guidance shows how to retrieve a single currency in one call, although that requires a plan above the basic one.
from requests import Request, Session
from requests.exceptions import ConnectionError, Timeout, TooManyRedirects
import json
url = 'https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest'
parameters = {
    'start': '1',
    'limit': '5000',
    'convert': 'USD',
}
headers = {
    'Accepts': 'application/json',
    'X-CMC_PRO_API_KEY': 'yourKey',
}
session = Session()
session.headers.update(headers)
try:
    response = session.get(url, params=parameters)
    # print(response.text)
    data = json.loads(response.text)
    print(data['data'][64]['quote']['USD']['price'])
except (ConnectionError, Timeout, TooManyRedirects) as e:
    print(e)
You can always use a loop and check each entry against a list of the names you want, e.g.
interested = ['Electroneum','Ethereum']
for item in data['data']:
    if item['name'] in interested:
        print(item)
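Looking entries up by name also avoids relying on a fixed position such as index 64, which presumably shifts whenever the ordering of the listing changes. A minimal sketch, assuming the response keeps the 'data' / 'name' / 'quote' / 'USD' / 'price' shape used above; the helper name prices_by_name and the sample payload (including its prices) are made up for illustration.

```python
def prices_by_name(payload, names):
    """Map each requested currency name to its USD price.

    `payload` is assumed to have the same shape as the parsed API
    response above: {'data': [{'name': ..., 'quote': {'USD': {'price': ...}}}]}.
    """
    wanted = set(names)
    return {
        item["name"]: item["quote"]["USD"]["price"]
        for item in payload["data"]
        if item["name"] in wanted
    }


# Hypothetical, trimmed payload with placeholder prices.
sample = {
    "data": [
        {"name": "Bitcoin", "quote": {"USD": {"price": 100.0}}},
        {"name": "Ethereum", "quote": {"USD": {"price": 10.0}}},
        {"name": "Electroneum", "quote": {"USD": {"price": 0.5}}},
    ]
}

print(prices_by_name(sample, ["Electroneum", "Ethereum"]))
```

With the real response you would pass data in place of sample.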
For your current example:
You can use an attribute selector for data-currency-value
import requests
from bs4 import BeautifulSoup
url = 'https://coinmarketcap.com/currencies/electroneum/'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')
soup.select_one('[data-currency-value]').text
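Since select_one returns None when nothing matches, calling .text on the result directly can raise AttributeError if the markup changes. A small sketch with a guard, run on made-up inline HTML; the helper first_text is hypothetical and the attribute names are the ones mentioned in this thread.

```python
from bs4 import BeautifulSoup

# Made-up markup; the real page's attributes may differ.
SAMPLE_HTML = """
<div>
  <span id="quote_price" data-currency-price data-usd="0.006778">
    <span data-currency-value>0.006778</span>
  </span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def first_text(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


print(first_text(soup, "[data-currency-value]"))   # 0.006778
print(first_text(soup, "[data-missing]"))          # None
# An attribute selector also works for reading the attribute's value.
print(soup.select_one("[data-usd]")["data-usd"])   # 0.006778
```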