bs4: How can I extract the text within a <p> tag?
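I'm practicing parsing on https://coinmarketcap.com/currencies/bitcoin/ and I'd really like to know how I can extract the text from this exact <p> tag. There are many of them and I only want the information from one. Thanks for the help and stuff.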
import requests as r
from bs4 import BeautifulSoup

def find_info(self):
    api = r.get(self.url)  # url is above in the description
    soup = BeautifulSoup(api.text, "html.parser")
    paragraphs = soup.find_all('p')  # every <p> on the page, not just the one I want
    # and here I'm stuck.
    # I need to get the text from the chunk of HTML below.
<p>
<strong>
Bitcoin price today
</strong>
is ₽3.795.164 RUB with a 24-hour trading volume of ₽6.527.780.409.893 RUB. Bitcoin is down,12% in the last 24 hours. The current CoinMarketCap ranking is #1, with a market cap of ₽70.707.857.530.563 RUB. It has a circulating supply of 18.631.043 BTC coins and a max. supply of 21.000.000 BTC coins.
</p>
I've tried different approaches, but with so many p tags on the page I don't know how to get this one.
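One way to single out this paragraph among many is to match on its content rather than its position. A minimal sketch, assuming the target <p> is the one whose <strong> child starts with "Bitcoin price today" (as in the snippet above). The HTML is a shortened, inlined copy so the example runs offline, and `find_price_paragraph` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Shortened offline copy of the page structure (assumption: the real page nests it differently).
HTML = """
<div>
  <p>Some unrelated paragraph.</p>
  <p>
    <strong>
      Bitcoin price today
    </strong>
    is ₽3.795.164 RUB with a 24-hour trading volume of ₽6.527.780.409.893 RUB.
  </p>
  <p>Another unrelated paragraph.</p>
</div>
"""

def find_price_paragraph(soup):
    """Hypothetical helper: return the first <p> whose <strong> child
    begins with 'Bitcoin price today', or None if there is no such <p>."""
    for p in soup.find_all("p"):
        strong = p.find("strong")
        if strong and strong.get_text(strip=True).startswith("Bitcoin price today"):
            return p
    return None

soup = BeautifulSoup(HTML, "html.parser")
paragraph = find_price_paragraph(soup)
if paragraph is not None:
    # " " joins the <strong> text and the trailing text; strip=True trims newlines/indentation
    print(paragraph.get_text(" ", strip=True))
```

Matching on the <strong> label keeps working if the paragraph moves around on the page, but it breaks if the site rewords that label.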
Use a CSS selector to grab the paragraph you want. Here's how:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://coinmarketcap.com/currencies/bitcoin/").content
print(BeautifulSoup(page, "html.parser").select_one('.about___1OuKY p').getText())
Output:
Bitcoin price today is ,393.64 USD with a 24-hour trading volume of ,784,693,272 USD. Bitcoin is up 4.87% in the last 24 hours. The current CoinMarketCap ranking is #1, with a market cap of 7,517,202,639 USD. It has a circulating supply of 18,631,043 BTC coins and a max. supply of 21,000,000 BTC coins.
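A class name like about___1OuKY looks build-generated, so it may change when the site is redeployed. In that case select_one returns None and calling getText() on the result raises AttributeError. A minimal offline sketch that guards against that, assuming a hypothetical container class `about` and a hypothetical helper `extract_about_text`; the inlined HTML reuses a fragment of the question's text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's class name and structure may differ.
HTML = """
<div class="about">
  <h2>About Bitcoin</h2>
  <p><strong>Bitcoin price today</strong> is ₽3.795.164 RUB.</p>
  <p>Second paragraph that should be ignored.</p>
</div>
"""

def extract_about_text(html, selector=".about p"):
    """Return the text of the first element matching `selector`, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)  # first match only, unlike select(), which returns a list
    if node is None:
        return None
    # Join text pieces with a space so "today" and "is" don't run together
    return node.get_text(" ", strip=True)

print(extract_about_text(HTML))
print(extract_about_text(HTML, ".missing p"))
```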
You can use the get_text() method (see the related question "BeautifulSoup getText from between <p>, not picking up subsequent paragraphs"), or simply check the docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
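When the text is split between a <strong> child and a trailing text node, the arguments to get_text() matter. The getText() call in the first answer is an older alias of get_text(). A small offline sketch, using a fragment shaped like the question's snippet, that compares the default output with a separator plus strip=True:

```python
from bs4 import BeautifulSoup

# Same shape as the snippet in the question: the text is split across a <strong> child
# and a trailing text node, with newlines from the page's formatting.
HTML = "<p>\n<strong>\nBitcoin price today\n</strong>\nis ₽3.795.164 RUB.\n</p>"

p = BeautifulSoup(HTML, "html.parser").p  # first <p> in the document

raw = p.get_text()                   # concatenates every text piece, newlines included
clean = p.get_text(" ", strip=True)  # strips each piece, drops empty ones, joins with a space

print(repr(raw))
print(clean)
```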