Web scrape second number between tags
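I am new to Python and have never worked with HTML, so any help would be appreciated.
I need to extract two numbers, "1062" and "348", from the markup shown in the site's Inspect Element view.
Here is my code: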
page = requests.get("https://www.traderscockpit.com/?pageView=live-nse-advance-decline-ratio-chart")
soup = BeautifulSoup(page.content, 'html.parser')
Adv = soup.select_one ('.col-sm-6 .advDec:nth-child(1)').text[10:]
Dec = soup.select_two ('.col-sm-6 .advDec:nth-child(2)').text[10:]
The site's elements look like this:
<div class="nifty-header-shade1 col-xs-12 col-sm-6 col-md-3">
<div class="row">
<div class="col-sm-12">
<h4>Stocks</h4>
</div>
<div class="col-sm-6">
<p class="advDec"><a href="/?pageView=nse-top-gainers" title="Click to view list of Advanced stocks">Advanced:</a> 1062</p>
</div>
<div class="col-sm-6">
<p class="advDec"><a href="/?pageView=nse-top-losers" title="Click to view list of Declined stocks">Declined:</a> 348</p>
</div>
</div>
</div>
With my code I can extract the first number (1062), but not the second one (348). Can you help?
Assuming the pattern is always the same, you can select each element by its text and read its next_sibling:
adv = soup.select_one('a:-soup-contains("Advanced:")').next_sibling.strip()
dec = soup.select_one('a:-soup-contains("Declined:")').next_sibling.strip()
Example:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.traderscockpit.com/?pageView=live-nse-advance-decline-ratio-chart")
soup = BeautifulSoup(page.content, "html.parser")
adv = soup.select_one('a:-soup-contains("Advanced:")').next_sibling.strip()
dec = soup.select_one('a:-soup-contains("Declined:")').next_sibling.strip()
print(adv, dec)
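To see why the original selector fails, here is a minimal offline sketch that parses the markup from the question instead of fetching the live page. The `html` string is a trimmed copy of that snippet (the `title` attributes are dropped), and it assumes the live page is structured the same way. Note also that BeautifulSoup has no `select_two` method; `select_one` returns the first match and `select` returns all matches.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, so no network request is needed.
html = """
<div class="nifty-header-shade1 col-xs-12 col-sm-6 col-md-3">
  <div class="row">
    <div class="col-sm-12"><h4>Stocks</h4></div>
    <div class="col-sm-6">
      <p class="advDec"><a href="/?pageView=nse-top-gainers">Advanced:</a> 1062</p>
    </div>
    <div class="col-sm-6">
      <p class="advDec"><a href="/?pageView=nse-top-losers">Declined:</a> 348</p>
    </div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Each <p class="advDec"> is the only child of its own <div>,
# so :nth-child(2) matches nothing and select_one returns None.
second = soup.select_one(".col-sm-6 .advDec:nth-child(2)")
print(second)

# Selecting the <a> by its text and reading the text node after it works for both.
adv = soup.select_one('a:-soup-contains("Advanced:")').next_sibling.strip()
dec = soup.select_one('a:-soup-contains("Declined:")').next_sibling.strip()
print(adv, dec)
```

Because `:nth-child` counts position among siblings of the same parent, it cannot distinguish two paragraphs that live in separate parent elements.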
If there are always exactly two elements, the simplest approach is probably to unpack the list of selected elements.
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.traderscockpit.com/?pageView=live-nse-advance-decline-ratio-chart")
soup = BeautifulSoup(page.content, "html.parser")
adv, dec = [elm.next_sibling.strip() for elm in soup.select(".advDec a")]
print("Advanced:", adv)
print("Declined:", dec)
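Unpacking raises a ValueError with a fairly generic message if the page ever returns more or fewer than two matches. A slightly more defensive sketch is below; `parse_counts` and `sample` are hypothetical names introduced here for illustration, and the sample markup is a shortened stand-in for the real page.

```python
from bs4 import BeautifulSoup

def parse_counts(html):
    """Return (advanced, declined) as ints; raise ValueError if the layout differs."""
    soup = BeautifulSoup(html, "html.parser")
    # The number is the text node that directly follows each <a> inside .advDec.
    values = [str(a.next_sibling or "").strip() for a in soup.select(".advDec a")]
    if len(values) != 2:
        raise ValueError(f"expected 2 values, found {len(values)}")
    adv, dec = (int(v) for v in values)
    return adv, dec

# Shortened stand-in for the page markup (an assumption, not the live HTML).
sample = (
    '<p class="advDec"><a href="#">Advanced:</a> 1062</p>'
    '<p class="advDec"><a href="#">Declined:</a> 348</p>'
)
print(parse_counts(sample))
```

Converting to `int` also makes it obvious early if the text after the link is not a plain number.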