Extract specific links after extracting from BeautifulSoup

I previously used BeautifulSoup4 to extract some information from this web page: https://www.peakbagger.com/list.aspx?lid=5651

I got a list of hrefs:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.peakbagger.com/list.aspx?lid=5651'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

a = soup.select("a:nth-of-type(1)")
a

But I only want the links that start with 'peak.aspx?pid=10...'.

How can I print only the ones with 'peak.aspx?pid=10...'? Do I need a loop, or should I split the strings?

Thanks.

One way is to iterate over your selection and keep only the links whose href contains the string 'peak.aspx?pid=':
[x['href'] for x in soup.select('a[href]') if 'peak.aspx?pid=' in x['href']]
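The filtering step itself is plain string matching, so no split is needed. If you want links that *start* with the prefix (as asked) rather than merely contain it, str.startswith is the more precise test. A minimal sketch on a hypothetical list of href values standing in for what the real page returns (the sample paths are illustrative only):

```python
# Hypothetical href values, standing in for x.get('href') on each <a> tag of the real page.
# None represents an <a> tag that has no href attribute at all.
hrefs = [
    "list.aspx?lid=5651",
    "peak.aspx?pid=101",
    "about.aspx",
    "peak.aspx?pid=102",
    None,
]

PREFIX = "peak.aspx?pid="

# Skip missing hrefs, then keep only those that begin with the prefix
peak_links = [h for h in hrefs if h is not None and h.startswith(PREFIX)]
print(peak_links)  # ['peak.aspx?pid=101', 'peak.aspx?pid=102']
```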

Alternatively, you can make the selector itself more specific. This returns only the <a> tags in the second column of the table with class gray:

soup.select('table.gray tr td:nth-of-type(2) a')
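Another way to put the condition into the selector is a CSS attribute selector: a[href^="..."] matches <a> tags whose href starts with the given value. This relies on the soupsieve engine that recent BeautifulSoup versions use for select(). A minimal sketch on a small hypothetical HTML fragment (the markup only imitates a list page; it is not the real peakbagger structure):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment imitating a list page; NOT the real peakbagger markup
html = """
<table class="gray">
  <tr><td>1</td><td><a href="peak.aspx?pid=101">Peak A</a></td></tr>
  <tr><td>2</td><td><a href="peak.aspx?pid=102">Peak B</a></td></tr>
</table>
<a href="list.aspx?lid=5651">Back to list</a>
"""

soup = BeautifulSoup(html, "html.parser")

# [href^="..."] keeps only <a> tags whose href attribute begins with the prefix
peak_links = [a["href"] for a in soup.select('a[href^="peak.aspx?pid="]')]
print(peak_links)  # ['peak.aspx?pid=101', 'peak.aspx?pid=102']
```

Compared with the column-based selector, this does not depend on the table layout, only on the href format.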

To get the links themselves, loop over the results and read each tag's href:

[x['href'] for x in soup.select('table.gray tr td:nth-of-type(2) a')]
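The collected hrefs are relative (they start with peak.aspx, with no scheme or host), so if you want to open those pages you would resolve them against the URL of the list page. A sketch using the standard library's urllib.parse.urljoin; the two href values are hypothetical placeholders:

```python
from urllib.parse import urljoin

base_url = "https://www.peakbagger.com/list.aspx?lid=5651"

# Hypothetical relative hrefs, as a list comprehension like the one above might return
hrefs = ["peak.aspx?pid=101", "peak.aspx?pid=102"]

# urljoin replaces the last path segment of base_url (list.aspx) and its query string
absolute = [urljoin(base_url, h) for h in hrefs]
print(absolute)
# ['https://www.peakbagger.com/peak.aspx?pid=101', 'https://www.peakbagger.com/peak.aspx?pid=102']
```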