Extract specific links after extracting from BeautifulSoup

Question

I previously used BeautifulSoup4 to extract some information from this web page: https://www.peakbagger.com/list.aspx?lid=5651

I got a list of hrefs:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.peakbagger.com/list.aspx?lid=5651'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

a = soup.select("a:nth-of-type(1)")
a

But I only want the links that start with 'peak.aspx?pid=10...'

How can I print only the ones containing 'peak.aspx?pid=10...'? Do I need a loop or a split?

Thanks.

Answer 1

One approach is to iterate over your selection and keep only the links that contain the string peak.aspx?pid=:

[x['href'] for x in soup.select('a') if 'peak.aspx?pid=' in str(x)]
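
Since the question asks for links that start with that prefix, the filter can also be pushed into the selector itself with a CSS attribute selector. The following is only a sketch: sample_html is made-up markup standing in for the real page, and the pid values in it are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (not the actual site HTML).
sample_html = """
<table class="gray">
  <tr><td>1.</td><td><a href="peak.aspx?pid=10640">Peak A</a></td></tr>
  <tr><td>2.</td><td><a href="peak.aspx?pid=10641">Peak B</a></td></tr>
</table>
<a href="list.aspx?lid=5651">Back to list</a>
<a name="top">Anchor without href</a>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# a[href^="..."] matches only <a> tags whose href starts with the given prefix,
# so anchors without an href, or with a different target, are skipped.
peak_links = [a["href"] for a in soup.select('a[href^="peak.aspx?pid="]')]
print(peak_links)
```

Because the selector only matches tags that have an href attribute, the a["href"] lookup cannot raise a KeyError here.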

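But you can also make the selector more specific to get the result directly. This gives you only the second column of the table and the <a> tags inside it: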

soup.select('table.gray  tr td:nth-of-type(2) a')

To get the links, you have to loop over the results:

[x['href'] for x in soup.select('table.gray  tr td:nth-of-type(2) a')]
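
The hrefs in the question look relative ('peak.aspx?pid=10...'), so if you want to follow them you would probably need to resolve them against the page URL first. A minimal sketch using the standard library's urljoin; the hrefs list here is hypothetical sample data, not output from the real page:

```python
from urllib.parse import urljoin

# Page URL from the question, used as the base for resolving relative links.
base_url = "https://www.peakbagger.com/list.aspx?lid=5651"

# Hypothetical relative hrefs, shaped like the ones in the question.
hrefs = ["peak.aspx?pid=10640", "peak.aspx?pid=10641"]

# urljoin replaces the last path segment (and query) of the base with the relative link.
absolute_links = [urljoin(base_url, href) for href in hrefs]
print(absolute_links)
```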

Tags: python, beautifulsoup, hyperlink, web