Python web scraper with BeautifulSoup: my link should go to the headline story but redirects to the archive page

Question

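The link redirects me to an archive page that lists other headlines: https://www.coindesk.com/news/babel-finance-bets-on-longtime-fintech-hand-to-help-navigate-regulatory-landscape. The news segment between .com and babel in the link should not be there; I believe it is the reason the headline link ends up on another page.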

from bs4 import BeautifulSoup
import requests


base_url ='https://www.coindesk.com/news'

source = requests.get(base_url).text

soup = BeautifulSoup(source, "html.parser")       
    
    
articles = soup.find_all(class_ = 'list-item-card post')
    
#print(len(articles))
#print(articles) 

    
for article in articles:
      
    headline = article.h4.text.strip()
    link = base_url + article.find_all("a")[1]["href"]
    text = article.find(class_="card-text").text.strip()
    img_url = base_url+article.picture.img['src']
            
    print(headline)
    print(link)
    print(text)
    print("Image " + img_url)

Answer 1

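The error occurs because you are concatenating the base link (which already ends in /news) with an href that is itself an absolute path.

To prevent this, you can use urllib.parse.urljoin().

In your example, this should solve the problem: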

from urllib.parse import urljoin

link = urljoin(base_url, article.find_all("a")[1]["href"])
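
To see the difference, here is a minimal sketch comparing plain concatenation with urljoin(). It assumes the href on the page is a root-relative path (which is what the broken link in the question suggests); the image URL is purely hypothetical and only shows what happens if a src is already a full URL.

```python
from urllib.parse import urljoin

base_url = "https://www.coindesk.com/news"

# Assumption: the href scraped from the page starts with "/" (root-relative).
href = "/babel-finance-bets-on-longtime-fintech-hand-to-help-navigate-regulatory-landscape"

# Plain concatenation keeps the "/news" segment of base_url in the result.
concatenated = base_url + href

# urljoin resolves the href against the site root, dropping "/news".
joined = urljoin(base_url, href)

# Hypothetical image URL: a fully qualified URL is returned unchanged by urljoin.
img_src = "https://static.example.com/image.jpg"
img_url = urljoin(base_url, img_src)

print(concatenated)
print(joined)
print(img_url)
```

If the same assumption holds for the image src values, the img_url line in the question could be built with urljoin() as well, since it handles both root-relative paths and full URLs.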


python

beautifulsoup

hyperlink

web-scraping

imageurl