How to scrape card details using Beautiful Soup and Python

Question

I am trying to scrape this link: https://www.axisbank.com/retail/cards/credit-card

using the following code:

import requests
from bs4 import BeautifulSoup

axis_url = ["https://www.axisbank.com/retail/cards/credit-card"]

html = requests.get(axis_url[0])
soup = BeautifulSoup(html.content, 'lxml')

for d in soup.find_all('span'):
    print(d.get_text())

Output:

close
5.15%
%
4.00%
%
5.40%

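Basically, I want to get the details of every card that appears on that page.

I tried different tags, but none of them seems to work.

I would be glad to see code that does what I need.

Any help is much appreciated.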

Answer 1

What happens?

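Your main problem is that the website serves its content dynamically, so the HTML that requests receives does not include the card details you are after. Print your soup and take a look: it will not contain the elements you inspected in the browser.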
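One quick way to confirm this is to count how many elements in a given HTML string match the selector you care about. The sketch below is illustrative only: `count_matches` is a hypothetical helper, and the two HTML strings are made-up stand-ins for what requests receives versus what the browser renders, not the real markup of the site. The selector is the one used in the Selenium example in this answer.

```python
from bs4 import BeautifulSoup


def count_matches(html, css_selector):
    """Return how many elements in the HTML match the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(css_selector))


# Hypothetical markup: the static response has an empty list,
# the rendered page has the list filled in by JavaScript.
static_html = '<ul id="ulCreditCard"></ul>'
rendered_html = (
    '<ul id="ulCreditCard"><li><ul>'
    "<li><span>Card A</span></li>"
    "</ul></li></ul>"
)

selector = "#ulCreditCard li li > span"
print(count_matches(static_html, selector))    # 0
print(count_matches(rendered_html, selector))  # 1
```

Running the same check on `html.text` from your own request should show whether the elements are present before any JavaScript runs.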

How to fix?

Selenium can handle dynamically generated content and gives you the information you inspected in the browser:

Example

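from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object;
# the executable_path keyword was deprecated in 4.0 and later removed.
driver = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))
url = 'https://www.axisbank.com/retail/cards/credit-card'
driver.get(url)

# Parse the HTML after the browser has executed the page's JavaScript
soup = BeautifulSoup(driver.page_source, 'lxml')

# quit() closes the window and ends the driver process
driver.quit()

# Collect the text of each card detail, joining nested strings with '^^'
textList = []
for d in soup.select('#ulCreditCard li li > span'):
    textList.append(d.get_text('^^', strip=True))

print(textList)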
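Because the content arrives after the initial load, `driver.page_source` may be read before the card list exists. A minimal sketch of one way to guard against that, using Selenium's explicit waits, is shown below. The function names `fetch_rendered_html` and `extract_card_text` and the 15-second timeout are assumptions chosen for illustration; the selector is the one from the example above. It also assumes Selenium 4.6 or newer, which can locate a matching driver on its own, so no driver path is passed.

```python
from bs4 import BeautifulSoup

CARD_SELECTOR = "#ulCreditCard li li > span"  # selector from the example above


def fetch_rendered_html(url, timeout=15.0):
    """Load the page in Chrome and return its HTML once a card element exists."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Selenium 4.6+ finds the driver itself
    try:
        driver.get(url)
        # Block until at least one matching element is in the DOM,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, CARD_SELECTOR))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser, even on timeout


def extract_card_text(html):
    """Return the text of every element matching CARD_SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text("^^", strip=True) for span in soup.select(CARD_SELECTOR)]
```

With these two helpers, `extract_card_text(fetch_rendered_html(url))` would give the same kind of list as `textList`. Keeping the parsing separate from the browser step also lets you test the selector against saved HTML without launching Chrome.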
