How to scrape the heading and description using Python and Beautiful Soup?
Problem overview:
Link: https://www.bobfinancial.com/eterna.jsp
In the details section, I basically want all the points listed there.
Details:
[ #This is an array of Strings...
"Milestone Rewards: Earn 10,000 bonus reward points on spending ₹ 50,000 within 60 days & 20,000 bonus reward points on spending ₹ 5,00,000 in a year.",
"Fuel Surcharge Waiver*: 1% fuel surcharge waiver at all fuel stations across India on transactions between Rs.400 and Rs.5,000 (Max. Rs. 250 per statement cycle). Note -No Reward Points are earned on fuel transactions.",
"Core Reward Points: 3 reward points for every ₹ 100 spent on any other category.",
"Redeem reward points for cashback: Redeem your reward points as cashback and other exciting options.All your accumulated reward points can be redeemed for cashback @ 1 reward point = ₹ 0.25",
"In-built insurance cover: Get free Personal Accidental Death Cover to ensure financial protection of your family (Air: 1 Crs, Non-Air: 10 Lakhs) ",
"Zero liability on lost card. Report loss of card immediately to ensure zero liability on any fraudulent transactions",
"Easy EMI option. Convert purchase of > 2,500/- on your card into easy EMIs of 6/12 months"
]
Actual problem:
The items I want from the link are shown in the image below:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json,requests
details = []
url = ['https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM']
html = urlopen(url[0])
soup = BeautifulSoup(html, 'lxml')
addr= soup.find_all('span',class_ = 'm-bottom-0 header-4 font-weight-bold display-text')
print(addr)
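I wrote the code above and got stuck after getting this output:
[]
I don't know how to proceed and scrape the information I want. Any help is much appreciated.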
Try this, I hope it helps:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json,requests
details = []
url = ['https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM']
html = requests.get(url[0])
print(html.status_code)
soup = BeautifulSoup(html.content, 'lxml')
x = soup.select('span.m-bottom-0')
addr= soup.select('span.m-bottom-0')[12:20] # number of span
for d in addr:
    print(d.get_text())
addr= soup.select('span.m-bottom-0')[58:70]
for d in addr:
    print(d.get_text()) # get_text() method for inner tag text
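You can search only for the headings that have the display-text class. The body is then the <span> entry that follows each heading. This avoids hard-coded offsets, which could break on other similar pages. For example: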
from bs4 import BeautifulSoup
import requests
url = ['https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM']
html = requests.get(url[0])
soup = BeautifulSoup(html.content, 'lxml')
data = []
for span in soup.select('span.m-bottom-0.display-text.font-weight-bold'):
    data.append([span.get_text(strip=True), span.find_next('span').get_text(strip=True)])
print(data)
This gives you a data structure holding each header and its description, as follows:
[
['Citi Rewards Credit Card', 'Make your shopping more rewarding'],
['Make your shopping more rewarding', 'Get up to 2500 welcome reward points*'],
['Accelerated rewards', 'Earn minimum 1 reward point for every 125 on all purchases.\nEarn 10X reward points at online and physical department and apparel stores.'],
['Bonus rewards', 'Get 300 bonus points on card purchase of INR 30,000 or more in a month'],
['Evergreen reward points', "Redeem now or keep collecting – it's a choice you have because your reward points will never expire."],
['Tap n Pay', 'Now pay the easy way by enabling contactless payments on your Citi credit card.Click hereto see how.'],
['Rewards', "Redeem rewards: Redeem now or keep collecting – it's a choice you have.Click hereto see how"],
['Travel and Lifestyle Services', 'Contact the Travel and Lifestyle specialist to create and plan an experience that will help you enjoy the best in life. Simply call 0008 004 407 027 for Mastercard®cardholders or 1800-114-999 for Visa cardholders.']
]
You can use print(data[2:]) to skip the first two results if they are not needed.
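As a side note on why the find_all() call in the question may have returned []: when class_ is given a string containing several classes, Beautiful Soup compares it with the whole class attribute, so the order has to match exactly. A CSS selector does not care about order. The small offline sketch below illustrates this; the markup and its class order are made up for the demonstration and are not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class order here is an assumption, not the real page.
html = """
<div>
  <span class="header-4 m-bottom-0 font-weight-bold display-text">Bonus rewards</span>
  <span class="m-bottom-0">Get 300 bonus points</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is matched against the full class attribute,
# so a different class order gives no match.
exact = soup.find_all("span", class_="m-bottom-0 header-4 font-weight-bold display-text")

# A CSS selector matches any element carrying all listed classes, in any order.
by_selector = soup.select("span.m-bottom-0.display-text.font-weight-bold")

print(exact)
print([s.get_text(strip=True) for s in by_selector])
```

If the live page behaves differently, the cause may be something else (for example the server returning different HTML to urlopen), so treat this only as one likely explanation.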
This can be extended to multiple URLs, and the header and description can be combined:
from bs4 import BeautifulSoup
import requests
urls = [
'https://www.online.citibank.co.in/credit-card/fuel-card/citi-indianoil-card?eOfferCode=INCCCCTWAFCTIOPLM',
'https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM',
]
for url in urls:
    html = requests.get(url)
    soup = BeautifulSoup(html.content, 'lxml')
    data = []
    for span in soup.select('span.m-bottom-0.display-text.font-weight-bold'):
        data.append(f'{span.get_text(strip=True)}: {span.find_next("span").get_text(strip=True)}')
    # Display the list of strings
    print('\n'.join(data[1:]))
    print()
This gives you the following output for the two URLs:
Accelerated Rewards on Fuel spends: 4 Turbo points / Rs. 150 spent &1% fuel surcharge reversalon fuel purchase atauthorized IndianOil outlets^
Earn on all Daily Spends: 2 Turbo points / Rs. 150 spent on groceries and supermarkets#1 Turbo point / Rs. 150 on all other spends.#Clickherefor the details
Redeem Instantly: 1 Turbo Point = Re. 1 of free fuelRedeem your Turbo points instantly via SMS for freeFuel atauthorized IndianOil outlets^
Tap n Pay: Now pay the easy way by enabling contactless payments on your Citi credit card.Click hereto see how.
Redeem Rewards!: Choose from a range of redeeming options including fuel, holidays, air miles, cash back and more.Click hereto see how
Dining Privileges: Up to 20% savings across participating restaurants. To find a Citibank partner restaurant near you,click here
Travel and Lifestyle Services: Contact the Travel and Lifestyle specialist to create and plan an experience that will help you enjoy the best in life. Simply call 0008 004 407 027 for Mastercard cardholders or 1800-114-999 for Visa cardholders.
Accelerated rewards: Earn minimum 1 reward point for every 125 on all purchases.
Earn 10X reward points at online and physical department and apparel stores.
Bonus rewards: Get 300 bonus points on card purchase of INR 30,000 or more in a month
Evergreen reward points: Redeem now or keep collecting – it's a choice you have because your reward points will never expire.
Tap n Pay: Now pay the easy way by enabling contactless payments on your Citi credit card.Click hereto see how.
Rewards: Redeem rewards: Redeem now or keep collecting – it's a choice you have.Click hereto see how
Travel and Lifestyle Services: Contact the Travel and Lifestyle specialist to create and plan an experience that will help you enjoy the best in life. Simply call 0008 004 407 027 for Mastercard®cardholders or 1800-114-999 for Visa cardholders.
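One thing to watch with the heading/description pairing: find_next('span') returns None when a heading has no later <span>, and calling get_text() on None raises AttributeError. A minimal sketch of a guarded version follows. The function name extract_features and the sample markup are hypothetical, chosen only so the example can run without network access.

```python
from bs4 import BeautifulSoup


def extract_features(html):
    """Return 'heading: description' strings found in one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    features = []
    for heading in soup.select("span.m-bottom-0.display-text.font-weight-bold"):
        # find_next() walks forward in document order and may return None.
        body = heading.find_next("span")
        if body is None:
            continue  # heading without any following <span>: skip it
        features.append(f"{heading.get_text(strip=True)}: {body.get_text(strip=True)}")
    return features


# Hypothetical sample markup; the second heading has no description span.
sample = """
<div><span class="m-bottom-0 display-text font-weight-bold">Bonus rewards</span>
<span class="m-bottom-0">Get 300 bonus points</span></div>
<div><span class="m-bottom-0 display-text font-weight-bold">Tap n Pay</span></div>
"""
print(extract_features(sample))
```

For the live pages you would pass requests.get(url).content to the same function for each URL in the list.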