Not able to scrape the details of the card using selenium, beautifulsoup and python
The actual problem:
The items I want are shown in the image below:
I wrote the code below:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json, requests, re
from selenium import webdriver
s = 'https://www.axisbank.com/retail/cards/credit-card/axis-bank-ace-credit-card/features-benefits#menuTab'
# Raw string so the backslashes in the Windows path are not read as escape sequences
driver = webdriver.Chrome(executable_path=r"C:\Users\Hari\Downloads\chromedriver.exe")
driver.get(s)
soup = BeautifulSoup(driver.page_source, 'lxml')
# print(x.find('h3').get_text())
det = []
a = soup.find('div', class_ = 'owl-stage')
for x in a.find_all('div', class_ = 'owl-item'):
    print(x.find('li').get_text())
driver.close()
I tried the code above, but got stuck after getting this output.
Output:
Traceback (most recent call last):
  File "C:\Users\Hari\PycharmProjects\Card_Prj\buffer.py", line 22, in <module>
    print(x.find('li').get_text())
AttributeError: 'NoneType' object has no attribute 'get_text'
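I don't know how to proceed from here and scrape the information I want. Any help is much appreciated.
Edit
As mentioned in the comments, the expected output is something different and should be added to the question. In any case, to achieve your goal, extract the heading and description like this: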
for x in soup.select('div.owl-stage div.owl-item'):
    heading = x.h3.get_text(strip=True)
    description = x.select_one('h3 + div').get_text(strip=True)
    det.append(heading+':'+description)
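The AttributeError in the question comes from `find()` returning `None` when an `owl-item` contains no matching tag, so calling `.get_text()` on the result fails. Below is a minimal sketch of the same extraction with a `None` check added. The `SAMPLE_HTML` markup and the `extract_cards` name are assumptions made for illustration; the real page structure may differ, and `html.parser` is used here only to avoid depending on `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the carousel; not copied from the real page.
SAMPLE_HTML = """
<div class="owl-stage">
  <div class="owl-item">
    <h3>Lounge Access</h3>
    <div>Enjoy 4 complimentary lounge visits</div>
  </div>
  <div class="owl-item">
    <h3>Launch offer</h3>
    <div><ul><li>5% cashback</li></ul></div>
  </div>
  <div class="owl-item"><img src="banner.png"></div>
</div>
"""


def extract_cards(html):
    """Return 'heading:description' strings, skipping items that lack either part."""
    soup = BeautifulSoup(html, "html.parser")
    det = []
    for item in soup.select("div.owl-stage div.owl-item"):
        heading = item.find("h3")
        description = item.select_one("h3 + div")
        # find() and select_one() return None when nothing matches,
        # so check before calling get_text().
        if heading is None or description is None:
            continue
        det.append(
            heading.get_text(strip=True) + ":" + description.get_text(strip=True)
        )
    return det


print(extract_cards(SAMPLE_HTML))
```

With the real page, the same function could be called as `extract_cards(driver.page_source)` once the carousel has rendered.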
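Using Selenium and Python, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies to extract the visible text from the Features and Benefits section:
Using CSS_SELECTOR: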
driver.get("https://www.axisbank.com/retail/cards/credit-card/axis-bank-ace-credit-card/features-benefits#menuTab")
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.owl-item.active div.contentBox")))])
Using XPATH:
driver.get("https://www.axisbank.com/retail/cards/credit-card/axis-bank-ace-credit-card/features-benefits#menuTab")
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='owl-item active']//div[@class='contentBox']")))])
Console output:
['Launch offer\n5% cashback on Big Basket and Grofers\nValid till 28th February 2021\nFor detailed terms and conditions, click here', 'Unlimited Cashback on every spend\n5% cashback on bill payments (electricity, internet, gas and more) DTH and mobile recharges on Google Pay\n4% on Swiggy, Zomato & Ola\n2% on all other spends\nNo upper limit on cashback\n\nRead More', 'Lounge Access\nEnjoy 4 complimentary lounge visits per calendar year at select domestic airports with your ACE Credit Card. For list of airports and detailed terms and conditions, click here']
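Each string in that list starts with the card heading, followed by the details separated by newlines. If separate fields are wanted, a small helper along these lines could split them. `split_card` is a hypothetical name, not part of Selenium, and the sketch assumes the first non-empty line is always the heading.

```python
def split_card(text):
    """Split one card's visible text into (heading, details)."""
    # Drop blank lines such as the one before "Read More" in the output above.
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    if not lines:
        return "", []
    return lines[0], lines[1:]


# One entry taken from the console output above.
sample = (
    "Launch offer\n5% cashback on Big Basket and Grofers\n"
    "Valid till 28th February 2021\n"
    "For detailed terms and conditions, click here"
)
heading, details = split_card(sample)
print(heading)
for detail in details:
    print(" -", detail)
```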
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC