Python Webscraping - AttributeError: 'NoneType' object has no attribute 'text'

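I need some help scraping laptop prices, ratings, and product names from Flipkart into a CSV file using BeautifulSoup, Selenium, and Pandas. The problem is that I get the error AttributeError: 'NoneType' object has no attribute 'text' when I try to append the scraped items to the empty lists.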

from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup


chrome_option = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path = "C:/Users/folder/PycharmProjects/chromedriver.exe")
#flipkart website
driver.get("https://www.flipkart.com/laptops/~cs-g5q3mw47a4/pr?sid=6bo%2Cb5g&collection-tab-name=Browsing&wid=13.productCard.PMU_V2_7")


products = []
prices = []
ratings = []


content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
for item in soup.findAll('a', href = True, attrs={'class' : '_1fQZEK'}):
    name = item.find('div', attrs={'class' : '_4rR01T'})
    price = item.find('div', attrs={'class' : '_30jeq3 _1_WHN1'})
    rating = item.find('div', attrs={'class' : '_3LWZlK'})
    
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
    

    df = pd.DataFrame({'Product Name': products,
                        'Price': prices,
                        'Rating': ratings})

    df.to_csv(r"C:\Users\folder\Desktop\webscrape.csv", index=True, encoding= 'utf-8')

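You can use .get_text() instead of .text, although the two return the same string. The part that matters is guarding against NoneType, since find() returns None when nothing matches: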

products.append(name.get_text()) if name else ''
prices.append(price.get_text()) if price else ''
ratings.append(rating.get_text()) if ratings else ''
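
One thing to watch with the guards above: when an element is missing, nothing is appended at all, so the lists can end up with different lengths. A minimal sketch of an alternative follows, assuming a hypothetical helper named safe_text (not part of BeautifulSoup) that always yields exactly one value per item and tests the element itself (rating, not the ratings list). The SimpleNamespace object is only a stand-in for what item.find(...) returns, so the snippet runs without a browser.

```python
from types import SimpleNamespace


def safe_text(tag, default=""):
    """Return the tag's text, or `default` when find() returned None.

    Hypothetical helper: works with any object that has get_text(),
    such as a BeautifulSoup Tag.
    """
    if tag is None:
        return default
    return tag.get_text(strip=True)


# Stand-ins for the results of item.find(...): one match, one miss.
found = SimpleNamespace(get_text=lambda strip=False: "4.3")
missing = None

ratings = []
for rating in (found, missing):
    # Exactly one value is appended per item, so the lists stay aligned.
    ratings.append(safe_text(rating))

print(ratings)
```

In the scraping loop this would be used as products.append(safe_text(name)), and likewise for price and rating.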

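Solution found! After replacing .text with .get_text() as suggested, the error was resolved. To avoid another error, ValueError: arrays must all be same length, print the len() of each list to confirm that the lists being passed to the Pandas DataFrame all have the same length.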
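
The length check described above can be sketched as a small function. This is only an illustration: column_lengths and check_same_length are hypothetical names, and the values are made up to stand in for scraped data.

```python
def column_lengths(columns):
    """Map each column name to the number of values collected for it."""
    return {name: len(values) for name, values in columns.items()}


def check_same_length(columns):
    """Raise ValueError listing the per-column counts if they differ."""
    lengths = column_lengths(columns)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths


# Made-up values standing in for the scraped lists.
columns = {
    "Product Name": ["Laptop A", "Laptop B"],
    "Price": ["50,000", "60,000"],
    "Rating": [],
}

try:
    check_same_length(columns)
except ValueError as err:
    # Shows which column is short before pandas ever sees the data.
    print(err)
```

Calling check_same_length on the dict just before passing it to pd.DataFrame points at the short column directly, rather than leaving it to the pandas error message.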

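In this case, the ratings list had a len() of 0 on every iteration of the for loop, so ratings were not included in the DataFrame df. (The guard if ratings in the snippet above tests the still-empty list rather than the rating element, so nothing was ever appended.) Here is the revised code: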

#--snip--

#empty lists to be filled with the scraped items
products = []
prices = []
ratings = []

for item in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    name = item.find('div', attrs={'class': '_4rR01T'})
    price = item.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    rating = item.find('div', attrs={'class': '_3LWZlK'})

    #append one value per item ('' when the element is missing)
    #so the lists stay the same length
    products.append(name.get_text() if name else '')
    prices.append(price.get_text() if price else '')

#confirm the list lengths before building the DataFrame
print(f"Products: {len(products)}")
print(f"Prices: {len(prices)}")
print(f"Ratings: {len(ratings)}")

#create the pandas DataFrame once, after the loop (ratings are left out)
df = pd.DataFrame({'Product Name': products,
                   'Price': prices})
#send the DataFrame to csv
df.to_csv(r"C:\Users\folder\Desktop\samplescrape.csv", index=True, encoding='utf-8')