Price won't show up in the HTML parsing
I'm trying to get the price of this object into a variable and write it out to a CSV.
This is the part of the HTML I'm trying to parse:
<span class="price" data-js-product-price="">
    <span>$429.00 USD</span>
</span>
Here's my Python code (sorry, I'm a bit new to Python and have been trying to solve this for a while, so apologies if the code is a bit messy):
from urllib.request import urlopen
from bs4 import BeautifulSoup

url_to_scrape = "https://www.backfireboards.com/?gclid=CjwKCAjwjtOTBhAvEiwASG4bCGHPgmV4XjyqAIFrW0Lr0IiW0AvfTiC7sZ4E-HtM_qJ9k4ahAu2CHxoCH5YQAvD_BwE"
request_page = urlopen(url_to_scrape)
page_html = request_page.read()
request_page.close()

html_soup = BeautifulSoup(page_html, 'html.parser')
board_prices = html_soup.find_all('span', class_='price')
print("num of prices: " + str(len(board_prices)))

file_name = 'product.csv'
f = open(file_name, 'w')
headers = 'Title, Price \n'
f.write(headers)

i = 1
for price in board_prices:
    currPrice = price.span.text
    print(i)
    i = i + 1
    print(price)
    print(currPrice)

f.close()
This is the error I'm getting:
Traceback (most recent call last):
  File "/Users/isaiah/PycharmProjects/Web_scrape/main.py", line 26, in <module>
    currPrice = price.span.text
AttributeError: 'NoneType' object has no attribute 'text'
I know it isn't a text object, but when I print it without .text, it outputs this:
1
<span class="price" data-js-popup-cart-subtotal=""></span>
None
2
<span class="price" data-js-product-price=""><span></span></span>
<span></span>
3
<span class="price" data-js-product-price=""><span></span></span>
<span></span>
4
<span class="price"><span></span></span>
<span></span>
5
<span class="price" data-js-product-price=""><span></span></span>
<span></span>
6
<span class="price" data-js-product-price=""><span></span></span>
<span></span>
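I don't know why the $429 disappears from the object; I'm a bit new to web scraping. Is there something simple I'm missing?
Also, as far as I can tell, the site actually lists 8 prices on this page, but the board_prices object only has 6 items. Could someone explain that to me as well?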
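The data you see on the page is loaded from an external source, so the price spans are empty in the HTML you download. To get the titles/prices of the electric skateboards, you can use this example:
import requests
from bs4 import BeautifulSoup

url = "https://www.backfireboards.com/collections/electric-skateboards"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for h4 in soup.select(".product-collection__content h4"):
    title = h4.get_text(strip=True)
    # The price element that follows the title; its last child holds the text.
    price = h4.find_next(class_="price").contents[-1].text
    print("{:<15} {}".format(price, title))
Prints: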
9.00 USD Backfire G2 Black with Super Power Hobbywing Motors and 96mm Wheels with 180 Days Warranty Especially Suitable for Beginners
9.00 USD Backfire G3 with Super Flexible Deck
9.00 USD Backfire Zealot S Belt Drive Electric Skateboard
,899.00 USD Backfire Hammer Sledge
,199.00 USD Backfire Hammer Belt Drive All Terrain Electric Skateboard
9.00 USD Backfire Zealot Belt Drive Electric Skateboard
,399.00 USD Backfire Ranger X3 All Terrain Electric Skateboard with 1500W X2 Ultra High Power Ultra High Torque Motors and 12S High Voltage High Efficiency Electronic System
9.00 USD Backfire Ranger X2 All Terrain Electric Skateboard with 1200W X2 Ultra High Power Ultra High Torque Motors and 12S High Voltage High Efficiency Electronic System
9.00 USD Backfire Mini Super Portable Electric Skateboard Best for City Commute
9.00 USD Backfire G3 Plus with Carbon Fiber Deck and Ultra Long Range
9.00 USD Backfire ERA Electric Skateboard
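Since the original goal was to get the prices into a CSV, here is a minimal sketch of that last step, using only the standard csv module. The helper names safe_text and save_products are hypothetical (they are not part of BeautifulSoup or requests), and the sketch assumes you have already collected (title, price) pairs in a list. safe_text guards against the case from the traceback, where price.span is None because a price span has no nested span.

```python
import csv


def safe_text(tag):
    """Return the stripped text of a BeautifulSoup tag, or "" if the tag is None."""
    # price.span is None when a <span class="price"> has no nested <span>,
    # which is what raised the AttributeError in the question.
    return tag.get_text(strip=True) if tag is not None else ""


def save_products(rows, file_name="product.csv"):
    """Write (title, price) pairs to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(file_name, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Title", "Price"])
        writer.writerows(rows)
```

In the loop above you could append (title, price) to a list instead of printing, then call save_products(rows) once at the end. Using csv.writer rather than f.write matters here because prices such as 1,899.00 contain commas, and the writer quotes those fields automatically.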