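How to scrape text when it's not in an HTML element
I want to scrape this website: https://www.hectorjones.co.nz/milwaukee-hand-tools-and-accessories.html
I want to scrape the Product SKU, Price, and List Price elements.
I managed to scrape the Price, but I'm having trouble with the other two, especially the Product SKU, because its value is not inside a span. It sits directly in a div. Can it be scraped? If so, could you help me?
As you can see, the Product SKU value has no span around it.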
<div class="vm3pr-2"> <div class="product-price" id="productPrice1499">
<div class="product-sku"><span class="bold">Product SKU</span> : 2203-20<br></div>
Here is more of the markup.
<div class="vm3pr-2"> <div class="product-price" id="productPrice1499">
<div class="product-sku"><span class="bold">Product SKU</span> : 2203-20<br></div>
<div class="PricesalesPrice vm-display vm-price-value"><span class="vm-price-desc">Price (inc GST):
</span><span class="PricesalesPrice">.00</span></div><span class="ex-tax"></span><div
class="PricediscountAmount vm-nodisplay"><span class="vm-price-desc">Discount: </span><span
class="PricediscountAmount"></span></div></div>
<div class="clear"></div>
</div>
Here is my code:
prices = driver.find_elements_by_class_name("PricesalesPrice")
sku = driver.find_elements_by_class_name("bold")
list_price = driver.find_elements_by_class_name("PricebasePriceWithTax")
for price in prices:
    print(price.text)
Here is how you can get each product SKU together with its corresponding price. I'm using Beautiful Soup for the scraping.
views.py
import requests
from bs4 import BeautifulSoup
from django.shortcuts import render

base_url = 'https://www.hectorjones.co.nz/milwaukee-hand-tools-and-accessories.html'


def home(request):
    # Download the listing page and parse it.
    response = requests.get(base_url)
    data = response.text
    soup = BeautifulSoup(data, features='html.parser')

    # Each product's SKU and price live inside a div.product-price block.
    post_listings = soup.find_all('div', {'class': 'product-price'})
    final_postings = []
    for post in post_listings:
        # .text returns all text inside the div, including text outside any span.
        product_sku = post.find('div', {'class': 'product-sku'}).text
        price = post.find('span', {'class': 'PricesalesPrice'}).text
        final_postings.append((product_sku, price))

    context = {
        'final_postings': final_postings,
    }
    return render(request, 'display.html', context)
display.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>hectorjones.co.nz/</title>
</head>
<body>
{% for post in final_postings %}
<ul>
<li><p>{{ post.0 }}<br> Price : {{ post.1 }}
</p></li>
</ul>
{% endfor %}
</body>
</html>
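Note that `post.0` in the template carries the whole text of the `product-sku` div, label included (for the snippet in the question that is `Product SKU : 2203-20`). If only the code itself is wanted, a small helper along these lines could strip the label before the tuple is appended. This is a minimal sketch: `extract_sku` is a hypothetical name, and it assumes the label and the value are always separated by a colon, as in the snippet.

```python
def extract_sku(div_text: str) -> str:
    """Return the part after the first colon, e.g. '2203-20'."""
    label, sep, value = div_text.partition(":")
    # If there is no colon at all, fall back to the whole trimmed text.
    return value.strip() if sep else div_text.strip()


print(extract_sku("Product SKU : 2203-20"))
```

In `views.py` this would be applied as `extract_sku(post.find('div', {'class': 'product-sku'}).text)`.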
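Hi Burak, the code I wrote earlier is for Django. As you asked, here is the code you need to run from cmd; it prints the list of products available on that site.
First make sure these two Python packages are installed:
pip install requests
pip install bs4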
scraping.py
import requests
from bs4 import BeautifulSoup

base_url = 'https://www.hectorjones.co.nz/milwaukee-hand-tools-and-accessories.html'

# Download the listing page and parse it.
response = requests.get(base_url)
data = response.text
soup = BeautifulSoup(data, features='html.parser')

# Each product's SKU and price live inside a div.product-price block.
post_listings = soup.find_all('div', {'class': 'product-price'})
final_postings = []
for post in post_listings:
    # .text returns all text inside the div, including text outside any span.
    product_sku = post.find('div', {'class': 'product-sku'}).text
    price = post.find('span', {'class': 'PricesalesPrice'}).text
    final_postings.append((product_sku, price))

print(final_postings)
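The script above assumes every block contains both elements; `find` returns `None` when one is missing, and calling `.text` on that raises an error. The sketch below runs against the snippet from the question instead of the live site. It reads the SKU as the bare text node that follows the label span, and it guards the lookups. `text_or_none` is a hypothetical helper, and the `PricebasePriceWithTax` class is taken from the question's own code; it does not appear in the pasted markup, so whether the page actually uses it is an assumption.

```python
from bs4 import BeautifulSoup, NavigableString

# Markup copied from the question; no list-price element is shown there.
html = '''
<div class="product-price" id="productPrice1499">
<div class="product-sku"><span class="bold">Product SKU</span> : 2203-20<br></div>
</div>
'''

soup = BeautifulSoup(html, features='html.parser')


def text_or_none(parent, name, class_name):
    """Return the stripped text of the first match, or None if it is absent."""
    node = parent.find(name, {'class': class_name})
    return node.get_text(strip=True) if node else None


rows = []
for post in soup.find_all('div', {'class': 'product-price'}):
    label = post.find('span', {'class': 'bold'})
    # The SKU is a text node sitting right after the label span, outside any tag.
    raw = label.next_sibling if label else None
    sku = raw.lstrip(' :').strip() if isinstance(raw, NavigableString) else None
    # Assumption: the list price uses the class named in the question's code.
    list_price = text_or_none(post, 'span', 'PricebasePriceWithTax')
    rows.append((sku, list_price))

print(rows)
```

For this snippet the list price comes back as `None` because the element is not in the pasted markup; on the real page it would be filled in only if that class is present.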
Let me know if any of the steps is confusing.
Happy coding!