Python / Beautiful Soup: data extraction issue for items without classes
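I am trying to extract some data from a website, but not every item in the page source has a class. I need each product's price, quantity, and size.
Can you guide me toward a solution?
I can reach the data for each product through the scrollmenu div, because that is the only class I see in the page source. In short, I need the values of the attributes named data-comprice, data-quantity, and data-size, but I have not found a way to get them. I am sharing my basic code and part of the page source.
Thanks in advance!
Source: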
<div class="scrollmenu">
<div data-value="2' x 3'" class="swatch-element 2-x-3 soldout ">
<input data-comprice="75.01" data-curprice="30.00" data-size="2' x 3'" data-quantity="0" data-sku="AAAA0536-EPERNAY-23" data-price="30.00" data-title="2' x 3'" type="radio" name="id" value="31781284839506" id="radio_31781284839506"/>
<label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;" for="radio_31781284839506">
<p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2' x 3'</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> .01 </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> .00 </p>
</label>
</div>
<div data-value="2'7" x 7'3"" class="swatch-element 27-x-73 soldout ">
<input data-comprice="134.81" data-curprice="53.92" data-size="2'7" x 7'3"" data-quantity="0" data-sku="AAAA0536-EPERNAY-2773" data-price="53.92" data-title="2'7" x 7'3"" type="radio" name="id" value="31781284872274" id="radio_31781284872274"/>
<label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;" for="radio_31781284872274">
<p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2'7" x 7'3"</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> 4.81 </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> .92 </p>
</label>
</div>
My initial code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
webpage = requests.get('https://markandday.com/products/epernay-cottage-denim-rug')
sp = BeautifulSoup(webpage.content, 'html.parser')
for datapage in sp.find('div',attrs={'class':'scrollmenu'}):
    Result=print (datapage)
type(Result)
You can use find_all to collect the input tags, and then read each tag's attributes with the .get() method:
from bs4 import BeautifulSoup
html=""" <div class="scrollmenu">
<div data-value="2' x 3'" class="swatch-element 2-x-3 soldout ">
<input data-comprice="75.01" data-curprice="30.00" data-size="2' x 3'" data-quantity="0" data-sku="AAAA0536-EPERNAY-23" data-price="30.00" data-title="2' x 3'" type="radio" name="id" value="31781284839506" id="radio_31781284839506"/>
<label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;" for="radio_31781284839506">
<p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2' x 3'</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> .01 </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> .00 </p>
</label>
</div>
<div data-value="2'7" x 7'3"" class="swatch-element 27-x-73 soldout ">
<input data-comprice="134.81" data-curprice="53.92" data-size="2'7" x 7'3"" data-quantity="0" data-sku="AAAA0536-EPERNAY-2773" data-price="53.92" data-title="2'7" x 7'3"" type="radio" name="id" value="31781284872274" id="radio_31781284872274"/>
<label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;" for="radio_31781284872274">
<p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2'7" x 7'3"</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> 4.81 </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> .92 </p>
</label>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# Locate the container div, then collect every <input> inside it.
inps = soup.find("div", class_="scrollmenu").find_all("input")
for inp in inps:
    print(inp)
    # inp["data-comprice"] also works, but raises KeyError if the attribute is missing
    print(inp.get("data-comprice"))
    print(inp.get("data-curprice"))
    print(inp.get("data-quantity"))
    print(inp.get("data-size"))
Output:
<input data-comprice="75.01" data-curprice="30.00" data-price="30.00" data-quantity="0" data-size="2' x 3'" data-sku="AAAA0536-EPERNAY-23" data-title="2' x 3'" id="radio_31781284839506" name="id" type="radio" value="31781284839506"/>
75.01
30.00
0
2' x 3'
<input 7'3""="" data-comprice="134.81" data-curprice="53.92" data-price="53.92" data-quantity="0" data-size="2'7" data-sku="AAAA0536-EPERNAY-2773" data-title="2'7" x 7'3"" id="radio_31781284872274" name="id" type="radio" value="31781284872274" x=""/>
134.81
53.92
0
2'7
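Note that the second size prints as 2'7 rather than 2'7" x 7'3". The page's markup puts unescaped double quotes inside the data-size attribute, so the parser ends the value at the first inner quote. One possible workaround is to read the size from the label text instead. The sketch below assumes the first <p> inside the label that follows each input always holds the size, as it does in the sample above; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the second swatch from the sample; the unescaped quotes
# in data-size are kept on purpose to reproduce the truncation.
html = """<div class="scrollmenu">
<div class="swatch-element 27-x-73 soldout ">
<input data-comprice="134.81" data-size="2'7" x 7'3"" data-quantity="0" type="radio" id="radio_31781284872274"/>
<label for="radio_31781284872274">
<p> 2'7" x 7'3"</p>
</label>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
inp = soup.find("div", class_="scrollmenu").find("input")

# The attribute value stops at the first inner double quote.
attr_size = inp.get("data-size")

# Assumption: the label after the input starts with a <p> holding the size.
label = inp.find_next("label")
label_size = label.find("p").get_text(strip=True) if label else attr_size

print(attr_size)
print(label_size)
```

If the label is missing, the sketch falls back to the (possibly truncated) attribute value rather than failing.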
For the live website:
from bs4 import BeautifulSoup
import requests
html = requests.get('https://markandday.com/products/epernay-cottage-denim-rug')
soup = BeautifulSoup(html.text, "html.parser")
inps = soup.find("div", class_="scrollmenu").find_all("input")
for inp in inps:
    print(inp.get("data-comprice"))
    print(inp.get("data-curprice"))
    print(inp.get("data-quantity"))
    print(inp.get("data-size"))
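If the values are needed for further processing rather than printing, one option is to gather them into a list of dictionaries and convert the numeric strings. This is a minimal sketch, assuming the page still contains a div with class scrollmenu and that the price and quantity attributes, when present, hold plain numbers. The helper name extract_variants and the dictionary keys are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup


def extract_variants(html):
    """Return one dict per <input> found inside the scrollmenu div."""
    soup = BeautifulSoup(html, "html.parser")
    menu = soup.find("div", class_="scrollmenu")
    if menu is None:
        # find() returns None when the div is missing (e.g. the layout changed).
        return []
    rows = []
    for inp in menu.find_all("input"):
        rows.append({
            "size": inp.get("data-size"),
            # Missing attributes fall back to NaN / 0 instead of raising.
            "compare_price": float(inp.get("data-comprice", "nan")),
            "current_price": float(inp.get("data-curprice", "nan")),
            "quantity": int(inp.get("data-quantity", 0)),
        })
    return rows


# Small inline sample so the sketch runs without a network request.
sample = """<div class="scrollmenu">
<input data-comprice="75.01" data-curprice="30.00" data-size="2' x 3'" data-quantity="0" type="radio"/>
</div>"""

for row in extract_variants(sample):
    print(row)
```

For the live page, the same helper could be called with the text of the requests response. Since the question already imports pandas, passing the returned list to pandas.DataFrame would turn it into a table.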