Python / BeautifulSoup: Data Extraction Issue for Non-Classed Items

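I'm trying to extract some data from a website, but not every element in the page source has a class. I need each product's price, quantity and size. Can you point me toward a solution?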

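I can reach each product's data through the scrollmenu div, since that is the only class I can see in the page source. In short, I need the values named data-comprice, data-quantity and data-size, but I haven't found a way to get them yet. I'm sharing my basic code and part of the page source. Thanks in advance!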

Source:

<div class="scrollmenu">
  <div data-value="2&#39; x 3&#39;" class="swatch-element 2-x-3 soldout ">
    <input data-comprice="75.01" data-curprice="30.00" data-size="2' x 3'" data-quantity="0" data-sku="AAAA0536-EPERNAY-23" data-price="30.00" data-title="2&#39; x 3&#39;" type="radio" name="id" value="31781284839506" id="radio_31781284839506"/>
    <label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;"  for="radio_31781284839506">
      <p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2' x 3'</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> $75.01  </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> $30.00 </p>
    </label>
  </div>

  <div data-value="2&#39;7&quot; x 7&#39;3&quot;" class="swatch-element 27-x-73 soldout ">
    <input data-comprice="134.81" data-curprice="53.92" data-size="2'7" x 7'3"" data-quantity="0" data-sku="AAAA0536-EPERNAY-2773" data-price="53.92" data-title="2&#39;7&quot; x 7&#39;3&quot;" type="radio" name="id" value="31781284872274" id="radio_31781284872274"/>
    <label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;"  for="radio_31781284872274">
      <p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2'7" x 7'3"</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> $134.81  </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> $53.92 </p>
    </label>
  </div>


My initial code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

webpage = requests.get('https://markandday.com/products/epernay-cottage-denim-rug')

sp = BeautifulSoup(webpage.content, 'html.parser')

for datapage in sp.find('div', attrs={'class': 'scrollmenu'}):
    Result = print(datapage)
    type(Result)
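A short sketch of why this loop never reaches the prices, using a trimmed inline stand-in for the HTML above (the string is an assumption, not the live page). Iterating over the Tag returned by find() walks only its direct children, which are whitespace strings and the swatch <div> tags, so the <input> elements carrying data-comprice and the other attributes are never visited directly. Separately, print() returns None, so Result is always None and type(Result) does nothing useful.

```python
from bs4 import BeautifulSoup, Tag

# Trimmed stand-in for the page source (assumption: not the live page).
html = """<div class="scrollmenu">
  <div class="swatch-element 2-x-3 soldout ">
    <input data-comprice="75.01" data-size="2' x 3'" data-quantity="0" type="radio"/>
  </div>
</div>"""

sp = BeautifulSoup(html, "html.parser")
menu = sp.find("div", attrs={"class": "scrollmenu"})

# Looping over a Tag yields its direct children only: whitespace between
# tags arrives as NavigableString objects, the swatch <div> as a Tag.
kinds = [type(child).__name__ for child in menu]
print(kinds)

# Keep only the element children and look one level deeper for the <input>.
rows = []
for child in menu:
    if isinstance(child, Tag):
        inp = child.find("input")
        rows.append((inp.get("data-comprice"), inp.get("data-size")))
print(rows)
```

Skipping the strings and descending one level works, but searching the whole div for input tags directly is simpler.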

You can use find_all to collect the input tags, then read each attribute with the tag's .get() method:

from bs4 import BeautifulSoup
html=""" <div class="scrollmenu">
            
  <div data-value="2&#39; x 3&#39;" class="swatch-element 2-x-3 soldout ">
         <input data-comprice="75.01" data-curprice="30.00" data-size="2' x 3'" data-quantity="0" data-sku="AAAA0536-EPERNAY-23" data-price="30.00" data-title="2&#39; x 3&#39;" type="radio" name="id" value="31781284839506" id="radio_31781284839506"/>
        <label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;"  for="radio_31781284839506">
          <p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2' x 3'</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> $75.01  </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> $30.00 </p>
        </label>
      </div>
  <div data-value="2&#39;7&quot; x 7&#39;3&quot;" class="swatch-element 27-x-73 soldout ">
         <input data-comprice="134.81" data-curprice="53.92" data-size="2'7" x 7'3"" data-quantity="0" data-sku="AAAA0536-EPERNAY-2773" data-price="53.92" data-title="2&#39;7&quot; x 7&#39;3&quot;" type="radio" name="id" value="31781284872274" id="radio_31781284872274"/>
        <label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;"  for="radio_31781284872274">
          <p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2'7" x 7'3"</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> $134.81  </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> $53.92 </p>
        </label>
      </div>

 """
soup=BeautifulSoup(html,"html.parser")
inps=soup.find("div",class_="scrollmenu").find_all("input")
for inp in inps:
    print(inp)
    # inp['data-comprice'] also works, but raises KeyError if the attribute is missing
    print(inp.get("data-comprice"))
    print(inp.get("data-curprice"))
    print(inp.get("data-quantity"))
    print(inp.get("data-size"))

Output:

<input data-comprice="75.01" data-curprice="30.00" data-price="30.00" data-quantity="0" data-size="2' x 3'" data-sku="AAAA0536-EPERNAY-23" data-title="2' x 3'" id="radio_31781284839506" name="id" type="radio" value="31781284839506"/>
75.01
30.00
0
2' x 3'
<input 7'3""="" data-comprice="134.81" data-curprice="53.92" data-price="53.92" data-quantity="0" data-size="2'7" data-sku="AAAA0536-EPERNAY-2773" data-title="2'7&quot; x 7'3&quot;" id="radio_31781284872274" name="id" type="radio" value="31781284872274" x=""/>
134.81
53.92
0
2'7
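In this output the second size comes back as 2'7, and the tag picks up stray x="" and 7'3""="" attributes. The page's data-size value contains unescaped double quotes, so html.parser ends the value at the first one. On this page the data-title attribute holds the same label with its quotes written as &quot;, so reading it recovers the full size. A minimal sketch using the second input from the source above; that data-title always mirrors data-size is an assumption based on these two items only.

```python
from bs4 import BeautifulSoup

# The second <input> copied from the page source. The raw double quotes
# inside data-size end that attribute value early.
html = """<div class="scrollmenu">
<input data-comprice="134.81" data-curprice="53.92" data-size="2'7" x 7'3"" data-quantity="0" data-sku="AAAA0536-EPERNAY-2773" data-price="53.92" data-title="2&#39;7&quot; x 7&#39;3&quot;" type="radio" name="id" value="31781284872274" id="radio_31781284872274"/>
</div>"""

soup = BeautifulSoup(html, "html.parser")
inp = soup.find("div", class_="scrollmenu").find("input")

# Cut off at the first unescaped quote.
truncated = inp.get("data-size")
# data-title escapes its quotes, so it parses intact; fall back to
# data-size only if data-title is missing.
size = inp.get("data-title") or inp.get("data-size")

print(truncated)
print(size)
```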

For the live website:

from bs4 import BeautifulSoup
import requests
html = requests.get('https://markandday.com/products/epernay-cottage-denim-rug')
soup=BeautifulSoup(html.text,"html.parser")
inps=soup.find("div",class_="scrollmenu").find_all("input")
for inp in inps:
    print(inp.get("data-comprice"))
    print(inp.get("data-curprice"))
    print(inp.get("data-quantity"))
    print(inp.get("data-size"))
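A more defensive version of the live-site code, sketched under a few assumptions: the page still serves the scrollmenu div in its static HTML, and every matched input carries numeric data-comprice, data-curprice and data-quantity values, as in the source shown. The sketch keeps fetching separate from parsing, so the parsing can be tried on a saved copy of the page. It selects inputs by attribute presence rather than by class, converts the strings to numbers, and returns an empty list when the div is missing. The names parse_variants and fetch_variants are hypothetical. The BeautifulSoup and requests calls are the same ones used above.

```python
from bs4 import BeautifulSoup

URL = "https://markandday.com/products/epernay-cottage-denim-rug"


def parse_variants(page_html):
    """Return one dict per size option found in the scrollmenu div."""
    soup = BeautifulSoup(page_html, "html.parser")
    menu = soup.find("div", class_="scrollmenu")
    if menu is None:
        # find() returns None when the div is absent (e.g. after a layout
        # change); calling .find_all() on None would raise AttributeError.
        return []
    variants = []
    # attrs={"data-comprice": True} matches any <input> that has the
    # attribute, whatever its value, so no class is needed.
    for inp in menu.find_all("input", attrs={"data-comprice": True}):
        variants.append({
            "compare_price": float(inp["data-comprice"]),
            "price": float(inp["data-curprice"]),
            "quantity": int(inp["data-quantity"]),
            # Prefer data-title: an unescaped quote can cut data-size
            # short, while data-title escapes its quotes.
            "size": inp.get("data-title") or inp.get("data-size"),
        })
    return variants


def fetch_variants(url=URL, timeout=10):
    """Download the product page and parse it (needs network access)."""
    import requests  # imported here so parse_variants works without requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return parse_variants(response.text)
```

Since the result is a list of dicts, pd.DataFrame(fetch_variants()) gives one row per size if pandas is installed, as in the question. menu.select("input[data-comprice]") is an equivalent CSS-selector form of the attribute-presence search.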