How to get only the value of a parent div element and exclude its child div elements using BeautifulSoup

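I decided to try web scraping. I ran into a tricky div block and have spent a few hours searching and trying to figure out how to get the output I expected by default, but I can't work out which approach to take.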

I'm having a problem with the div that has the class "listing__details-pricing". This div comes in three different forms. Form 3 returns my expected result; the other forms return additional values that I did not expect to be returned.

Form 1:

<div class="listing__details-pricing">
   €16,000 
   <div class="listing__details-private-seller">Private</div>
</div>

Form 2:

<div class="listing__details-pricing">
   €16,000
   <div class="listing__details-pricing-monthly">
      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
         <path d="M235.4 172.2c0-11.4 9.3-19.9 20.5-19.9 11.4 0 20.7 8.5 20.7 19.9s-9.3 20-20.7 20c-11.2 0-20.5-8.6-20.5-20zm1.4 35.7H275V352h-38.2V207.9z"></path>
         <path d="M256 76c48.1 0 93.3 18.7 127.3 52.7S436 207.9 436 256s-18.7 93.3-52.7 127.3S304.1 436 256 436c-48.1 0-93.3-18.7-127.3-52.7S76 304.1 76 256s18.7-93.3 52.7-127.3S207.9 76 256 76m0-28C141.1 48 48 141.1 48 256s93.1 208 208 208 208-93.1 208-208S370.9 48 256 48z"></path>
      </svg>
      €306
      <div class="listing__details-pricing-monthly-per-month">PER MONTH</div>
   </div>
</div>

Form 3:

<div class="listing__details-pricing">€16,250</div>

Code:

from bs4 import BeautifulSoup


html = """<html>
<body>
       <div class="vehicle-search-form__results">
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Meath</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Golf</h2>
                               <p>1.6 TDI MATCH EDITION BLUEMOTION 110PS 5DR</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2016</p>
                               </div>
                               <div class="listing__details-data-reg">(161 REG)</div>
                               <div class="listing__details-data-mileage">140,012 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div class="listing__details-private-seller">Private</div>
                            </div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                      
         
                 
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Longford</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>2.0 TDI SE BUSINESS</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2015</p>
                               </div>
                               <div class="listing__details-data-reg">(152 REG)</div>
                               <div class="listing__details-data-mileage">164,778 km</div>
                            </div>
                            <div class="listing__details-pricing">€16,250</div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                         
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Monaghan</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>HIGHLINE BE 2.0 TDI MANUAL 6SPEED FWD 150HP 4DR</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2016</p>
                               </div>
                               <div class="listing__details-data-reg">(161 REG)</div>
                               <div class="listing__details-data-mileage">230,000 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div class="listing__details-pricing-monthly">
                                  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
                                     <path d="M235.4 172.2c0-11.4 9.3-19.9 20.5-19.9 11.4 0 20.7 8.5 20.7 19.9s-9.3 20-20.7 20c-11.2 0-20.5-8.6-20.5-20zm1.4 35.7H275V352h-38.2V207.9z"></path>
                                     <path d="M256 76c48.1 0 93.3 18.7 127.3 52.7S436 207.9 436 256s-18.7 93.3-52.7 127.3S304.1 436 256 436c-48.1 0-93.3-18.7-127.3-52.7S76 304.1 76 256s18.7-93.3 52.7-127.3S207.9 76 256 76m0-28C141.1 48 48 141.1 48 256s93.1 208 208 208 208-93.1 208-208S370.9 48 256 48z"></path>
                                  </svg>
                                  €306
                                  <div class="listing__details-pricing-monthly-per-month">PER MONTH</div>
                               </div>
                            </div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
             <div class="ais-InfiniteScroll-sentinel"></div>
          </div>

</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find(class_="vehicle-search-form__results")

job_elements = results.find_all(class_="listing__details listing__details--desktop")
for job_element in job_elements:
    price = job_element.find(class_="listing__details-pricing")

    print(price.text.strip())

Current output:

€16,000
Private
€16,250
€16,000€306PER MONTH

Expected output:

€16,000
€16,250
€16,000

Change the last line to:

print(price.contents[0].strip())

This prints:

€16,000
€16,250
€16,000
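
To see why `contents[0]` works here, it can help to list the direct children of the pricing div. The following is a minimal sketch using only Form 1 from the question; the variable names `kinds`, `first_is_text` and `price_text` are made up for illustration. Note that it relies on the price being the very first child: if the div started with a child tag, `contents[0]` would be a Tag and calling `.strip()` on it would fail.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<div class="listing__details-pricing">
   €16,000
   <div class="listing__details-private-seller">Private</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
price = soup.find(class_="listing__details-pricing")

# .contents holds only the direct children of the tag, in document order.
# Text between tags shows up as NavigableString objects, child tags as Tag objects.
kinds = [
    "text" if isinstance(child, NavigableString) else f"<{child.name}>"
    for child in price.contents
]
print(kinds)

# The first child is the text node that carries the price (plus surrounding whitespace).
first = price.contents[0]
first_is_text = isinstance(first, NavigableString)
price_text = first.strip()
print(price_text)
```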

Or:

print(price.find(text=True).strip())
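
As a side note, `text=` is the older spelling of this keyword; since Beautiful Soup 4.4.0 the documented name is `string=`. The sketch below, which uses a trimmed copy of Form 2 (the svg element is left out for brevity), shows the `string=True` spelling next to `stripped_strings`, another way to reach the first non-empty text node. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<div class="listing__details-pricing">
   €16,000
   <div class="listing__details-pricing-monthly">
      €306
      <div class="listing__details-pricing-monthly-per-month">PER MONTH</div>
   </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
price = soup.find(class_="listing__details-pricing")

# string=True matches any text node; find() returns the first one in document order.
first_text = price.find(string=True).strip()

# stripped_strings yields every text node under the tag with whitespace trimmed,
# skipping nodes that are whitespace only.
first_stripped = next(price.stripped_strings)
all_strings = list(price.stripped_strings)

print(first_text)
print(all_strings)
```

Both approaches pick the first text node anywhere under the div, so they give the price only because the price comes before any child element.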

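Each price value comes immediately after the opening <div class="listing__details-pricing"> tag and is what is called a text node. You can select the divs directly with class_="listing__details-pricing" and then get the text node's value by calling find(text=True).
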
from bs4 import BeautifulSoup


html = """<html>
<body>
       <div class="vehicle-search-form__results">
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Meath</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Golf</h2>
                               <p>1.6 TDI MATCH EDITION BLUEMOTION 110PS 5DR</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2016</p>
                               </div>
                               <div class="listing__details-data-reg">(161 REG)</div>
                               <div class="listing__details-data-mileage">140,012 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div class="listing__details-private-seller">Private</div>
                            </div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                      
         
                 
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Longford</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>2.0 TDI SE BUSINESS</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2015</p>
                               </div>
                               <div class="listing__details-data-reg">(152 REG)</div>
                               <div class="listing__details-data-mileage">164,778 km</div>
                            </div>
                            <div class="listing__details-pricing">€16,250</div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                         
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Monaghan</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>HIGHLINE BE 2.0 TDI MANUAL 6SPEED FWD 150HP 4DR</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2016</p>
                               </div>
                               <div class="listing__details-data-reg">(161 REG)</div>
                               <div class="listing__details-data-mileage">230,000 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div class="listing__details-pricing-monthly">
                                  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
                                     <path d="M235.4 172.2c0-11.4 9.3-19.9 20.5-19.9 11.4 0 20.7 8.5 20.7 19.9s-9.3 20-20.7 20c-11.2 0-20.5-8.6-20.5-20zm1.4 35.7H275V352h-38.2V207.9z"></path>
                                     <path d="M256 76c48.1 0 93.3 18.7 127.3 52.7S436 207.9 436 256s-18.7 93.3-52.7 127.3S304.1 436 256 436c-48.1 0-93.3-18.7-127.3-52.7S76 304.1 76 256s18.7-93.3 52.7-127.3S207.9 76 256 76m0-28C141.1 48 48 141.1 48 256s93.1 208 208 208 208-93.1 208-208S370.9 48 256 48z"></path>
                                  </svg>
                                  €306
                                  <div class="listing__details-pricing-monthly-per-month">PER MONTH</div>
                               </div>
                            </div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
             <div class="ais-InfiniteScroll-sentinel"></div>
          </div>

</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
job_elements = soup.find_all(class_="listing__details-pricing")
for job_element in job_elements:
    price = job_element.find(text=True).strip()

    print(price)

Output:

€16,000
€16,250
€16,000
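
If the price were not guaranteed to be the first node inside the div, a slightly more defensive variant would be to collect only the div's own direct text nodes. This is a sketch under assumptions, not part of the original answer: `own_text` is a hypothetical helper name, and the third div in the sample (child element first, price second) is an invented case that does not appear in the question's HTML.

```python
from bs4 import BeautifulSoup

html = """
<div class="listing__details-pricing">
   €16,000
   <div class="listing__details-private-seller">Private</div>
</div>
<div class="listing__details-pricing">€16,250</div>
<div class="listing__details-pricing">
   <div class="listing__details-private-seller">Private</div>
   €15,500
</div>
"""


def own_text(tag):
    """Hypothetical helper: join the tag's direct text nodes, ignoring child tags."""
    # recursive=False restricts the search to direct children, so text nested
    # inside child divs ("Private", "€306", "PER MONTH") is never visited.
    parts = (s.strip() for s in tag.find_all(string=True, recursive=False))
    return " ".join(p for p in parts if p)


soup = BeautifulSoup(html, "html.parser")
prices = [own_text(div) for div in soup.find_all(class_="listing__details-pricing")]
print(prices)
```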