Beautiful Soup: get_text but NOT the <span> text. How can I get it?

Question

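Given this markup: [markup][1]

I need to get the number 182 in one column and 58 in another. I already have the span, but when I call div.get_text() or .string, it returns 18258 (both numbers together).

This is my code: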

prices= soup.find_all('div', class_='grilla-producto-precio')

cents= []
price= []
for px in prices:
    ### here i need to get the number 182 and append it to "price"
    for spn in px.find('span'):
        cents.append(spn)

How can I get the price 182 on its own, without the span? Thanks!

[1]: https://i.stack.imgur.com/ld9qo.png

Answer 1

The answer to your question is almost the same as the answer to this question.

from bs4 import BeautifulSoup

html = """
<div class = "grilla-producto-precio">
" $"
"182"
<span>58</span>
</div>
"""
soup = BeautifulSoup(html,'html5lib')

prices = soup.find_all('div',class_ = "grilla-producto-precio")

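price = []

for px in prices:
    # First text node after the opening <div> tag: '" $"\n"182"' once stripped
    txt = px.find_next(string=True).strip()

    # Drop the literal double quotes copied from the markup
    txt = txt.replace('"', '')

    # The number sits on the last line of the remaining text
    txt = int(txt.split("\n")[-1])

    price.append(txt)

print(price)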

Output:

[182]
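
Since the question asks for both numbers in separate columns, here is a minimal sketch of one way to collect them side by side. It assumes the same sample markup as above and that each div holds at most one span; the name own_text is only illustrative. Passing recursive=False to find_all restricts the search to the div's direct children, so the text inside the span is not picked up.

```python
import re

from bs4 import BeautifulSoup

html = """
<div class="grilla-producto-precio">
" $"
"182"
<span>58</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

price = []
cents = []
for px in soup.find_all("div", class_="grilla-producto-precio"):
    # Only text nodes that are direct children of the div;
    # the text nested inside <span> is skipped.
    own_text = "".join(px.find_all(string=True, recursive=False))

    # Take the first run of digits found in the div's own text.
    match = re.search(r"\d+", own_text)
    if match:
        price.append(int(match.group()))

    # The cents live in the nested span, if there is one.
    span = px.find("span")
    if span is not None:
        cents.append(int(span.get_text(strip=True)))

print(price, cents)
```

If the real page formats prices with thousands separators, the regular expression would need adjusting.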

Answer 2

Another solution is to keep only the characters for which isdigit() is true:

from bs4 import BeautifulSoup

txt = """
<div class = "grilla-producto-precio">
" $"
"182"
<span>58</span>
</div>
"""
soup = BeautifulSoup(txt, "html.parser")

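# .next_element is the first node after the opening <div> tag: the text node holding " $" and "182"
data = soup.find("div", class_="grilla-producto-precio").next_element
# Keep only the digit characters of that text node
price = [int("".join(d for d in data if d.isdigit()))]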

print(price) # Output: [182]
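
A further variation, offered only as a sketch under the same sample markup: detach the span from the tree with extract(), which returns the removed tag, and then read whatever text is left in the div. Note that this modifies the parsed tree, so it may not suit code that needs the span in place later on.

```python
from bs4 import BeautifulSoup

html = """
<div class="grilla-producto-precio">
" $"
"182"
<span>58</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="grilla-producto-precio")

# extract() removes the span from the tree and returns it.
span = div.find("span").extract()
cents = int(span.get_text(strip=True))

# With the span gone, get_text() only sees the div's remaining text.
remaining = div.get_text()
price = int("".join(ch for ch in remaining if ch.isdigit()))

print(price, cents)
```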

python

beautifulsoup

mysql-python

web-scraping

python-3.x