How to get <li> tag information (BeautifulSoup web scraping)?

Question

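I am scraping information from this page:
https://lawyers.justia.com/lawyer/michael-paul-ehline-85006. I am trying to scrape everything under the Fees section. The information I want is: Free Consultation: Yes. Credit Cards Accepted: Visa, Mastercard, American Express. Contingent Fees: In personal injury cases only. Rates, Retainers and Additional Information: Rates vary on a case by case basis.

This is what I tried: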

for thing in soup.findAll('ul', attrs={"class": "has-no-list-styles"}):
   ul=thing.find('<li>')
   print(ul)

But the output is:

<li>Intellectual Property</li>
<li>Copyright Law</li>
<li><strong>English</strong></li>

Thanks in advance.

Update: I found a solution, but it gives me an infinite loop. Any suggestions?

for o in soup.findAll('div', attrs={"class": "block-wrapper"}):     
    for tag in soup.findAll('div', attrs={"class": "block-wrapper"}):
        if tag.string:
            tag.string.replace_with("")
        for de in o.findAll("li"):
            if de != []:
                de=remove_tags(str(de))
                print (de)

Answer 1

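Give this a try. It is inspired by dabinsou's answer. All it does is look for the icon he pointed out, move to the next sibling of the icon's parent, and read the text from that sibling.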

import requests
from bs4 import BeautifulSoup

URL = "https://lawyers.justia.com/lawyer/michael-paul-ehline-85006"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
# Locate the fee icon, then read the text of the node that follows its parent
uls = soup.find('span', attrs={"class": "jicon -large jicon-fee"})
print(uls.parent.next_sibling.text)

Adapt your scraping along these lines and see if it helps!
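
One caveat with the sibling lookup: `next_sibling` can return a whitespace text node instead of a tag, while `find_next_sibling` skips ahead to the next matching tag. Below is a minimal sketch of that variant. It runs against `SAMPLE_HTML`, a hypothetical snippet that only borrows the class names mentioned in this thread; the real page's markup may differ, and `fee_items` is an illustrative helper name, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): only the class names come from this thread.
SAMPLE_HTML = """
<div class="block-wrapper">
  <div class="heading"><span class="jicon -large jicon-fee"></span> Fees</div>
  <ul class="has-no-list-styles">
    <li><strong>Free Consultation</strong> Yes</li>
    <li><strong>Credit Cards Accepted</strong> Visa, Mastercard, American Express</li>
  </ul>
</div>
"""


def fee_items(html):
    """Return the text of each <li> in the list that follows the fee icon."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any single class of a multi-class attribute.
    icon = soup.find("span", class_="jicon-fee")
    if icon is None:
        return []
    # Skip whitespace text nodes and go straight to the next <ul> sibling.
    fee_list = icon.parent.find_next_sibling("ul")
    if fee_list is None:
        return []
    # Join the strings inside each <li> with a single space.
    return [li.get_text(" ", strip=True) for li in fee_list.find_all("li")]


print(fee_items(SAMPLE_HTML))
```

Returning an empty list when the icon or list is missing avoids the `AttributeError` that a chained `.parent.next_sibling.text` raises if the page layout changes.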

Answer 2

Try this.

from simplified_scrapy import SimplifiedDoc,req
html = req.get('https://lawyers.justia.com/lawyer/michael-paul-ehline-85006')
doc = SimplifiedDoc(html)
ul = doc.getElement('ul',attr='class',value='has-no-list-styles',start='class="jicon -large jicon-fee"') # Use class="jicon -large jicon-fee" to locate
print (ul.text)

Result:

Free ConsultationYesCredit Cards AcceptedVisa, Mastercard, American ExpressContingent FeesIn personal injury cases only.Rates, Retainers and Additional InformationRates vary on a case by case basis.
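
The same "locate by the icon, then take the next list" idea can probably be written with BeautifulSoup alone, and passing a separator to `get_text` keeps the labels and values from running together as they do in the result above. This is a sketch under assumptions: `SAMPLE_HTML` is hypothetical markup that reuses the class names from this thread, and `fee_text` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): two lists share the same class,
# and only the second one follows the fee icon.
SAMPLE_HTML = """
<ul class="has-no-list-styles"><li>Copyright Law</li></ul>
<h2><span class="jicon -large jicon-fee"></span>Fees</h2>
<ul class="has-no-list-styles">
  <li><strong>Free Consultation</strong>Yes</li>
  <li><strong>Contingent Fees</strong>In personal injury cases only.</li>
</ul>
"""


def fee_text(html, separator=" | "):
    """Return the text of the first matching <ul> after the fee icon."""
    soup = BeautifulSoup(html, "html.parser")
    icon = soup.find("span", class_="jicon-fee")
    if icon is None:
        return ""
    # find_next searches forward in document order, so lists that appear
    # before the icon are ignored even though they share the class.
    fee_list = icon.find_next("ul", class_="has-no-list-styles")
    if fee_list is None:
        return ""
    # The separator is placed between text fragments; strip=True drops
    # whitespace-only fragments.
    return fee_list.get_text(separator, strip=True)


print(fee_text(SAMPLE_HTML))
```

On the sample markup this prints `Free Consultation | Yes | Contingent Fees | In personal injury cases only.`, and the earlier "Copyright Law" list is not included.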
