Scrape web pages using Python
I have the following web page:
</div><a href="https://www.emag.ro/laptop-lenovo-thinkbook-15-iil-cu-procesor-intel-core-i7-1065g7-pana-la-3-90-ghz-15-6-full-hd-16gb-512gb-ssd-intel-iris-plus-graphics-free-dos-mineral-grey-20sm003jrm/pd/DKBK1TMBM/#reviews-section" rel="nofollow" class="star-rating-container js-product-url" data-zone="reviews"><div class="star-rating star-rating-read rated-4.02 star-rating-sm ">
<div class="star-rating-inner " style="width: 100%"></div>
</div><div class="star-rating-text ">
I want to extract the rating for this product.
For this product, the rating is defined here:
<div class="star-rating star-rating-read rated-4.02 star-rating-sm ">
But I can't extract the 4.02.
My code is as follows:
rating = container.find_all(class_="star-rating star-rating-read rated")[0].text
I know the code above is wrong. I can extract the product's price and name, but not the rating :(
Here is a solution you can try:
import re

# Regex that extracts a decimal number such as 4.02 from a string
extract_ = re.compile(r"\d+\.\d+")

for div in container.find_all("div", attrs={"class": "star-rating"}):
    # BeautifulSoup exposes the class attribute as a list of class names
    for attr in div.attrs["class"]:
        ratings_ = extract_.search(attr)
        if ratings_:
            print(ratings_.group())  # 4.02
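The same idea can be sketched as a small reusable function. This is only a sketch: it assumes the rating always appears as a class token of the form rated-<number>, and the name rating_from_classes is made up for illustration. It takes a list of class names, which is what BeautifulSoup returns for a tag's class attribute.

```python
import re
from typing import Iterable, Optional

# Assumption: the rating is encoded as a class token such as "rated-4.02".
RATED = re.compile(r"^rated-(\d+(?:\.\d+)?)$")


def rating_from_classes(classes: Iterable[str]) -> Optional[float]:
    """Return the rating encoded in a 'rated-<number>' class token, or None."""
    for token in classes:
        match = RATED.match(token)
        if match:
            return float(match.group(1))
    return None


classes = ["star-rating", "star-rating-read", "rated-4.02", "star-rating-sm"]
print(rating_from_classes(classes))  # 4.02
```

With BeautifulSoup this could be called as rating_from_classes(div.get("class", [])) for each div returned by find_all. Returning None instead of raising makes it easy to skip products that have no rating.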
Try something like this:
rating = str(container.find_all(class_="star-rating")[0])
rindex = rating.index("rated")
print(rating[rindex+6:rindex+10])
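The fixed slice above assumes the rating is exactly four characters long. Here is a sketch that also tolerates values such as 5 or 4.5, assuming the markup still contains rated-<number>. The name rating_from_html is hypothetical.

```python
import re
from typing import Optional


def rating_from_html(html: str) -> Optional[float]:
    """Find the first 'rated-<number>' in an HTML string and return the number."""
    match = re.search(r"\brated-(\d+(?:\.\d+)?)", html)
    return float(match.group(1)) if match else None


html = '<div class="star-rating star-rating-read rated-4.02 star-rating-sm ">'
print(rating_from_html(html))  # 4.02
```

It could be applied to str(container.find_all(class_="star-rating")[0]) in place of the index arithmetic.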