Unknown error in BeautifulSoup web scraping using Python
<div id="browse_in_widget">
<span id="browse_in_breadcrumb" style="width: 583px;">
<div class="seo_itemscope" itemtype="http://data-vocabulary.org/Breadcrumb" itemscope="">
<a itemprop="url" href="/search/"> Arabian Area</a>
<span class="seo_itemprop-title" itemprop="title">Arabian Area</span>
>
</div>
<div class="seo_itemscope" itemtype="http://data-vocabulary.org/Breadcrumb" itemscope="">
<a itemprop="url" href="/property-for-rent/home/"> Phase 2 </a>
<span class="seo_itemprop-title" itemprop="title">Phase 2 </span>
>
</div>
<div class="seo_itemscope" itemtype="http://data-vocabulary.org/Breadcrumb" itemscope="">
<a itemprop="url" href="/property-for-rent/residential/"> Residential Units for Rent </a>
<span class="seo_itemprop-title" itemprop="title">Residential Units for Rent</span>
>
</div>
<div class="seo_itemscope" itemtype="http://data-vocabulary.org/Breadcrumb" itemscope="">
<a itemprop="url" href="/property-for-rent/residential/apartmentflat/"> Apartment/Flat for Rent </a>
<span class="seo_itemprop-title" itemprop="title">Apartment/Flat for Rent</span>
>
</div>
<strong class="seo_itemprop-title" itemprop="title">Details</strong>
</span>
</div>
I want to get:
['Arabian Area', 'Phase 2', 'Residential Units for Rent','Apartment/Flat for Rent']
I am trying the following code, using Beautiful Soup 4 in Python:
try:
    Type = [str(Area.text) for Area in soup.find_all("span", {"class" : "seo_itemscope"})]
    Area=' , '.join(Area)
    print Area
except StandardError as e:
    Area="Error was {0}".format(e)
    print Area
I just want the desired output as a list, but something seems to be wrong: nothing gets printed. What could be the problem?
Thanks!
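The first problem is that you are looking for span elements with the class seo_itemscope, and no such elements exist: in your HTML that class is on the div elements. If you are looking for the titles, use seo_itemprop-title: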
Type = [item.get_text() for item in soup.find_all("span", {"class": "seo_itemprop-title"})]
Another problem is here:
Area=' , '.join(Area)
You meant to join the items of the Type list:
Area = ' , '.join(Type)
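Putting the two fixes together, here is a minimal sketch run against the HTML from the question. It assumes Python 3 (hence the print() function) and the built-in "html.parser" backend; the names html, titles and area are illustrative, not from the original code. Passing strip=True to get_text() is an extra step to trim the trailing space in "Phase 2 ".

```python
from bs4 import BeautifulSoup

# The breadcrumb markup from the question, inlined so the example is self-contained.
html = """
<div id="browse_in_widget">
<span id="browse_in_breadcrumb" style="width: 583px;">
<div class="seo_itemscope" itemtype="http://data-vocabulary.org/Breadcrumb" itemscope="">
<a itemprop="url" href="/search/"> Arabian Area</a>
<span class="seo_itemprop-title" itemprop="title">Arabian Area</span>
>
</div>
<div class="seo_itemscope" itemtype="http://data-vocabulary.org/Breadcrumb" itemscope="">
<a itemprop="url" href="/property-for-rent/home/"> Phase 2 </a>
<span class="seo_itemprop-title" itemprop="title">Phase 2 </span>
>
</div>
<div class="seo_itemscope" itemtype="http://data-vocabulary.org/Breadcrumb" itemscope="">
<a itemprop="url" href="/property-for-rent/residential/"> Residential Units for Rent </a>
<span class="seo_itemprop-title" itemprop="title">Residential Units for Rent</span>
>
</div>
<div class="seo_itemscope" itemtype="http://data-vocabulary.org/Breadcrumb" itemscope="">
<a itemprop="url" href="/property-for-rent/residential/apartmentflat/"> Apartment/Flat for Rent </a>
<span class="seo_itemprop-title" itemprop="title">Apartment/Flat for Rent</span>
>
</div>
<strong class="seo_itemprop-title" itemprop="title">Details</strong>
</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match only <span> tags with the title class; the <strong> "Details"
# element shares the class but is skipped because its tag name differs.
titles = [
    item.get_text(strip=True)
    for item in soup.find_all("span", {"class": "seo_itemprop-title"})
]

# Join the list that was actually built, not an undefined name.
area = " , ".join(titles)

print(titles)
print(area)
```

The list titles is the output asked for in the question; area is the same data as a single joined string.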
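Also, catching StandardError is not a good idea: it is far too broad an exception, and in practice it comes close to a bare except clause. You should catch more specific exceptions instead.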
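As one possible illustration of catching a narrower exception, the sketch below handles only AttributeError, which is raised when find() returns None for a missing element and .get_text() is then called on it. The helper first_title and the one-line HTML are hypothetical, written for Python 3 (where StandardError no longer exists).

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div class="seo_itemscope">'
    '<span class="seo_itemprop-title">Phase 2</span>'
    '</div>',
    "html.parser",
)


def first_title(soup, css_class):
    """Return the text of the first matching <span>, or None if there is none."""
    try:
        return soup.find("span", {"class": css_class}).get_text(strip=True)
    except AttributeError:
        # find() returned None (no match), so .get_text() is not available.
        return None


found = first_title(soup, "seo_itemprop-title")  # a span with this class exists
missing = first_title(soup, "seo_itemscope")     # this class is on a div, not a span

print(found)
print(missing)
```

Because only AttributeError is caught, an unrelated bug such as a misspelled variable name would still surface as a NameError rather than being silently swallowed. An explicit check for None would work equally well here.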