Python xpath: try an xpath, except fill in a given value
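I am scraping reviews from a website. In the end I need several lists (e.g. usernames and dates) that are combined into one dictionary per review, so that the result looks like this: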
reviews = [{'username': 'Harry', 'date': 'april'},
           {'username': 'Rob', 'date': 'may'}]
The lists must be the same length, because I put them into dictionaries like this:
reviews = []
for i in range(len(username)):
    reviews.append({'username': username[i].strip(),
                    'date': date[i].strip()})
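However, when a review has no username, the xpath returns nothing for it and my list ends up too short (which gives the error "list index out of range"). How can I fill in a given value (e.g. "no name") when the xpath finds nothing? I have tried something like this (which I thought would work, but it doesn't):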
try:
    names = tree.xpath..
except:
    "no name"
Edit: HTML
Examples of the two review types (mobile vs. non-mobile).
Mobile review:
<div class="rating reviewItemInline">
    <span class="ui_bubble_rating bubble_50"></span>
    <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
    <a class="viaMobile">via mobile</a>
</div>
Non-mobile review:
<div class="rating reviewItemInline">
    <span class="ui_bubble_rating bubble_50"></span>
    <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
</div>
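You have to iterate over the items you want (the reviews) and then run the xpath for each field relative to that item, for example:
review_elems = tree_html.xpath('//div[@class="rating reviewItemInline"]')
reviews = []
for review_elem in review_elems:
    review = {}
    # relative query: only looks inside the current review
    username = review_elem.xpath('.//a[@class="viaMobile"]')
    if username:
        review['username'] = username[0].text
    else:
        # an empty list means the element is missing, so use the default
        review['username'] = 'no name'
    # keep filling review with more fields
    reviews.append(review)
print(reviews)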
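The check above repeats for every field, so it can be wrapped in a small helper. This is only a sketch: `first_text` is a hypothetical name, not part of lxml, and `SAMPLE` is the two reviews from the question wrapped in an outer `<div>` that I added for illustration. It relies on `xpath()` returning an empty list when nothing matches.

```python
import lxml.html

# The two sample reviews from the question; the outer <div> is an assumption.
SAMPLE = """
<div>
  <div class="rating reviewItemInline">
    <span class="ui_bubble_rating bubble_50"></span>
    <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
    <a class="viaMobile">via mobile</a>
  </div>
  <div class="rating reviewItemInline">
    <span class="ui_bubble_rating bubble_50"></span>
    <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
  </div>
</div>
"""


def first_text(elem, path, default):
    """Return the stripped text of the first match for path, or default."""
    matches = elem.xpath(path)  # empty list when nothing matches
    if matches and matches[0].text:
        return matches[0].text.strip()
    return default


tree = lxml.html.fromstring(SAMPLE)
reviews = []
for review_elem in tree.xpath('//div[@class="rating reviewItemInline"]'):
    reviews.append({
        'date': first_text(review_elem, './/span[contains(@class, "ratingDate")]', 'no date'),
        'via mobile': first_text(review_elem, './/a[@class="viaMobile"]', 'no'),
    })
print(reviews)
```

Each review contributes exactly one dictionary, so no field can end up shorter than the others.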
There is no need for try/except; just build two lists that each cover every review, like this:
import lxml.html

html = lxml.html.fromstring("source code here")
# one element per review
reviews = html.xpath('//div[@class="rating reviewItemInline"]')
dates = [i.xpath('./span[@class="ratingDate relativeDate"]')[0].text for i in reviews]
# fall back to "no" when the review has no <a> child
mobile = [i.xpath('./a')[0].text if i.xpath('./a') else "no" for i in reviews]
output = [{'date': i, 'via mobile': j} for i, j in zip(dates, mobile)]
output
The output should look something like this:
[{'date': 'Reviewed 6 days ago', 'via mobile': 'via mobile'}, {'date': 'Reviewed 6 days ago', 'via mobile': 'no'}]
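To see why the lists in the question drift out of step while the ones above stay aligned, it may help to compare a page-wide query with a per-review query. This is a minimal sketch, assuming the two sample reviews from the question wrapped in an outer `<div>` (the wrapper and the variable names are mine):

```python
import lxml.html

# The two sample reviews from the question; the outer <div> is an assumption.
SAMPLE = """
<div>
  <div class="rating reviewItemInline">
    <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
    <a class="viaMobile">via mobile</a>
  </div>
  <div class="rating reviewItemInline">
    <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
  </div>
</div>
"""

page = lxml.html.fromstring(SAMPLE)
review_divs = page.xpath('//div[@class="rating reviewItemInline"]')

# Page-wide query: only reviews that contain the link add an entry.
# The review without a link is skipped silently; no exception is raised,
# which is why a try/except around the xpath call never fires.
page_wide = page.xpath('//div[@class="rating reviewItemInline"]/a/text()')

# Per-review query: exactly one entry per review, with a default value.
per_review = [(div.xpath('./a/text()') or ['no'])[0] for div in review_divs]

print(len(review_divs), len(page_wide), len(per_review))
```

The page-wide list is one entry short, while the per-review list has one entry per review and can be zipped safely with the dates.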