XPath: why do I get an empty result with this expression?
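I'm trying this XPath:
.//div[@class='owl-wrapper']
on this website:
http://www.justproperty.com/search/uae/apartments/filter__cid/0/sort/score__desc/per_page/20/page/1
but I get an empty result, even though I can see the element in Chrome's F12 developer tools.
You might think it comes from a JavaScript call, but it doesn't, because I'm using Scrapy and I can view the response like this:
scrapy shell "website"
view(response)
and the class is there.
Please help.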
[Screenshot: the page as it appears in my Chrome after running view(response)]
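The problem is that the search results, including the div elements with the owl-wrapper class, are loaded asynchronously by an additional GET request. view(response) most likely shows them because it opens the downloaded HTML in a real browser, which runs the page's JavaScript and makes that request itself; the raw response your XPath runs against does not contain them.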
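One way to confirm this is to check whether the class appears in the raw HTML that Scrapy downloaded (for example response.text in scrapy shell), as opposed to the DOM the browser builds after running JavaScript. The sketch below uses only the standard library's html.parser; the function name has_div_with_class and the two sample HTML strings are hypothetical, made up for illustration. Unlike the XPath's exact @class='owl-wrapper' comparison, it also matches the class when it is one of several.

```python
from html.parser import HTMLParser


class _DivClassFinder(HTMLParser):
    """Records whether a <div> carrying the given CSS class is seen."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        for name, value in attrs:
            # class="a b" can hold several classes; match any one of them
            if name == "class" and value and self.class_name in value.split():
                self.found = True


def has_div_with_class(html, class_name):
    """Return True if the raw (unrendered) HTML contains a div with class_name."""
    finder = _DivClassFinder(class_name)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical markup: the server sends an empty container, and the
# owl-wrapper div only exists after the page's JavaScript has run.
raw_html = '<div id="results"></div><script src="search.js"></script>'
rendered_html = '<div id="results"><div class="owl-wrapper">...</div></div>'

print(has_div_with_class(raw_html, "owl-wrapper"))
print(has_div_with_class(rendered_html, "owl-wrapper"))
```

In scrapy shell you would call has_div_with_class(response.text, "owl-wrapper"); a False there, while the browser shows the element, points to content added by JavaScript.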
You need to simulate this request in your code, for example using requests:
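import requests

with requests.Session() as session:
    # Load the search page first so the session picks up any cookies the site
    # sets (presumably needed by the AJAX endpoint)
    session.get('http://www.justproperty.com/search/uae/apartments/filter__cid/0/sort/score__desc/per_page/20/page/1')

    # Same query the page sends when it loads the results asynchronously
    params = {
        'url': 'filter__cid/0/sort/score__desc/per_page/20/page/1',
        'ajax': 'true'
    }
    response = session.get('http://www.justproperty.com/search/featured-properties/', params=params)
    results = response.json()  # the endpoint returns a JSON list

    for result in results:
        print(result['description'])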
Prints:
2 bedroom unit on high floor. Full Fountain View,It comes with different amenities, facilities and hotel services. It is located in a prime location, The Address Hotel Lake Downtown. This property is...
Large Upgraded 1 Bedroom For Sale In Index Tower DIFC With DIFC ViewSize: 840 square feet - 78 square metersBedroom: 1 Bathroom: 1 plus guest washroomKitchen: Fully Equipped modern style kitchen with...
Spacious and nice 1-bedroom apartment for
...
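The query string that requests builds from params can also be produced with urllib.parse.urlencode, which is convenient when a library such as scrapy.Request expects a complete URL. This is a sketch under an unverified assumption: that other result pages follow the same .../per_page/20/page/<n> pattern. The helper build_featured_url and its page/per_page parameters are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint the search page calls asynchronously (taken from the answer above)
FEATURED_ENDPOINT = "http://www.justproperty.com/search/featured-properties/"


def build_featured_url(page=1, per_page=20):
    """Build the AJAX URL for one page of results.

    Assumption (not verified against the site): the 'url' parameter keeps the
    filter__cid/0/sort/score__desc/per_page/<n>/page/<n> layout for every page.
    """
    params = {
        "url": f"filter__cid/0/sort/score__desc/per_page/{per_page}/page/{page}",
        "ajax": "true",
    }
    # urlencode percent-encodes the slashes in 'url' as %2F, as requests does
    return FEATURED_ENDPOINT + "?" + urlencode(params)


print(build_featured_url())
print(build_featured_url(page=2))
```

The Scrapy spider below writes the same query with the slashes left unencoded.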
An example Scrapy spider based on the solution above:
import json

import scrapy


class JustPropertySpider(scrapy.Spider):
    name = "justproperty"
    allowed_domains = ["justproperty.com"]
    start_urls = [
        "http://www.justproperty.com/search/uae/apartments/filter__cid/0/sort/score__desc/per_page/20/page/1"
    ]

    def parse(self, response):
        # The search page itself does not contain the results; request the
        # AJAX endpoint the page calls (cookies from the first response are
        # carried over by Scrapy's cookie middleware, enabled by default)
        yield scrapy.Request(
            'http://www.justproperty.com/search/featured-properties/?url=filter__cid/0/sort/score__desc/per_page/20/page/1&ajax=true',
            callback=self.parse_results,
            headers={'X-Requested-With': 'XMLHttpRequest'},
        )

    def parse_results(self, response):
        # The endpoint returns a JSON list of result objects
        results = json.loads(response.body)
        for result in results:
            yield {'description': result['description']}