Why is Scrapy not returning any links?

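Recently I have been trying to build a small tool to simplify my own apartment search and quickly get only the relevant information (the website is not very user-friendly), but I have run into a problem. Maybe I have just been staring at it for too long... or I am just being stupid, since this is not my area of expertise.

So, anyway. I have a link to the filtered results: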

import scrapy


class BostadSpider(scrapy.Spider):
    name = "bostadformedlingen"
    start_urls = ['https://bostad.stockholm.se/Lista/?s=58.66266&n=59.99899&w=17.07550&e=19.23431&sort=annonserad-fran-desc']

    def parse(self, response):
        for ad in response.css(
                "div.apartment-search-hits > ul.apartment-search-ad-list > li.ad-list__item > a::attr('href')"):
            print(ad.get())

This is the structure from the website:

<main class="display-flex flex-column search-wrapper u-m-a-0 u-p-a-0" id="main-content">
    <div class="row no-gutters search-wrapper__inner">
        <div id="apartment-search-hits" class="apartment-search-hits" aria-hidden="false">
            <ul id="apartment-search-ad-list" class="ad-list" aria-hidden="false">
                <li class="ad-list__item"> <a href="/Lista/Details?aid=190412" class="ad-list__link">

Should I go "a bit further up" and include "main"?

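The data is actually generated from the JSON response of an API call. If you disable JavaScript, you will see that the page goes blank, which means the listings are loaded dynamically and are not part of the HTML that Scrapy downloads. That is why the data cannot be extracted this way. Here is a working solution:

Code: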

import scrapy
import json

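class BostSpider(scrapy.Spider):
    name = 'bost'

    def start_requests(self):
        # Request the JSON endpoint directly instead of the HTML page.
        yield scrapy.Request(
            url='https://bostad.stockholm.se/Lista/AllaAnnonser',
            method='GET',
            callback=self.parse)

    def parse(self, response):
        # The body is JSON: a list with one object per listing.
        resp = json.loads(response.body)

        for h in resp:
            # 'Url' is a relative path; urljoin turns it into an absolute URL.
            url = h['Url']
            abs_url = response.urljoin(url)
            yield {
                'URL': abs_url
            }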

Output:

{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190400'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190401'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190360'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190325'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190413'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190412'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190383'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190229'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190230'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190414'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190407'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190432'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190377'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190424'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190291'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190382'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190384'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190356'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190349'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190287'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190399'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190428'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190404'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190368'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190371'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190373'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190390'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190385'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190416'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190396'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190394'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190402'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190359'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190358'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190357'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190265'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190264'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190422'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190420'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190410'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190398'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190429'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190403'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190423'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190417'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190362'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190361'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190387'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190376'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190386'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190391'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190369'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190363'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190409'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190427'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190364'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190378'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>  
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190375'}
        

...and so on
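
If you want to try the URL-building step without running a crawl, here is a minimal offline sketch using only the standard library. It assumes the endpoint returns a JSON list of objects with a relative `Url` key, as the spider above relies on. `SAMPLE_BODY` and `extract_ad_urls` are hypothetical names made up for illustration, and the sample payload is hand-written, not a real response (which may well carry more fields per listing).

```python
import json
from urllib.parse import urljoin

# Endpoint used by the spider above; it serves as the base for relative links.
BASE_URL = "https://bostad.stockholm.se/Lista/AllaAnnonser"

# Hypothetical, hand-written payload. Only the "Url" key is taken from the
# spider above; everything else about the real response is an assumption.
SAMPLE_BODY = (
    b'[{"Url": "/Lista/Details?aid=190400"},'
    b' {"Url": "/Lista/Details?aid=190401"}]'
)


def extract_ad_urls(body, base_url):
    """Return absolute detail-page URLs from a JSON list of listings."""
    listings = json.loads(body)
    # Entries without a usable "Url" value are skipped instead of raising KeyError.
    return [urljoin(base_url, item["Url"]) for item in listings if item.get("Url")]


if __name__ == "__main__":
    for ad_url in extract_ad_urls(SAMPLE_BODY, BASE_URL):
        print(ad_url)
```

`urljoin` from `urllib.parse` resolves the relative path against the endpoint URL, which is the same job `response.urljoin` does inside the spider.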