Cannot find class using CSS selector in Scrapy

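I am testing whether I can scrape a website with Scrapy. I get a response from the site, but I cannot access the elements or data I want. I believe my selector is correct and, although I am a Scrapy beginner, I don't think there is a mistake in the commands. I want to get the tags with the class results-race-name. I ran this in the Scrapy shell, using the following commands: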

In [1]: fetch('https://greyhoundbet.racingpost.com/#results-list/r_date=2021-01-01/')

2022-01-07 15:08:58 [scrapy.core.engine] INFO: Spider opened
2022-01-07 15:09:01 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://greyhoundbet.racingpost.com/robots.txt> (referer: None)
2022-01-07 15:09:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://greyhoundbet.racingpost.com/#results-list/r_date=2021-01-01/> (referer: None)

In [2]: view(response)
Out[2]: True

In [3]: response.css('.results-race-name').extract()
Out[3]: []

Note: view(response) only shows the page up to the loading logo.

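This is not a CSS problem. The data is created dynamically by JavaScript, so it is not in the HTML that Scrapy downloads. You can get it from the JSON response instead: open devtools in your browser, click the Network tab, look through the JSON requests, and take the one you need.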
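One way to see why the page URL from the question returns only the page shell: everything after `#` is a URL fragment, which is handled in the browser and is not part of the HTTP request. The following is a minimal sketch using only the standard library. The variable names are illustrative and not from the original post.

```python
from urllib.parse import urlsplit

# The page URL used in the question's fetch() call.
page_url = "https://greyhoundbet.racingpost.com/#results-list/r_date=2021-01-01/"
parts = urlsplit(page_url)

# What an HTTP client actually asks the server for: scheme, host and path.
# The fragment stays on the client side and is interpreted by the page's JavaScript.
requested = f"{parts.scheme}://{parts.netloc}{parts.path}"

print("requested from server:", requested)
print("fragment (client-side only):", parts.fragment)
```

So the date in the fragment never reaches the server, which is consistent with the empty selector result above.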

In [1]: req = scrapy.Request('https://greyhoundbet.racingpost.com/results/blocks.sd?r_date=2021-01-01&blocks=header%2Cmeetings')

In [2]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://greyhoundbet.racingpost.com/results/blocks.sd?r_date=2021-01-01&blocks=header%2Cmeetings> (referer: None)

In [3]: json_data = response.json()

In [4]: for data in json_data['meetings']['tracks']['1']['races']:
   ...:     print(data['track'])
   ...:
Newcastle
Swindon
Kinsley

In [5]: for data in json_data['meetings']['tracks']['2']['races']:
   ...:     print(data['track'])
   ...:
Monmore
Crayford
Hove
Harlow
Henlow
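The two loops above hardcode the keys '1' and '2'. If other dates return a different number of track groups, a loop over all groups may be more robust. This is a sketch under the assumption that `json_data['meetings']['tracks']` is a dict keyed by strings, as the session suggests. The helper name `iter_track_names` and the sample payload are hypothetical, and the payload is trimmed to the fields used above.

```python
def iter_track_names(json_data):
    """Yield the 'track' value of every race in every track group."""
    # Assumption: 'tracks' is a dict such as {'1': {...}, '2': {...}}.
    track_groups = json_data["meetings"]["tracks"]
    for group in track_groups.values():
        for race in group["races"]:
            yield race["track"]


# Hypothetical sample mirroring the shape seen in the shell session;
# the real response contains more fields.
sample = {
    "meetings": {
        "tracks": {
            "1": {"races": [{"track": "Newcastle"}, {"track": "Swindon"}, {"track": "Kinsley"}]},
            "2": {"races": [{"track": "Monmore"}, {"track": "Crayford"}]},
        }
    }
}

names = list(iter_track_names(sample))
print(names)
```

In the shell, the same helper could be called as `list(iter_track_names(response.json()))`.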

Edit:

spider.py:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    # Request the JSON endpoint directly instead of the JavaScript-rendered page.
    start_urls = ['https://greyhoundbet.racingpost.com/results/blocks.sd?r_date=2021-01-01&blocks=header%2Cmeetings']

    def parse(self, response):
        # Parse the response body as JSON.
        json_data = response.json()

        # Track names from the first group of tracks.
        for data in json_data['meetings']['tracks']['1']['races']:
            yield {'race': data['track']}

        # Track names from the second group of tracks.
        for data in json_data['meetings']['tracks']['2']['races']:
            yield {'race': data['track']}

Example for spider
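To request a different date, the query string of the JSON URL can be built instead of hardcoded. This is a minimal sketch that assumes the endpoint accepts other `r_date` values in the same YYYY-MM-DD format, which the original post does not confirm. `BASE_URL` and `build_results_url` are hypothetical names introduced here.

```python
from urllib.parse import urlencode

# Endpoint taken from the spider above, without its query string.
BASE_URL = "https://greyhoundbet.racingpost.com/results/blocks.sd"


def build_results_url(r_date):
    """Return the JSON results URL for a date string such as '2021-01-01'."""
    # urlencode percent-encodes the comma, giving blocks=header%2Cmeetings.
    query = urlencode({"r_date": r_date, "blocks": "header,meetings"})
    return f"{BASE_URL}?{query}"


url = build_results_url("2021-01-01")
print(url)
```

The result could then be used in the spider, for example `start_urls = [build_results_url('2021-01-01')]`.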

main.py:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    spider = 'exampleSpider'
    settings = get_project_settings()
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()

How to run scrapy from a script