How to use Scrapy to crawl AngularJS websites?

Question

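I need a way to get all the odds for every event offered by a bookmaker.

I am using Scrapy + Splash to get the site's first JavaScript-loaded content. But to get all the other odds, I have to click on "Spagna-LigaSpagnola", "Italia->Serie A", and so on.

How can I do that?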

Answer 1

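You can simulate behaviors such as scrolling or clicking by writing a JavaScript function and telling Splash to run it, from a Lua script, when it renders your page.

A small example:

You define a JavaScript function that selects an element on the page, and the Lua script then clicks on it:

(source: Splash docs)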

_script = """
-- Get button element dimensions with JavaScript and perform a mouse click.
function main(splash)
    assert(splash:go(splash.args.url))
    local get_dimensions = splash:jsfunc([[
        function () {
            var rect = document.getElementById('button').getClientRects()[0];
            return {"x": rect.left, "y": rect.top}
        }
    ]])
    splash:set_viewport_full()
    splash:wait(0.1)
    local dimensions = get_dimensions()
    splash:mouse_click(dimensions.x, dimensions.y)

    -- Wait split second to allow event to propagate.
    splash:wait(0.1)
    return splash:html()
end
"""

Then, when you make the request, change the endpoint to "execute" and add "lua_source": _script to args.

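from scrapy_splash import SplashRequest

def parse(self, response):
    # Re-request the page through Splash's "execute" endpoint so the Lua script runs.
    yield SplashRequest(response.url, self.parse_elem,
                        endpoint="execute",
                        args={"lua_source": _script})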
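Since the question involves clicking several links ("Spagna-LigaSpagnola", "Italia->Serie A", and so on), one possible extension is to pass the CSS selector of the element to click as an extra Splash argument, so the same Lua script can be reused for each link. The following is only a sketch under assumptions: the selectors in LEAGUE_SELECTORS, the helper build_click_args, and the wait values are hypothetical and must be adapted to the real page. It relies on custom keys in args being readable in Lua as splash.args.<name>.

```python
# Lua script for the Splash "execute" endpoint. The CSS selector of the
# element to click is read from splash.args.selector (a custom argument).
CLICK_SCRIPT = """
function main(splash)
    assert(splash:go(splash.args.url))
    -- Give the AngularJS application some time to render.
    splash:wait(splash.args.wait)
    local get_center = splash:jsfunc([[
        function (selector) {
            var rect = document.querySelector(selector).getClientRects()[0];
            return {"x": rect.left + rect.width / 2,
                    "y": rect.top + rect.height / 2}
        }
    ]])
    splash:set_viewport_full()
    splash:wait(0.1)
    local center = get_center(splash.args.selector)
    splash:mouse_click(center.x, center.y)
    -- Wait for the content loaded by the click to appear.
    splash:wait(splash.args.wait)
    return splash:html()
end
"""

# Hypothetical selectors, one per league link; replace them with selectors
# that match the real site.
LEAGUE_SELECTORS = ["a.liga-spagnola", "a.serie-a"]


def build_click_args(selector, wait=1.0):
    """Build the `args` dict for a SplashRequest that clicks `selector`."""
    return {"lua_source": CLICK_SCRIPT, "selector": selector, "wait": wait}
```

In the spider, you would then loop over LEAGUE_SELECTORS and yield one SplashRequest(response.url, self.parse_elem, endpoint="execute", args=build_click_args(selector)) per link. Because every request targets the same URL, you may also need dont_filter=True, depending on how your duplicate filter is configured.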

You can find all the information about Splash scripting in the Splash documentation.
