Web scraping using NodeJS and Cheerio

Question

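I'm trying to get my web scraper to pull the title of every article from a website (https://hypebeast.com/footwear), but I can't seem to get anything except undefined or some very confusing errors. What exactly am I doing wrong? Here is my code snippet: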

        const request = require('request');
        const cheerio = require('cheerio');
        
        var titles = [];

        request('https://hypebeast.com/footwear', function(err, resp, body) {
            var $ = cheerio.load(body);
            $('.title').each(function(){
                var title = $(this).attr('span');
                titles.push(title);
            });

            console.log(titles);

        });

Here is the error: http://imgur.com/chB9v6h

Answer 1

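This is not a problem with how Cheerio reads the page. I opened the site and found that the DOM structure is different from what your code assumes: the title text sits in a span nested under an h2 inside each .title element. Calling .attr('span') looks for an attribute named "span" on the .title element, which does not exist, so you get undefined. To reach the text you have to walk down to the span and read it with .text(), like this: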

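        const request = require('request');
        const cheerio = require('cheerio');

        var titles = [];

        request('https://hypebeast.com/footwear', function(err, resp, body) {
            // Stop early if the request itself failed; body would be undefined.
            if (err) {
                console.error(err);
                return;
            }
            var $ = cheerio.load(body);
            $('.title').each(function(){
                // Walk from .title down to h2 > span and read its text content.
                var title = $(this).children("h2").children('span').text();
                titles.push(title);
            });

            console.log(titles);

        });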

request

node.js

web-scraping

cheerio