Do the equivalent of document.getElementsByClassName() with cheerio and a url for web scraping

Question

This works in the browser, but when I try to do the same thing with cheerio in node.js, it doesn't work:

var request = require('request');
var cheerio = require('cheerio');
var url = 'https://www.google.fr/search?ei=apX6WdzaIMzWUabjqvAF&q=ok&oq=ok&gs_l=psy-ab.3..0i67k1l4j0j0i67k1l2j0i131k1j0j0i67k1.2633.3962.0.4021.3.3.0.0.0.0.58.169.3.3.0....0...1.1.64.psy-ab..0.3.169....0.524Rrv-4zlU'

request(url, function (error, response, html) {
  if (!error && response.statusCode == 200) {
    var $ = cheerio.load(html);
    console.log($('.r')[0].innerText);
  }
});

I have never used cheerio before. This code prints undefined in the terminal. Why?

Answer 1

According to the cheerio docs, it looks like you can use:

$('.r').first().text()

or

$('.r').eq(0).text()

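I don't know whether cheerio supports direct array access the way jQuery does, because these are not real DOM objects (they are pseudo-objects created by Cheerio). I also don't see any support for .innerText in the Cheerio docs; in fact, a GitHub search for "innerText" turns up no matches either. It looks like you can use .html() or .text() on a Cheerio collection object.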

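If you get a specific node object, as you may have been trying to do with $('.r')[0], then the properties supported by that node object (as opposed to a cheerio collection object) are as follows: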

tagName
parentNode
previousSibling
nextSibling
nodeValue
firstChild
childNodes
lastChild

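So, if you get the actual node object, you might use:

$('.r').get(0).nodeValue

This gives you the node's raw value. Be aware, though, that nodeValue is normally only populated on text nodes. For an element such as the one matched by '.r' it will most likely be empty, because the text lives in the element's child text nodes (for example .firstChild.nodeValue, when the first child happens to be a text node). I'd expect the earlier .text() examples to be the safer and simpler way to get the result.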

Tags: javascript, url, jquery, node.js, cheerio