node.js \ how to handle the result of sanitize-html

Question

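I'm new to Node.js and am trying to use the sanitize-html module to clean HTML in Node.js. I think the question is really more general:

The plugin outputs an object (I printed it to the console and it shows [object]). How do I find out how to use this object? What are its fields, how do I write it to a file, and so on? (I know this sounds basic. Should I serialize it? What is the usual way to work with an object?)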

var Crawler = require("js-crawler");
var download = require("url-download");
var sanitizeHtml = require('sanitize-html');
var util = require('util');
var fs = require('fs');

new Crawler().configure({depth: 1})
  .crawl("http://www.cnn.com", function onSuccess(page) {

    var clean = sanitizeHtml(page);
    console.log(clean);
    fs.writeFile('sanitized.txt', clean, function (err) {
        if (err) throw err;
        console.log('It\'s saved! in same location.');
    });

    console.log(util.inspect(clean, {showHidden: false, depth: null}));
    var str = JSON.stringify(clean.toString());
    console.log(str);
    /*download(page.url, './download')
    .on('close', function () {
      console.log('One file has been downloaded.');
    });*/
  });

Answer 1

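Your problem is not with sanitize-html. You are not handling the page variable correctly: it is an object, and the HTML is in its body property. You should use: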

var clean = sanitizeHtml(page.body);
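
On the more general question of how to find out what an unknown object contains, the following is a minimal sketch using only Node.js built-ins. The `page` object here is a hypothetical stand-in for what the crawler passes to the callback: only the `url` and `body` field names appear in this thread, and the values are made up. The output path in the temp directory is also just an assumption for illustration.

```javascript
const util = require('util');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for the crawler's page object (values are invented).
const page = {
  url: 'http://www.example.com',
  body: '<p>Hello <b>world</b></p>'
};

// Coercing a plain object to a string loses all of its content.
const coerced = String(page);

// Object.keys lists the object's own enumerable field names.
const fields = Object.keys(page);

// util.inspect gives a readable dump, including nested values.
const dump = util.inspect(page, { showHidden: false, depth: null });

// JSON.stringify serializes the object itself (not page.toString()).
const json = JSON.stringify(page, null, 2);

// A string field can be written to a file directly.
const outFile = path.join(os.tmpdir(), 'page-body.txt');
fs.writeFileSync(outFile, page.body);
const written = fs.readFileSync(outFile, 'utf8');

console.log(coerced);
console.log(fields);
console.log(dump);
console.log(json);
```

Once the field holding the HTML is known, that string is what should be passed to sanitizeHtml, and the string it returns can be written to a file in the same way.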


html-sanitizing

node.js