How to create a new HtmlAgilityPack.HtmlDocument from a List<HtmlNode> from a first "select/filter" run?

I am using Html Agility Pack. How can I create a new HtmlAgilityPack.HtmlDocument from the list of nodes that I filtered out of the original .html?

// Filter the original .html and collect all the nodes I want to edit later
LstAllTablesDocNodes =
    htmlDoc.DocumentNode.SelectNodes("//table[@class='pricelist']").ToList();

// Pseudocode for what I would like to do (this gives an error,
// because HtmlDocument has no constructor that takes a node list)
HtmlAgilityPack.HtmlDocument htmlDoc2 =
    new HtmlAgilityPack.HtmlDocument(LstAllTablesDocNodes);

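Loop over the retrieved nodes, extract the HTML of each one, and combine it into a single string. Then load that string into your new HtmlDocument. In some cases, for example with tr nodes, you may need to add the parent wrapper node (table in the case of tr) so that the parser does not drop or mis-close them in the new document.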

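using System;
using System.Text;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // Load the source page and select the nodes to carry over.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://books.toscrape.com/");
        HtmlNodeCollection books = doc.DocumentNode.SelectNodes("//h3/a");

        // SelectNodes returns null when nothing matches.
        if (books == null)
        {
            Console.WriteLine("No matching nodes found.");
            return;
        }

        // Concatenate the markup of every selected node into one string.
        var output = new StringBuilder();
        foreach (HtmlNode book in books)
        {
            output.Append(book.OuterHtml);
        }

        // Parse the combined markup into a new, independent document.
        var doc2 = new HtmlDocument();
        doc2.LoadHtml(output.ToString());
        Console.WriteLine(doc2.DocumentNode.InnerHtml);
    }
}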
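
For the wrapper case mentioned above (tr nodes that need a surrounding table), the same idea can be packaged as a small helper. This is only a sketch: NodeListDocument, FromNodes and the wrapperTag parameter are hypothetical names made up for illustration and are not part of HtmlAgilityPack; the helper relies only on OuterHtml and LoadHtml, as in the example above.

```csharp
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class NodeListDocument
{
    // Builds a new HtmlDocument from the serialized markup of the given nodes.
    // If wrapperTag is supplied (for example "table" for a list of tr nodes),
    // the combined markup is enclosed in that element before parsing.
    public static HtmlDocument FromNodes(IEnumerable<HtmlNode> nodes, string wrapperTag = null)
    {
        var html = new StringBuilder();

        if (wrapperTag != null)
        {
            html.Append('<').Append(wrapperTag).Append('>');
        }

        // Append the full markup (tag, attributes and children) of each node.
        foreach (HtmlNode node in nodes)
        {
            html.Append(node.OuterHtml);
        }

        if (wrapperTag != null)
        {
            html.Append("</").Append(wrapperTag).Append('>');
        }

        // Parse the combined string into a separate document.
        var result = new HtmlDocument();
        result.LoadHtml(html.ToString());
        return result;
    }
}
```

With the question's list, this would be called as NodeListDocument.FromNodes(LstAllTablesDocNodes), since whole table elements need no wrapper; for rows selected with "//tr" you would pass "table" as the second argument. Because the nodes are re-parsed from text, the new document holds copies, so editing it leaves the original htmlDoc unchanged.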


References:

  1. HtmlAgilityPack substring of all by length
  2. https://www.tutorialsteacher.com/csharp/csharp-stringbuilder