Parsing HTML files using HtmlAgilityPack
I have several HTML files stored in the same directory (TestReport1.html, TestReport2.html, ...) with content like this:
<!DOCTYPE html>
<html>
<body>
<section class='summary'>
<ul class='resultSummary'>
<li class='Passed'>
<div class='summaryLine'>
<div class='summaryLabel'>Passed</div>
<span class='summaryCount'>199</span>
</div>
<input type='checkbox' class='cbx_toggle' unchecked/>
</li>
<li class='Inconclusive'>
<div class='summaryLine'>
<div class='summaryLabel'>Inconclusive</div>
<span class='summaryCount'>10</span>
</div>
<input type='checkbox' class='cbx_toggle' unchecked/>
</li>
<li class='NotImplemented'>
<div class='summaryLine'>
<div class='summaryLabel'>Not Implemented</div>
<span class='summaryCount'>5</span>
</div>
<input type='checkbox' class='cbx_toggle' unchecked/>
</li>
<li class='Failed'>
<div class='summaryLine'>
<div class='summaryLabel'>Failed</div>
<span class='summaryCount'>12</span>
</div>
<input type='checkbox' class='cbx_toggle' checked/>
</li>
<li id='summaryChart'></li>
</ul>
</section>
</body>
</html>
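I want to parse each HTML file, read the count stored for each result category, and produce output like this:
TestReport1:
Passed: 199
Inconclusive: 10
Not Implemented: 5
Failed: 12
TestReport2:
Passed: 20
Inconclusive: 10
Not Implemented: 50
Failed: 120
Then I want to combine all the results into a single HTML summary file:
Summary test report:
Total Passed: 199
Total Inconclusive: 10
Total Not Implemented: 5
Total Failed: 12
Any tips and ideas would be greatly appreciated.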
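To get the values, you can use XPath. Example:
"//*[@class='Inconclusive']/div/span"
C# HtmlAgilityPack
var html = new HtmlDocument();
html.LoadHtml(htmlCode); // htmlCode: a string holding the report's HTML
var xpath = "//*[@class='Inconclusive']/div/span";
// SelectSingleNode returns null when nothing matches, so guard before reading InnerText
var parse = html.DocumentNode.SelectSingleNode(xpath)?.InnerText;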
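The same XPath pattern can be repeated for each result category. Below is a minimal sketch, assuming the four class names from the sample report (Passed, Inconclusive, NotImplemented, Failed) and treating a missing or non-numeric count as 0. The names ReportParser and ParseCounts are made up for illustration and are not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ReportParser
{
    // Class names of the <li> elements in the sample report (assumed to be stable).
    private static readonly string[] Categories =
        { "Passed", "Inconclusive", "NotImplemented", "Failed" };

    // Hypothetical helper: returns category -> count for one report.
    // A missing or non-numeric entry is stored as 0.
    public static Dictionary<string, int> ParseCounts(string htmlCode)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlCode);

        var counts = new Dictionary<string, int>();
        foreach (var category in Categories)
        {
            // Same shape as the XPath above, with the class name swapped in.
            var xpath = $"//*[@class='{category}']/div/span";
            var node = doc.DocumentNode.SelectSingleNode(xpath); // null when nothing matches
            counts[category] =
                node != null && int.TryParse(node.InnerText.Trim(), out var value) ? value : 0;
        }
        return counts;
    }

    public static void Main()
    {
        // Shortened sample containing only two of the four categories.
        var sample =
            @"<ul class='resultSummary'>
                <li class='Passed'><div class='summaryLine'>
                  <div class='summaryLabel'>Passed</div><span class='summaryCount'>199</span>
                </div></li>
                <li class='Inconclusive'><div class='summaryLine'>
                  <div class='summaryLabel'>Inconclusive</div><span class='summaryCount'>10</span>
                </div></li>
              </ul>";

        foreach (var pair in ParseCounts(sample))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

Returning a dictionary keeps the per-report values in a form that is easy to add up later. If your reports use other class names, only the Categories array should need to change.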
How to get the XPath:
// Description: HAP - Load (From File)
// Website: https://html-agility-pack.net/
// Run: https://dotnetfiddle.net/EsvZyg
// @nuget: HtmlAgilityPack
using System;
using System.Xml;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        SaveHtmlFile();
        #region example
        var path = @"test.html";
        var doc = new HtmlDocument();
        doc.Load(path);
        var node = doc.DocumentNode.SelectSingleNode("//body");
        Console.WriteLine(node.OuterHtml);
        #endregion
    }

    private static void SaveHtmlFile()
    {
        var html =
@"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html> ";
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);
        htmlDoc.Save("test.html");
    }
}
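Putting file loading and the XPath lookup together, here is a rough sketch of the whole task: read every TestReport*.html in a directory, print the per-report counts, add them up, and write one summary page. It is one possible approach under stated assumptions, not the only way to do it. The class ReportSummary, its methods, the output file name Summary.html, and the layout of the summary page are all invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

public static class ReportSummary
{
    // Class names of the <li> elements in the sample report (assumed to be stable).
    private static readonly string[] Categories =
        { "Passed", "Inconclusive", "NotImplemented", "Failed" };

    // Loads one report file and returns category -> count (0 when missing or non-numeric).
    public static Dictionary<string, int> ReadCounts(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path);
        var counts = new Dictionary<string, int>();
        foreach (var category in Categories)
        {
            var node = doc.DocumentNode.SelectSingleNode($"//*[@class='{category}']/div/span");
            counts[category] =
                node != null && int.TryParse(node.InnerText.Trim(), out var n) ? n : 0;
        }
        return counts;
    }

    // Adds up the per-report counts, category by category.
    public static Dictionary<string, int> Sum(IEnumerable<Dictionary<string, int>> reports)
    {
        var totals = Categories.ToDictionary(c => c, c => 0);
        foreach (var report in reports)
            foreach (var category in Categories)
                totals[category] += report[category];
        return totals;
    }

    // Renders the totals as a small HTML page (layout is an assumption).
    public static string BuildSummaryHtml(Dictionary<string, int> totals)
    {
        var sb = new StringBuilder();
        sb.AppendLine("<!DOCTYPE html>");
        sb.AppendLine("<html><body><h1>Summary test report</h1><ul>");
        foreach (var category in Categories)
            sb.AppendLine($"<li>Total {category}: {totals[category]}</li>");
        sb.AppendLine("</ul></body></html>");
        return sb.ToString();
    }

    public static void Main(string[] args)
    {
        // Directory holding the reports; defaults to the current directory.
        var directory = args.Length > 0 ? args[0] : ".";
        var reports = new List<Dictionary<string, int>>();

        foreach (var file in Directory.GetFiles(directory, "TestReport*.html").OrderBy(f => f))
        {
            var counts = ReadCounts(file);
            reports.Add(counts);
            Console.WriteLine(Path.GetFileNameWithoutExtension(file) + ":");
            foreach (var category in Categories)
                Console.WriteLine($"{category}: {counts[category]}");
        }

        // Summary.html does not match the TestReport*.html pattern,
        // so a second run will not read the summary back in as a report.
        File.WriteAllText(Path.Combine(directory, "Summary.html"),
                          BuildSummaryHtml(Sum(reports)));
    }
}
```

Keeping Sum and BuildSummaryHtml separate from the file access means they can be checked with hand-built dictionaries, without any report files on disk.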