How to remove duplicate br tags that exceed a number of occurrences?

I want to keep no more than 2 <br> tags after each paragraph.

string html = @"paragraph 1 a dkahdk ahkdhadk.<br><br><br>
<br>
paragraph 2  adshkad hkasdhkasdh.<br>
<br>
paragraph 3 akdash dkjahiewry iwery.<br>
<br><br>
paragraph 4 ljsdlfjsldfj.<br>
<br>
<br>
<br>";    

HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();

doc.LoadHtml(html);
var xpath = "//text()[not(normalize-space())]";
var emptyNodes = doc.DocumentNode.SelectNodes(xpath);
foreach (HtmlNode emptyNode in emptyNodes)
{
    emptyNode.Remove(); // remove  \r\n
}
var nodes = doc.DocumentNode.SelectNodes("//br[following-sibling::br[3]]").ToList();
foreach (var node in nodes)
{
    node.Remove();
}

The output somehow has all of the br tags removed. The correct output should be:

paragraph 1 a dkahdk ahkdhadk.<br><br>
paragraph 2  adshkad hkasdhkasdh.<br><br>
paragraph 3 akdash dkjahiewry iwery.<br><br>
paragraph 4 ljsdlfjsldfj.<br><br>   

A simple regex replacement is enough here; HtmlAgilityPack is not needed. For example, using a multi-step process:

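// Requires: using System; using System.Text.RegularExpressions;

// Step 1: turn every br tag into a newline.
// To also match <br >, <br/> and <br />, use a regex:
//var toNewLines = new Regex( @"<br\s?/?>" );
//var onlyNewLines = toNewLines.Replace(html, Environment.NewLine);
// Or, since all br tags here are plain <br>, a string replace is enough:
var onlyNewLines = html.Replace("<br>", Environment.NewLine);

// Step 2: collapse each run of newline and tab characters into one newline.
var regex = new Regex( @"([" + Environment.NewLine + "\t])+" );
var result = regex.Replace(onlyNewLines, Environment.NewLine);

// Step 3: turn each remaining newline back into exactly two br tags.
var finalResult = result.Replace(Environment.NewLine, "<br /><br />" + Environment.NewLine);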
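
As an alternative to the multi-step process above, the same normalization can probably be done in one pass. The sketch below is only an illustration: the class name BrCollapser and the method Collapse are hypothetical, and it assumes that every run of br tags (including a single one) should end up as exactly two, as in the expected output in the question. It also emits "\n" rather than Environment.NewLine, which is an arbitrary choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BrCollapser
{
    // Matches a run of one or more br tags (<br>, <br >, <br/>, <br />),
    // together with any whitespace between or after them.
    private static readonly Regex BrRun =
        new Regex(@"(?:<br\s*/?>\s*)+", RegexOptions.IgnoreCase);

    // Hypothetical helper: rewrites every run of br tags as exactly two,
    // followed by a single "\n".
    public static string Collapse(string html)
    {
        return BrRun.Replace(html, "<br><br>\n");
    }
}
```

Calling BrCollapser.Collapse(html) on the string from the question should give each paragraph followed by <br><br> and a line break. If single or double br tags should be left untouched and only longer runs trimmed, changing the quantifier from + to {3,} is one way to do it.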