Splice HTML tags in an HTML string

I am using HtmlAgilityPack to remove the <br> tags at the start and end of an HTML string, but the code below removes them from every position.

HTML string:

 <p><br><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span><br></p>

Below is my code for removing the br tags:

    using HtmlAgilityPack;

    var document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(input.HTMLString);
    var rootNode = document.DocumentNode;
    var nodes = rootNode.SelectNodes("//br");
    if (nodes != null)
    {
        foreach (var brTag in nodes)
            brTag.Remove();
        this.HTMLString = document.DocumentNode.OuterHtml;
    }

I want the resulting string to look like this:

 <p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span></p>

Instead, I get the following string in this.HTMLString:

  <p><span>MERV 9 Cartridge<b>&nbsp;</b>Prefilters </span></p>

Can anyone help me remove the br tags only at the start and end of the string, and not the ones in between? I am using the HtmlAgilityPack library.

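I'm not sure whether your HTML is always inside a <p> element, or whether the number of <br /> elements varies from case to case. If it doesn't vary and you can rely on the outer element being the same, you can use the following to get the first and last <br/> elements.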

Option #1 - when the parent element (p in this case) is known and the number of br elements is known (3 in this case).

string html = "<p><br><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span><br></p>";
string outHtml = string.Empty;

var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(html);
var rootNode = document.DocumentNode;
var firstBrNode = rootNode.SelectSingleNode("//p/br[1]");
var lastBrNode = rootNode.SelectSingleNode("//p/br[last()]");

firstBrNode?.Remove();
lastBrNode?.Remove();
outHtml = document.DocumentNode.OuterHtml;

Output:

<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span></p>


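Option #2 - when the parent element is unknown and the number of br tags is unknown. This assumes that if only one br element is present, it sits in the middle and should stay in the HTML.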

string html = "<p><br><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span><br></p>";
// string html = "<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span></p>";
string outHtml = string.Empty;
var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(html);
var rootNode = document.DocumentNode;
// count all br nodes so we can bypass removal of br if there is only one in HTML
var brNodeCount = rootNode.SelectNodes("//br")?.Count ?? 0;
// get the parent node of the br element to be used in the xpath when we remove
// the br elements this will allow for different parent elements other than the `p` element
var parentNode = rootNode.SelectSingleNode("//br/parent::*");
// only removes br elements if more than one in HTML, assumes if 1 br element is present it's in the middle and will not be removed
if (brNodeCount > 1)
{ 
    var firstBrNode = rootNode.SelectSingleNode($"//{parentNode.Name}/br[1]");
    var lastBrNode = rootNode.SelectSingleNode($"//{parentNode.Name}/br[last()]");
    firstBrNode?.Remove();
    lastBrNode?.Remove();
}
outHtml = document.DocumentNode.OuterHtml;

Output:

<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span></p>


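Option #3 - compare against the indexes of the first and last text nodes and remove every br element that lies outside them. Text nodes that are empty or contain only white-space are ignored.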

// removes all br tags with an index before the first text node and
// all br tags with an index after the end of the last text node,
// any br tags between are not removed
private string RemoveStartAndEndBrTags(string html)
{
    if (string.IsNullOrEmpty(html)) return html;
    var document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(html);
    var rootNode = document.DocumentNode;
    // get first and last text nodes, excluding any only containing white-space
    var allNonEmptyTextNodes = rootNode.SelectNodes("//text()[not(self::text()[not(normalize-space())])]");
    if (allNonEmptyTextNodes == null || allNonEmptyTextNodes.Count == 0) return html;
    var firstTextNode = allNonEmptyTextNodes[0];
    var lastTextNode = allNonEmptyTextNodes[allNonEmptyTextNodes.Count - 1];
    // get the parent node of the first br element, it will be used when we remove the br elements,
    // this will allow for different parent elements other than the `p` element
    var parentNode = rootNode.SelectSingleNode("//br/parent::*");
    if (parentNode == null) return html;
    var allBrNodes = rootNode.SelectNodes($"//{parentNode.Name}/br");
    foreach (var brNode in allBrNodes)
    {
        if (brNode == null) continue;
        // check index of br nodes against first and last text nodes
        // and remove br nodes that sit outside text nodes
        if (brNode.OuterStartIndex <= firstTextNode.OuterStartIndex
            || brNode.OuterStartIndex >= lastTextNode.OuterStartIndex + lastTextNode.OuterLength)
        { 
            brNode.Remove();
        }
    }
    return document.DocumentNode.OuterHtml;
}

Test HTML input:

<p><br><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span><br></p>
<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span></p>
<p><span>MERV 9 <br>Cartridge<b><br>&nbsp;</b>Prefilters </span></p>
<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters<br> </span></p>
<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters<br></span></p>

Test HTML output:

<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span></p>
<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span></p>
<p><span>MERV 9 <br>Cartridge<b><br>&nbsp;</b>Prefilters </span></p>
<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters </span></p>
<p><span>MERV 9 Cartridge<b><br>&nbsp;</b>Prefilters</span></p>
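
A variation on Option #3, offered as a minimal sketch rather than a definitive implementation: instead of comparing stream indexes and relying on the name of the parent element, it walks all nodes in document order with `Descendants()` and removes every br element that comes before the first or after the last non-white-space text node. The class name `BrTrimmer` and the method name `TrimOuterBrTags` are made up for illustration. It assumes, as Option #3 does, that a text node containing only `&nbsp;` counts as content, since HtmlAgilityPack keeps the entity text as-is in `InnerText`.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class BrTrimmer
{
    // Hypothetical helper: removes br elements that sit before the first or
    // after the last non-white-space text node, wherever they are nested.
    public static string TrimOuterBrTags(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Snapshot the document-order traversal before mutating the tree.
        var nodes = document.DocumentNode.Descendants().ToList();

        // Positions (within the snapshot) of text nodes that hold real content.
        var textPositions = nodes
            .Select((node, index) => new { node, index })
            .Where(x => x.node.NodeType == HtmlNodeType.Text
                        && !string.IsNullOrWhiteSpace(x.node.InnerText))
            .Select(x => x.index)
            .ToList();

        // No content text at all: leave the input untouched.
        if (textPositions.Count == 0) return html;

        int first = textPositions[0];
        int last = textPositions[textPositions.Count - 1];

        for (int i = 0; i < nodes.Count; i++)
        {
            bool outside = i < first || i > last;
            if (outside && nodes[i].Name == "br")
            {
                nodes[i].Remove();
            }
        }

        return document.DocumentNode.OuterHtml;
    }
}
```

Called on the first test input above, this should drop the leading and trailing br elements and keep the one inside the b element. Because it does not build an XPath from a parent name, it also covers br elements nested at different depths, at the cost of visiting every node once.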