C# junk characters break XElement "pretty" representation

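I occasionally run across some junk characters between XML elements, which seem to confuse whatever internal XNode/XElement methods handle pretty-printing the element.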

The following...

var badNode = XElement.Parse(@"<b>+
  <inner1/>
  <inner2/>
</b>");

prints

<b>+
  <inner1 /><inner2 /></b>

while this...

var badNode = XElement.Parse(@"<b>
  <inner1/>
  <inner2/>
</b>");

gives the expected

<b>
  <inner1 />
  <inner2 />
</b>

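According to the debugger, the junk characters are parsed into the XElement's "NextNode" property, which then apparently gets the remaining XML assigned as its own "NextNode", causing the single-line pretty-printing.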

Other than pre-filtering any stray characters between XML tags, is there any way to prevent or ignore this behavior?

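badNode is indented awkwardly because, by adding the non-whitespace + character to the value of the <b> element, the element now contains mixed content, which the W3C defines as follows: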

3.2.2 Mixed Content

[Definition: An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements.]

The presence of mixed content in an element triggers special formatting rules in XmlWriter (which is used internally by XElement.ToString() to actually write itself to an XML string). These rules are explained in the documentation remarks for XmlWriterSettings.Indent:

This property only applies to XmlWriter instances that output text content; otherwise, this setting is ignored.

The elements are indented as long as the element does not contain mixed content. Once the WriteString or WriteWhitespace method is called to write out a mixed element content, the XmlWriter stops indenting. The indenting resumes once the mixed content element is closed.
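The rule quoted above can be observed directly with XmlWriter, without XElement involved at all. The following is a minimal sketch, assuming a .NET console app; the class name MixedContentDemo and the method WriteMixed are hypothetical, chosen only for illustration. Once WriteString emits the + character inside <b>, the writer should stop inserting line breaks and indentation until </b> is closed.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of any library.
public static class MixedContentDemo
{
    // Writes <b>+<inner1/><inner2/></b> through an indenting XmlWriter
    // and returns the resulting text.
    public static string WriteMixed()
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,             // request pretty-printing
            OmitXmlDeclaration = true  // skip the <?xml ...?> header
        };
        var sw = new StringWriter();
        using (var writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteStartElement("b");
            writer.WriteString("+");   // character data: <b> is now mixed content, indenting stops
            writer.WriteStartElement("inner1");
            writer.WriteEndElement();  // written as <inner1 /> with no line break before it
            writer.WriteStartElement("inner2");
            writer.WriteEndElement();
            writer.WriteEndElement();  // closes <b>; indenting would resume after this point
        }
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteMixed());
    }
}
```

Under those rules the whole element comes out on one line. Removing the WriteString call leaves <b> with element-only content, and each child is then written on its own indented line.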

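This explains the behavior you are seeing. Without any load options, the text node holds the + together with the line break and two spaces that follow it, so <inner1 /> still starts on a new line, but nothing after it is indented.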

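As a workaround, parsing your XML with LoadOptions.PreserveWhitespace keeps the insignificant whitespace during parsing. XmlWriter still stops indenting inside <b>, but the original line breaks and indentation are kept as text nodes and written back out verbatim, which may be what you want: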

var badNode = XElement.Parse(@"<b>+
  <inner1/>
  <inner2/>
</b>", LoadOptions.PreserveWhitespace);
Console.WriteLine(badNode);

Output:

<b>+
  <inner1 />
  <inner2 />
</b>

Demo fiddle #1 here.

Alternatively, if you are sure that badNode should not contain character data, you can remove it manually after parsing:

badNode.Nodes().OfType<XText>().Remove();

Now badNode no longer contains mixed content, and XmlWriter will indent it properly.

Demo fiddle #2 here.
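For comparison, both workarounds can be run side by side in one self-contained program. This is a sketch under stated assumptions: the class WorkaroundDemo and its method names are hypothetical, the markup uses "\n" line breaks in place of the verbatim string, and the output is normalized to "\n" because XElement.ToString() writes Environment.NewLine.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class comparing the two workarounds.
public static class WorkaroundDemo
{
    // Same markup as badNode, written with "\n" line breaks.
    const string Xml = "<b>+\n  <inner1/>\n  <inner2/>\n</b>";

    // Workaround 1: keep the original whitespace as text nodes, so the
    // writer reproduces the source layout even though <b> is mixed content.
    public static string WithPreservedWhitespace()
    {
        var node = XElement.Parse(Xml, LoadOptions.PreserveWhitespace);
        return Normalize(node.ToString());
    }

    // Workaround 2: drop the direct child text nodes so <b> is no longer
    // mixed content and the writer indents it normally.
    // Note: XCData derives from XText, so CDATA sections are removed too.
    public static string WithTextRemoved()
    {
        var node = XElement.Parse(Xml);
        node.Nodes().OfType<XText>().Remove();
        return Normalize(node.ToString());
    }

    // Make results compare the same on Windows (\r\n) and Unix (\n).
    static string Normalize(string s) => s.Replace("\r\n", "\n");

    public static void Main()
    {
        Console.WriteLine(WithPreservedWhitespace());
        Console.WriteLine(WithTextRemoved());
    }
}
```

The first method keeps the + and the source layout; the second discards the + itself along with any other text directly inside <b>, so it only fits when that text really is junk.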