C# junk characters break XElement "pretty" representation

Question

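I occasionally run across XML that has junk characters between elements, which seems to confuse whatever internal XNode/XElement method handles prettifying the element.

The following...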

var badNode = XElement.Parse(@"<b>+
  <inner1/>
  <inner2/>
</b>");

prints out

<b>+
  <inner1 /><inner2 /></b>

While this...

var badNode = XElement.Parse(@"<b>
  <inner1/>
  <inner2/>
</b>");

gives the expected

<b>
  <inner1 />
  <inner2 />
</b>

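According to the debugger, the junk character is parsed into the XElement's "NextNode" property, which then apparently gets the remaining XML assigned as its own "NextNode", causing the single-line prettification.

Other than pre-screening the XML for stray characters between tags, is there any way to prevent or ignore this behavior?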

Answer 1

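The indentation of badNode is awkward because, by adding the non-whitespace + character to the <b> element's value, the element now contains mixed content, which the W3C defines as follows: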

3.2.2 Mixed Content

[Definition: An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements.]

The presence of mixed content inside an element triggers special formatting rules in XmlWriter (which XElement.ToString() uses internally to write itself to an XML string). These rules are explained in the documentation remarks for XmlWriterSettings.Indent:

This property only applies to XmlWriter instances that output text content; otherwise, this setting is ignored.

The elements are indented as long as the element does not contain mixed content. Once the WriteString or WriteWhitespace method is called to write out a mixed element content, the XmlWriter stops indenting. The indenting resumes once the mixed content element is closed.

This explains the behavior you are seeing.
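To see why only the junk version counts as mixed content, it can help to inspect the parsed nodes directly. The following is a minimal sketch, not part of the original answer: HasMixedContent and MixedContentDemo are hypothetical names introduced for illustration. It relies on the default parse (LoadOptions.None) discarding whitespace-only text between elements, so the clean input ends up with no text nodes at all, while the + keeps its text node alive.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class MixedContentDemo
{
    // Hypothetical helper: true when the element has at least one child element
    // and at least one text node among its direct children.
    public static bool HasMixedContent(XElement element) =>
        element.Elements().Any() && element.Nodes().OfType<XText>().Any();

    public static void Main()
    {
        var clean = XElement.Parse("<b>\n  <inner1/>\n  <inner2/>\n</b>");
        var junk = XElement.Parse("<b>+\n  <inner1/>\n  <inner2/>\n</b>");

        Console.WriteLine(HasMixedContent(clean)); // False: whitespace-only text was dropped
        Console.WriteLine(HasMixedContent(junk));  // True: the "+" text node survives

        // List the direct child node types of the junk element.
        foreach (var node in junk.Nodes())
        {
            Console.WriteLine(node.NodeType);
        }
    }
}
```

A check like this could be used to log or flag suspicious elements before deciding whether to strip their text.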

As a workaround, you can parse your XML with LoadOptions.PreserveWhitespace, which preserves insignificant white space while parsing. That may be what you want:

var badNode = XElement.Parse(@"<b>+
  <inner1/>
  <inner2/>
</b>", LoadOptions.PreserveWhitespace);
Console.WriteLine(badNode);

Output:

<b>+
  <inner1 />
  <inner2 />
</b>

Demo fiddle #1 here.

Alternatively, if you are sure that badNode should not contain character data, you can manually remove it after parsing:

badNode.Nodes().OfType<XText>().Remove();

Now badNode no longer contains mixed content, and XmlWriter will indent it nicely.

Demo fiddle #2 here.
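Note that badNode.Nodes() only covers the direct children of badNode. If junk characters may also appear deeper in the tree, one possible extension is sketched below. It is an assumption on my part rather than something from the answer above: RemoveTextBetweenElements and JunkTextCleaner are hypothetical names, and the rule "text is junk only when its parent also has child elements" is a heuristic that would be wrong for documents that legitimately use mixed content.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class JunkTextCleaner
{
    // Hypothetical helper: removes text nodes from every element that also has
    // child elements, leaving text-only leaves such as <name>Bob</name> untouched.
    public static void RemoveTextBetweenElements(XElement root)
    {
        var junk = root.DescendantsAndSelf()
            .Where(e => e.HasElements)
            .SelectMany(e => e.Nodes().OfType<XText>())
            .ToList(); // snapshot first so the tree is not modified while enumerating

        foreach (var text in junk)
        {
            text.Remove();
        }
    }

    public static void Main()
    {
        var badNode = XElement.Parse(
            "<b>+\n  <inner1>keep me</inner1>\n  <inner2>-<leaf/></inner2>\n</b>");

        RemoveTextBetweenElements(badNode);

        // With the stray text gone, ToString() can indent the nested elements again.
        Console.WriteLine(badNode);
    }
}
```

In this sketch the text inside inner1 is kept because inner1 has no child elements, while the + and - characters are removed because their parents do.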

c#

xml

xelement