How to set MemoryStream position based on IndexOf, to split apart a sequence of XML documents?

I have a pseudo-XML file containing 5 small XML documents, as shown below:

What I am trying to achieve is to separate them and create a new file for each XML document, using a MemoryStream and the following code:

int flag = 0;

byte[] arr = Encoding.ASCII.GetBytes(File.ReadAllText(@"C:\Users\Aleksa\Desktop\testTxt.xml"));

for (int i = 0; i <= 5; i++)
{
    MemoryStream mem = new MemoryStream(arr);
    mem.Position = flag;
    StreamReader rdr = new StreamReader(mem);

    string st = rdr.ReadToEnd();

    if (st.IndexOf("<TestNode") != -1 && (st.IndexOf("</TestNode>") != -1 || st.IndexOf("/>") != -1))
    {
        int curr = st.IndexOf("<TestNode");
        int end = st.IndexOf("\r");
        string toWrite = st.Substring(st.IndexOf("<TestNode"), end);
        File.WriteAllText(@"C:\Users\Aleksa\Desktop\" + i.ToString() + ".xml", toWrite);
        flag += end;
    }
    Console.WriteLine(st);
}

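The first XML document from the image gets separated and is just fine, but the rest are empty files. While debugging I noticed that even if I set the position to the end variable, it still streams from the top. Also, in every iteration after the first, the end variable equals zero!

I have tried changing the IndexOf parameter to </TestNode> + 11, which behaves the same as the code above, except that the remaining files are not empty but incomplete, containing only <TestNode a. How can I fix the logic here and split my stream of XML documents apart?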

Your input stream consists of XML document fragments, i.e. a series of XML root elements concatenated together.

You can read such a stream by using an XmlReader created with XmlReaderSettings.ConformanceLevel == ConformanceLevel.Fragment. From the docs:

Fragment

Ensures that the XML data conforms to the rules for a well-formed XML 1.0 document fragment.

This setting accepts XML data with multiple root elements, or text nodes at the top-level.
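To make the effect of this setting concrete, here is a minimal sketch that reads a string of concatenated roots and lists the name of each top-level element. The class FragmentDemo and the method RootNames are hypothetical names introduced only for this illustration; they are not part of the code that follows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class FragmentDemo
{
    // Returns the names of the top-level (depth 0) elements found in a
    // string that contains several XML roots concatenated together.
    public static List<string> RootNames(string xml)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        var names = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                // Elements at depth 0 are the roots of the individual fragments;
                // nested elements have a greater depth and are skipped here.
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 0)
                    names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

With the default ConformanceLevel.Document, the same input would be rejected as soon as the reader reached the second root element.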

The following extension methods can be used for this task:

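using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlReaderExtensions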
{
    public static IEnumerable<XmlReader> ReadRoots(this XmlReader reader)
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                using (var subReader = reader.ReadSubtree())
                    yield return subReader;
            }
        }
    }

    public static void SplitDocumentFragments(Stream stream, Func<int, string> makeFileName, Action<string, IXmlLineInfo> onFileWriting, Action<string, IXmlLineInfo> onFileWritten)
    {
        using (var textReader = new StreamReader(stream, Encoding.UTF8, true, 4096, true))
        {
            SplitDocumentFragments(textReader, makeFileName, onFileWriting, onFileWritten);
        }
    }

    public static void SplitDocumentFragments(TextReader textReader, Func<int, string> makeFileName, Action<string, IXmlLineInfo> onFileWriting, Action<string, IXmlLineInfo> onFileWritten)
    {
        if (textReader == null || makeFileName == null)
            throw new ArgumentNullException();
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment, CloseInput = false };
        using (var xmlReader = XmlReader.Create(textReader, settings))
        {
            var lineInfo = xmlReader as IXmlLineInfo;
            var index = 0;

            foreach (var reader in xmlReader.ReadRoots())
            {
                var outputName = makeFileName(index);
                reader.MoveToContent();
                if (onFileWriting != null)
                    onFileWriting(outputName, lineInfo);
                using(var writer = XmlWriter.Create(outputName))
                {
                    writer.WriteNode(reader, true);
                }
                index++;
                if (onFileWritten != null)
                    onFileWritten(outputName, lineInfo);
            }
        }
    }
}

Then you would use it as follows:

var fileName = @"C:\Users\Aleksa\Desktop\testTxt.xml";
var outputPath = ""; // The directory in which to create your XML files.
using (var stream = File.OpenRead(fileName))
{
    XmlReaderExtensions.SplitDocumentFragments(stream,
                                               index => Path.Combine(outputPath, index.ToString() + ".xml"),
                                               (name, lineInfo) => 
                                               {
                                                   Console.WriteLine("Writing {0}, starting line info: LineNumber = {1}, LinePosition = {2}...", 
                                                                     name, lineInfo?.LineNumber, lineInfo?.LinePosition);
                                               },
                                               (name, lineInfo) => 
                                               {
                                                   Console.WriteLine("   Done.  Result: ");
                                                   Console.Write("   ");
                                                   Console.WriteLine(File.ReadAllText(name));
                                               });
}

The output will look something like:

Writing 0.xml, starting line info: LineNumber = 1, LinePosition = 2...
   Done.  Result: 
   <?xml version="1.0" encoding="utf-8"?><TestNode active="1" lastName="l"><Foo /> </TestNode>
Writing 1.xml, starting line info: LineNumber = 2, LinePosition = 2...
   Done.  Result: 
   <?xml version="1.0" encoding="utf-8"?><TestNode active="2" lastName="l" />
Writing 2.xml, starting line info: LineNumber = 3, LinePosition = 2...
   Done.  Result: 
   <?xml version="1.0" encoding="utf-8"?><TestNode active="3" lastName="l"><Foo />  </TestNode>

... (others omitted).
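For a quick check that does not touch the file system, the same technique can be sketched entirely in memory. The class FragmentSplitter and the method SplitToStrings below are hypothetical names used only for illustration, and omitting the XML declaration is an assumption made here to keep the returned strings short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class FragmentSplitter
{
    // Splits a string of concatenated XML roots into one string per root.
    public static List<string> SplitToStrings(string xml)
    {
        var readerSettings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        // Assumption: leave out the <?xml ...?> declaration in each result.
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var results = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml), readerSettings))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue;
                var output = new StringWriter();
                // The subtree reader cannot read past the end of the current root.
                using (var subReader = reader.ReadSubtree())
                using (var writer = XmlWriter.Create(output, writerSettings))
                {
                    subReader.MoveToContent();
                    writer.WriteNode(subReader, true);
                }
                // Both the writer and the subtree reader are closed at this point,
                // so the text is flushed and the outer reader can move on.
                results.Add(output.ToString());
            }
        }
        return results;
    }
}
```

Calling FragmentSplitter.SplitToStrings(File.ReadAllText(fileName)) should return one string per root element, which can then be inspected or written out as needed.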

Notes:

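  • The method ReadRoots() reads through all the root elements of the XML fragment stream and, for each one, returns a nested reader restricted to that specific root by using XmlReader.ReadSubtree():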

    Returns a new XmlReader instance that can be used to read the current node, and all its descendants. ... When the new XML reader has been closed, the original reader is positioned on the EndElement node of the sub-tree.

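    This allows callers of the method to parse each root individually without worrying about reading past the end of the root and into the next one. The contents of each root node can then be copied to an output XmlWriter using XmlWriter.WriteNode(XmlReader, true).
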
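  • You can track the approximate position in the file using the IXmlLineInfo interface, which is implemented by XmlReader subclasses that parse text streams. If your document fragment stream is truncated for some reason, this can help identify where the error occurs.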

    For details, see: getting the current position from an XmlReader.

  • If you are parsing a string st containing your XML fragments rather than reading directly from a file, you can pass a StringReader to SplitDocumentFragments():

    using (var textReader = new StringReader(st))
    {
        XmlReaderExtensions.SplitDocumentFragments(textReader,
            // Remainder as before
    
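  • Do not read an XML stream using Encoding.ASCII, as this will lose every non-ASCII character in the file (each one is replaced with '?'). Instead, use Encoding.UTF8 and/or detect the encoding from the BOM or the XML declaration.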
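
The last note can be illustrated with a small sketch that round-trips text through an encoding and back. EncodingDemo and RoundTrip are hypothetical names introduced here, and the sample attribute value is made up for the example.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingDemo
{
    // Encodes the text to bytes and decodes it again with the same encoding,
    // which reveals any characters the encoding cannot represent.
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }
}

public static class EncodingDemoProgram
{
    public static void Run()
    {
        // "\u017Di\u017Eek" is a made-up sample value containing two non-ASCII letters.
        var xml = "<TestNode lastName=\"\u017Di\u017Eek\" />";

        // ASCII cannot represent the two letters, so each becomes '?'.
        Console.WriteLine(EncodingDemo.RoundTrip(xml, Encoding.ASCII));

        // UTF-8 round-trips the text unchanged.
        Console.WriteLine(EncodingDemo.RoundTrip(xml, Encoding.UTF8) == xml);
    }
}
```

Passing the file stream straight to the XML reader, as the usage example above does with File.OpenRead, avoids this lossy intermediate step.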

Demo fiddle here.