Estimating the iteration element in C# for very large XML files

Question

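I am working with a lot of different XML files, but I do not know the iteration element of each file in advance.

By iteration element I mean the element that repeats throughout the XML file (in XSD files it also shows up as maxOccurs="unbounded").

For example, an orders file might contain a repeating element named order.

Some examples of the structures I receive are: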

<orders>
   <order>...</order>
   <order>...</order>
</orders>

<products>
   <product>...</product>
   <product>...</product>
</products>

<root>
   <element>...</element>
   <element>...</element>
</root>

<products>
   <section>
    <someelement>content</someelement>
    <item>...</item>
    <item>...</item>
    <item>...</item>
    <item>...</item>
   </section>
</products>

In the examples above, the iterators/repeaters are:

orders > order
products > product
root > element
products > section > item

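My usual way of estimating the iterator is to load the full XML file into an XmlDocument, generate an XSD schema from it, and find the first maxOccurs with child elements in that schema. This works fine, but XmlDocument does not work for very large XML files (GB sizes).

For those I need to use XmlReader, but I do not know how to estimate the iterator with XmlReader, because I cannot use the XSD trick.

So I am looking for ideas on how to estimate it. Any suggestions are appreciated.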

Answer 1

Try the following code, which puts the results into a dictionary:

using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;


namespace ConsoleApplication75
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
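        static void Main(string[] args)
        {
            Node.ParseChildren(FILENAME);
            // Print each element path with its occurrence count.
            foreach (KeyValuePair<string, int> pair in Node.dict)
            {
                Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
            }
        }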


    }
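    public class Node
    {
        public static XmlReader reader;
        // Maps each element path (e.g. "orders > order") to its occurrence count.
        public static Dictionary<string, int> dict = new Dictionary<string, int>();

        public static void ParseChildren(string filename)
        {
            // Dispose the reader (and the underlying file handle) when done.
            using (reader = XmlReader.Create(filename))
            {
                reader.MoveToContent();
                // Start the path at the root element so keys read "root > child".
                string name = reader.LocalName;
                // An empty root (<root/>) has no children to count.
                if (reader.IsEmptyElement) return;
                reader.ReadStartElement();
                ParseChildrenRecursive(name);
            }
        }

        public static void ParseChildrenRecursive(string path)
        {
            while (!reader.EOF)
            {
                // End tag of the current parent: consume it and return to the caller.
                if (reader.NodeType == XmlNodeType.EndElement)
                {
                    reader.ReadEndElement();
                    break;
                }
                if (reader.IsStartElement())
                {
                    string childName = reader.LocalName;
                    string newPath = path + " > " + childName;
                    if (dict.ContainsKey(newPath))
                    {
                        dict[newPath] += 1;
                    }
                    else
                    {
                        dict.Add(newPath, 1);
                    }
                    // Self-closing elements (<item/>) have no end tag, so do not recurse into them.
                    bool isEmpty = reader.IsEmptyElement;
                    reader.ReadStartElement();
                    if (!isEmpty)
                    {
                        ParseChildrenRecursive(newPath);
                    }
                }
                // Skip text, whitespace, comments and other non-element nodes.
                if ((reader.NodeType != XmlNodeType.StartElement) && (reader.NodeType != XmlNodeType.EndElement))
                    reader.Read();
            }
        }
    }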

}
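
The code above only fills the dictionary; it does not pick the iterator. One possible way to do that, as a rough sketch rather than a definitive rule, is to take the shallowest path that occurs more than once and break ties by the higher count. IteratorPicker and PickIterator are hypothetical names that are not part of the answer, and the heuristic itself is an assumption that may not fit every file layout. It expects the path format used above, with segments joined by " > ".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the answer above).
public static class IteratorPicker
{
    // Returns the shallowest path with a count above 1, or null if nothing repeats.
    // Assumption: the outermost repeating element is the iterator you want.
    public static string PickIterator(IDictionary<string, int> counts)
    {
        return counts
            .Where(p => p.Value > 1)            // keep only paths seen more than once
            .OrderBy(p => Depth(p.Key))         // prefer paths closest to the root
            .ThenByDescending(p => p.Value)     // then the most frequent one
            .Select(p => p.Key)
            .FirstOrDefault();
    }

    // Number of segments in a path such as "products > section > item".
    private static int Depth(string path)
    {
        return path.Split(new[] { " > " }, StringSplitOptions.None).Length;
    }
}
```

After Node.ParseChildren(FILENAME) has run, calling IteratorPicker.PickIterator(Node.dict) would return the chosen path. Preferring the shallowest path matters because children of a repeating element (for example an id inside every order) also end up with a count above 1.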
