Is there a size limit for a property to be serialized?

I'm working against an interface that requires an XML document. So far I've been able to serialize most of the objects using XmlSerializer. However, one property is proving problematic. It is supposed to be a collection of objects that wrap documents, where the documents themselves are encoded as base64 strings.

The basic structure is like this:

//snipped out of a parent object
public List<Document> DocumentCollection { get; set; }
//end snip

public class Document
{
    public string DocumentTitle { get; set; }
    public Code DocumentCategory { get; set; }
    /// <summary>
    /// Base64 encoded file
    /// </summary>
    public string BinaryDocument { get; set; }
    public string DocumentTypeText { get; set; }
}

The problem is that smaller values work fine, but if the document is too large, the serializer simply skips that document item in the collection.

Am I hitting some kind of limit?

Update: I changed

public string BinaryDocument { get; set; }

to

public byte[] BinaryDocument { get; set; }

and I still get the same result. The smaller documents (~150 KB) serialize fine, but the rest do not. To be clear, it isn't just the value of the property that goes missing; the entire containing Document object is dropped.

Update 2:

Here is the serialization code with a simple repro, taken from a console project I put together. The trouble is that this code works fine in the test project. I'm having a hard time reproducing the full object structure here, because the complexity of populating the fields makes it nearly impossible to use the actual objects in a test case, so I've tried to pare down the code from the main application. The populated object goes into the serialization code with DocumentCollection holding four Documents, and only one Document comes out.

using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            var container = new DocumentContainer();
            var docs = new List<Document>();
            foreach (var f in Directory.GetFiles(@"E:\Software Projects\DA\Test Documents"))
            {
                var fileStream = new MemoryStream(File.ReadAllBytes(f));
                var doc = new Document
                {
                    BinaryDocument = fileStream.ToArray(),
                    DocumentTitle = Path.GetFileName(f)
                };

                docs.Add(doc);
            }

            container.DocumentCollection = docs;

            var serializer = new XmlSerializer(typeof(DocumentContainer));
            var ms = new MemoryStream();
            var writer = XmlWriter.Create(ms);

            serializer.Serialize(writer, container);
            writer.Flush();
            ms.Seek(0, SeekOrigin.Begin);

            var reader = new StreamReader(ms, Encoding.UTF8);
            File.WriteAllText(@"C:\temp\testexport.xml", reader.ReadToEnd());
        }
    }

    public class Document
    {
        public string DocumentTitle { get; set; }
        public byte[] BinaryDocument { get; set; }
    }

    // test class
    public class DocumentContainer
    {
        public List<Document> DocumentCollection { get; set; }
    }
}

XmlSerializer itself has no limit on the length of the strings it can serialize.

.NET does, however, have a maximum string length of int.MaxValue. Furthermore, since a string is internally implemented as a contiguous memory buffer, on a 32-bit process you are likely to be unable to allocate a string anywhere near that large due to process space fragmentation. And since a C# base64 string requires roughly 2.67 times the memory of the byte[] array from which it was created (1.33 for the encoding, times 2 because the .NET char type is two bytes), you may be getting an OutOfMemoryException while encoding a large binary document as one complete base64 string, then swallowing and ignoring it, leaving the BinaryDocument property null.
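
A minimal, hypothetical sketch of that failure mode (the surrounding variable names are made up, not taken from the question's code):

byte[] fileBytes = File.ReadAllBytes(path);
var doc = new Document { DocumentTitle = Path.GetFileName(path) };
try
{
    // Needs roughly 2.67x the memory of fileBytes as one contiguous string.
    doc.BinaryDocument = Convert.ToBase64String(fileBytes);
}
catch (OutOfMemoryException)
{
    // If this is swallowed somewhere in the calling code,
    // BinaryDocument silently stays null.
}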

That said, there is no reason for you to manually encode the binary document to base64, because XmlSerializer does that for you automatically. I.e. if I serialize the following class:

public class Document
{
    public string DocumentTitle { get; set; }
    public Code DocumentCategory { get; set; }
    public byte[] BinaryDocument { get; set; }
    public string DocumentTypeText { get; set; }
}

I get the following XML:

<Document>
  <DocumentTitle>my title</DocumentTitle>
  <DocumentCategory>Default</DocumentCategory>
  <BinaryDocument>AAECAwQFBgcICQoLDA0ODxAREhM=</BinaryDocument>
  <DocumentTypeText>document text type</DocumentTypeText>
</Document>

As you can see, BinaryDocument gets base64 encoded. Thus you should be able to keep your binary documents in the more compact byte[] representation and still get the XML output you want.
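
For reference, here is a minimal sketch of the serialization call that produces output like the above, assuming the usual System.Xml.Serialization and System.Linq usings; Code is assumed to be an enum with a Default member, since it is not defined in the question:

var doc = new Document
{
    DocumentTitle = "my title",
    DocumentCategory = Code.Default,
    // Bytes 0..19, which encode to the base64 value shown above.
    BinaryDocument = Enumerable.Range(0, 20).Select(i => (byte)i).ToArray(),
    DocumentTypeText = "document text type"
};

var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
using (var writer = XmlWriter.Create(Console.Out, settings))
{
    new XmlSerializer(typeof(Document)).Serialize(writer, doc);
}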

Better yet, under the covers XmlWriter uses System.Xml.Base64Encoder to do this. That class encodes its input in chunks, thereby avoiding the excessive memory use and potential out-of-memory exceptions described above.

I am unable to reproduce the problem you are seeing. Even with individual files as large as 267 MB to 1.92 GB, I do not see any elements being skipped. The only problem I ran into is that the temporary var ms = new MemoryStream(); eventually exceeded its 2 GB buffer limit and threw an exception. I replaced it with a direct stream and the problem went away:

using (var stream = File.Open(outputPath, FileMode.Create, FileAccess.ReadWrite))
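
Applied to the repro above, the tail end of Main could then look roughly like this (a sketch that reuses the output path from the question):

var serializer = new XmlSerializer(typeof(DocumentContainer));
using (var stream = File.Open(@"C:\temp\testexport.xml", FileMode.Create, FileAccess.ReadWrite))
using (var writer = XmlWriter.Create(stream))
{
    // Write directly to the file instead of buffering the whole document
    // in a MemoryStream and copying it out afterwards.
    serializer.Serialize(writer, container);
}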

That being said, your design is eventually going to run up against memory limits given enough sufficiently large files, since you load all of them into memory before serializing. If that is happening, you may be catching and swallowing the OutOfMemoryException somewhere in your production code without realizing it, causing the problem you are seeing.

As an alternative, I would suggest a streaming solution in which each file's contents are incrementally copied to the XML output from inside the XmlSerializer, by making the Document class implement IXmlSerializable:

public class Document : IXmlSerializable
{
    public string DocumentPath { get; set; }

    public string DocumentTitle
    {
        get
        {
            if (DocumentPath == null)
                return null;
            return Path.GetFileName(DocumentPath);
        }
    }

    const string DocumentTitleName = "DocumentTitle";
    const string BinaryDocumentName = "BinaryDocument";

    #region IXmlSerializable Members

    System.Xml.Schema.XmlSchema IXmlSerializable.GetSchema()
    {
        return null;
    }

    void ReadXmlElement(XmlReader reader)
    {
        if (reader.Name == DocumentTitleName)
            DocumentPath = reader.ReadElementContentAsString();
    }

    void IXmlSerializable.ReadXml(XmlReader reader)
    {
        reader.ReadXml(null, ReadXmlElement);
    }

    void IXmlSerializable.WriteXml(XmlWriter writer)
    {
        writer.WriteElementString(DocumentTitleName, DocumentTitle ?? "");
        if (DocumentPath != null)
        {
            try
            {
                using (var stream = File.OpenRead(DocumentPath))
                {
                    // Write the start element if the file was successfully opened
                    writer.WriteStartElement(BinaryDocumentName);
                    try
                    {
                        var buffer = new byte[6 * 1024];
                        int read;
                        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                            writer.WriteBase64(buffer, 0, read);
                    }
                    finally
                    {
                        // Write the end element even if an error occurred while streaming the file.
                        writer.WriteEndElement();
                    }
                }
            }
            catch (Exception ex)
            {
                // You could log the exception as an element or as a comment, as you prefer.
                // Log as a comment
                writer.WriteComment("Caught exception with message: " + ex.Message);
                writer.WriteComment("Exception details:");
                writer.WriteComment(ex.ToString());
                // Log as an element.
                writer.WriteElementString("ExceptionMessage", ex.Message);
                writer.WriteElementString("ExceptionDetails", ex.ToString());
            }
        }
    }

    #endregion
}

// test class
public class DocumentContainer
{
    public List<Document> DocumentCollection { get; set; }
}

public static class XmlSerializationExtensions
{
    public static void ReadXml(this XmlReader reader, Action<IList<XAttribute>> readXmlAttributes, Action<XmlReader> readXmlElement)
    {
        if (reader.NodeType != XmlNodeType.Element)
            throw new InvalidOperationException("reader.NodeType != XmlNodeType.Element");

        if (readXmlAttributes != null)
        {
            var attributes = new List<XAttribute>(reader.AttributeCount);
            while (reader.MoveToNextAttribute())
            {
                attributes.Add(new XAttribute(XName.Get(reader.Name, reader.NamespaceURI), reader.Value));
            }
            // Move the reader back to the element node.
            reader.MoveToElement();
            readXmlAttributes(attributes);
        }

        if (reader.IsEmptyElement)
        {
            reader.Read();
            return;
        }

        reader.ReadStartElement(); // Advance to the first sub element of the wrapper element.

        while (reader.NodeType != XmlNodeType.EndElement)
        {
            if (reader.NodeType != XmlNodeType.Element)
                // Comment, whitespace
                reader.Read();
            else
            {
                using (var subReader = reader.ReadSubtree())
                {
                    while (subReader.NodeType != XmlNodeType.Element) // Read past XmlNodeType.None
                        if (!subReader.Read())
                            break;
                    if (readXmlElement != null)
                        readXmlElement(subReader);
                }
                reader.Read();
            }
        }

        // Move past the end of the wrapper element
        reader.ReadEndElement();
    }
}

Then use it as follows:

public static void SerializeFilesToXml(string directoryPath, string xmlPath)
{
    var docs = from file in Directory.GetFiles(directoryPath)
               select new Document { DocumentPath = file };
    var container = new DocumentContainer { DocumentCollection = docs.ToList() };

    using (var stream = File.Open(xmlPath, FileMode.Create, FileAccess.ReadWrite))
    using (var writer = XmlWriter.Create(stream, new XmlWriterSettings { Indent = true, IndentChars = " " }))
    {
        new XmlSerializer(container.GetType()).Serialize(writer, container);
    }

    Debug.WriteLine("Wrote " + xmlPath);
}

Using the streaming solution, when serializing four files of roughly 250 MB each, my memory use went up by 0.8 MB. Using the original classes, memory went up by 1022 MB.
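
A rough way to take such a before/after measurement yourself (not necessarily how the numbers above were obtained) is to compare the process's peak working set around the call, using System.Diagnostics:

var before = Process.GetCurrentProcess().PeakWorkingSet64;
SerializeFilesToXml(directoryPath, xmlPath);
var after = Process.GetCurrentProcess().PeakWorkingSet64;
// Peak working set only ever grows, so the difference is the extra memory used.
Debug.WriteLine("Peak working set grew by " + (after - before) / (1024.0 * 1024.0) + " MB");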

Update

If you do need to write your XML to a memory stream, be aware that the C# MemoryStream has a hard maximum stream length of int.MaxValue (i.e. 2 GB), because the underlying backing store is simply a byte array. On a 32-bit process the effective maximum length will be much smaller; see OutOfMemoryException while populating MemoryStream: 256MB allocation on 16GB system.

To check programmatically whether your process is really 32 bit, see How to determine programmatically whether a particular process is 32-bit or 64-bit. To switch to 64 bit, see What is the purpose of the "Prefer 32-bit" setting in Visual Studio 2012 and how does it actually work?
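
For a quick sanity check from inside the process itself, these properties (available since .NET 4.0) are usually enough:

// True when the current process is running as 64 bit.
Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
// True when the operating system itself is 64 bit.
Console.WriteLine("64-bit OS: " + Environment.Is64BitOperatingSystem);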

If you are sure you are running in 64-bit mode and are still exceeding the hard size limits of MemoryStream, perhaps see alternative to MemoryStream for large data volumes or MemoryStream replacement?