Is there a size limit for a property to be serialized?
I'm working against an interface that requires an XML document. So far I've been able to serialize most of the objects using XmlSerializer. However, one property is proving problematic. It's supposed to be a collection of objects that wrap documents. The documents themselves are encoded as base64 strings.
The basic structure is like this:
// snipped out of a parent object
public List<Document> DocumentCollection { get; set; }
// end snip

public class Document
{
    public string DocumentTitle { get; set; }
    public Code DocumentCategory { get; set; }

    /// <summary>
    /// Base64 encoded file
    /// </summary>
    public string BinaryDocument { get; set; }

    public string DocumentTypeText { get; set; }
}
The problem is that smaller values work fine, but if a document is too large, the serializer simply skips that document item in the collection.
Am I hitting some kind of limit?
Update: I changed
public string BinaryDocument { get; set; }
to
public byte[] BinaryDocument { get; set; }
and I'm still getting the same result. Smaller documents (~150 KB) serialize fine, but the rest don't. To be clear, it's not just the value of the property: the entire containing Document object is dropped.
Update 2:
Here's the serialization code with a simple repro, from a console project I put together. The problem is that this code works fine in the test project. I'm having a hard time packaging up the full object structure here, because the complexity of populating the fields makes it nearly impossible to use the actual objects in a test case, so I've been trying to pare down the code from the main application. A populated object goes into the serialization code with DocumentCollection holding four Documents, and one Document comes out.
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            var container = new DocumentContainer();
            var docs = new List<Document>();
            foreach (var f in Directory.GetFiles(@"E:\Software Projects\DA\Test Documents"))
            {
                var fileStream = new MemoryStream(File.ReadAllBytes(f));
                var doc = new Document
                {
                    BinaryDocument = fileStream.ToArray(),
                    DocumentTitle = Path.GetFileName(f)
                };
                docs.Add(doc);
            }

            container.DocumentCollection = docs;
            var serializer = new XmlSerializer(typeof(DocumentContainer));
            var ms = new MemoryStream();
            var writer = XmlWriter.Create(ms);
            serializer.Serialize(writer, container);
            writer.Flush();
            ms.Seek(0, SeekOrigin.Begin);
            var reader = new StreamReader(ms, Encoding.UTF8);
            File.WriteAllText(@"C:\temp\testexport.xml", reader.ReadToEnd());
        }
    }

    public class Document
    {
        public string DocumentTitle { get; set; }
        public byte[] BinaryDocument { get; set; }
    }

    // test class
    public class DocumentContainer
    {
        public List<Document> DocumentCollection { get; set; }
    }
}
XmlSerializer itself has no limit on the length of a string it can serialize. .Net, however, has a maximum string length of int.MaxValue. Furthermore, since a string is implemented internally as a contiguous memory buffer, in a 32-bit process you're likely to be unable to allocate a string anywhere near that large due to process-space fragmentation. And since a C# base64 string requires roughly 2.67 times the memory of the byte[] array from which it was created (1.33 for the encoding, times 2 because the .Net char type is actually two bytes), you may be getting an OutOfMemoryException while encoding a large binary document as a complete base64 string, then swallowing and ignoring it somewhere, leaving the BinaryDocument property null.
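To see that expansion factor concretely, here's a small sketch (the 100-byte array is just a stand-in for real file contents):

```csharp
using System;

class Base64SizeDemo
{
    static void Main()
    {
        var raw = new byte[100];                 // stand-in for file contents
        string encoded = Convert.ToBase64String(raw);

        // Base64 emits 4 output characters per 3 input bytes (padded up),
        // and each .NET char occupies 2 bytes in memory.
        Console.WriteLine(encoded.Length);                // 136 chars for 100 bytes
        Console.WriteLine(encoded.Length * sizeof(char)); // 272 bytes, about 2.7x the input
    }
}
```

For a multi-hundred-megabyte document, that factor applies to a single contiguous string allocation, which is exactly where an OutOfMemoryException becomes likely.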
That said, there's no reason for you to manually encode your binary documents as base64, because XmlSerializer does this for you automatically. I.e. if I serialize the following class:
public class Document
{
    public string DocumentTitle { get; set; }
    public Code DocumentCategory { get; set; }
    public byte[] BinaryDocument { get; set; }
    public string DocumentTypeText { get; set; }
}
I get the following XML:
<Document>
    <DocumentTitle>my title</DocumentTitle>
    <DocumentCategory>Default</DocumentCategory>
    <BinaryDocument>AAECAwQFBgcICQoLDA0ODxAREhM=</BinaryDocument>
    <DocumentTypeText>document text type</DocumentTypeText>
</Document>
As you can see, BinaryDocument is base64 encoded. Thus you should be able to keep your binary documents in the more compact byte[] representation and still get the XML output you want.
Even better, under the covers XmlWriter uses System.Xml.Base64Encoder to do this. This class encodes its input in chunks, thereby avoiding the excessive memory use and potential out-of-memory exceptions described above.
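You can take advantage of the same chunking yourself via XmlWriter.WriteBase64; a minimal sketch (both file paths here are illustrative):

```csharp
using System.IO;
using System.Xml;

class ChunkedBase64Demo
{
    static void Main()
    {
        using (var output = File.Create("encoded.xml"))
        using (var writer = XmlWriter.Create(output))
        using (var input = File.OpenRead("some-large-file.bin")) // illustrative path
        {
            writer.WriteStartElement("BinaryDocument");
            var buffer = new byte[8 * 1024];
            int read;
            // Encode and write 8 KB at a time; only one small buffer
            // is ever held in memory regardless of the file's size.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                writer.WriteBase64(buffer, 0, read);
            writer.WriteEndElement();
        }
    }
}
```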
I wasn't able to reproduce the problem you're seeing. Even with individual files as large as 267 MB to 1.92 GB, I didn't see any elements being skipped. The only problem I encountered was that the temporary var ms = new MemoryStream(); eventually exceeded its 2 GB buffer limit and threw an exception. I replaced it with a direct stream and that problem went away:
using (var stream = File.Open(outputPath, FileMode.Create, FileAccess.ReadWrite))
That being said, your design will eventually run up against memory limits given enough sufficiently large files, since you load all of them into memory before serializing. If that's happening, you may be catching and swallowing an OutOfMemoryException somewhere in your production code without realizing it, leading to the problem you're seeing.
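As a hypothetical illustration of how a swallowed exception could silently drop a document (the empty catch block here is the anti-pattern being described, not a recommendation):

```csharp
foreach (var f in Directory.GetFiles(inputDirectory)) // inputDirectory is illustrative
{
    try
    {
        var bytes = File.ReadAllBytes(f);
        // Encoding a very large file to one full base64 string can
        // throw an OutOfMemoryException partway through...
        docs.Add(new Document { BinaryDocument = Convert.ToBase64String(bytes) });
    }
    catch (Exception)
    {
        // ...and a broad, empty catch like this makes the document vanish
        // from the collection with no visible error, matching the symptom
        // of whole Document items being "skipped".
    }
}
```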
As an alternative, I would suggest a streaming solution in which the contents of each file are incrementally copied to the XML output by XmlSerializer, by making the Document class implement IXmlSerializable:
public class Document : IXmlSerializable
{
    public string DocumentPath { get; set; }

    public string DocumentTitle
    {
        get
        {
            if (DocumentPath == null)
                return null;
            return Path.GetFileName(DocumentPath);
        }
    }

    const string DocumentTitleName = "DocumentTitle";
    const string BinaryDocumentName = "BinaryDocument";

    #region IXmlSerializable Members

    System.Xml.Schema.XmlSchema IXmlSerializable.GetSchema()
    {
        return null;
    }

    void ReadXmlElement(XmlReader reader)
    {
        if (reader.Name == DocumentTitleName)
            DocumentPath = reader.ReadElementContentAsString();
    }

    void IXmlSerializable.ReadXml(XmlReader reader)
    {
        reader.ReadXml(null, ReadXmlElement);
    }

    void IXmlSerializable.WriteXml(XmlWriter writer)
    {
        writer.WriteElementString(DocumentTitleName, DocumentTitle ?? "");
        if (DocumentPath != null)
        {
            try
            {
                using (var stream = File.OpenRead(DocumentPath))
                {
                    // Write the start element if the file was successfully opened
                    writer.WriteStartElement(BinaryDocumentName);
                    try
                    {
                        var buffer = new byte[6 * 1024];
                        int read;
                        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                            writer.WriteBase64(buffer, 0, read);
                    }
                    finally
                    {
                        // Write the end element even if an error occurred while streaming the file.
                        writer.WriteEndElement();
                    }
                }
            }
            catch (Exception ex)
            {
                // You could log the exception as an element or as a comment, as you prefer.
                // Log as a comment
                writer.WriteComment("Caught exception with message: " + ex.Message);
                writer.WriteComment("Exception details:");
                writer.WriteComment(ex.ToString());
                // Log as an element.
                writer.WriteElementString("ExceptionMessage", ex.Message);
                writer.WriteElementString("ExceptionDetails", ex.ToString());
            }
        }
    }

    #endregion
}

// test class
public class DocumentContainer
{
    public List<Document> DocumentCollection { get; set; }
}

public static class XmlSerializationExtensions
{
    public static void ReadXml(this XmlReader reader, Action<IList<XAttribute>> readXmlAttributes, Action<XmlReader> readXmlElement)
    {
        if (reader.NodeType != XmlNodeType.Element)
            throw new InvalidOperationException("reader.NodeType != XmlNodeType.Element");
        if (readXmlAttributes != null)
        {
            var attributes = new List<XAttribute>(reader.AttributeCount);
            while (reader.MoveToNextAttribute())
            {
                attributes.Add(new XAttribute(XName.Get(reader.Name, reader.NamespaceURI), reader.Value));
            }
            // Move the reader back to the element node.
            reader.MoveToElement();
            readXmlAttributes(attributes);
        }
        if (reader.IsEmptyElement)
        {
            reader.Read();
            return;
        }
        reader.ReadStartElement(); // Advance to the first sub element of the wrapper element.
        while (reader.NodeType != XmlNodeType.EndElement)
        {
            if (reader.NodeType != XmlNodeType.Element)
                // Comment, whitespace
                reader.Read();
            else
            {
                using (var subReader = reader.ReadSubtree())
                {
                    while (subReader.NodeType != XmlNodeType.Element) // Read past XmlNodeType.None
                        if (!subReader.Read())
                            break;
                    if (readXmlElement != null)
                        readXmlElement(subReader);
                }
                reader.Read();
            }
        }
        // Move past the end of the wrapper element
        reader.ReadEndElement();
    }
}
Then use it as follows:
public static void SerializeFilesToXml(string directoryPath, string xmlPath)
{
    var docs = from file in Directory.GetFiles(directoryPath)
               select new Document { DocumentPath = file };
    var container = new DocumentContainer { DocumentCollection = docs.ToList() };
    using (var stream = File.Open(xmlPath, FileMode.Create, FileAccess.ReadWrite))
    using (var writer = XmlWriter.Create(stream, new XmlWriterSettings { Indent = true, IndentChars = " " }))
    {
        new XmlSerializer(container.GetType()).Serialize(writer, container);
    }
    Debug.WriteLine("Wrote " + xmlPath);
}
Using the streaming solution, when serializing 4 files of roughly 250 MB each, my memory use increased by 0.8 MB. With the original classes, my memory increased by 1022 MB.
Update
If you need to write your XML to a memory stream, be aware that the C# MemoryStream has a hard maximum stream length of int.MaxValue (i.e. 2 GB) because the underlying storage is simply a byte array. On a 32-bit process the effective maximum length will be much smaller; see OutOfMemoryException while populating MemoryStream: 256MB allocation on 16GB system.
To check programmatically whether your process is really 32-bit, see How to determine programmatically whether a particular process is 32-bit or 64-bit. To change to 64 bit, see What is the purpose of the “Prefer 32-bit” setting in Visual Studio 2012 and how does it actually work?.
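For a quick in-process check, Environment.Is64BitProcess (available since .NET 4.0) reports the bitness directly:

```csharp
using System;

class BitnessCheck
{
    static void Main()
    {
        // True when running as a 64-bit process, false when 32-bit
        // (including a 64-bit build launched with "Prefer 32-bit").
        Console.WriteLine(Environment.Is64BitProcess);

        // Distinguishes a 32-bit process running on a 64-bit OS.
        Console.WriteLine(Environment.Is64BitOperatingSystem);
    }
}
```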
If you're certain you're running in 64-bit mode and are still exceeding the hard size limit of MemoryStream, perhaps see alternative to MemoryStream for large data volumes or MemoryStream replacement?.