Pass a char[] buffer to an XmlSerializer

I have an XML document stored in a char array (char[]), and the content length of the data in an int variable. I need to deserialize the data using XmlSerializer.

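For performance reasons I need to avoid allocating a string object, because the data is usually larger than 85 KB and would result in a Gen2 object.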
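
As a rough illustration of that concern (a sketch only; the class and method names are made up for this example), a string of about 50,000 chars occupies roughly 100,000 bytes, which is above the runtime's 85,000-byte large-object threshold, and large objects are reported as belonging to the highest GC generation:

```csharp
using System;

public static class LohDemo
{
    // Hypothetical helper: allocates a string of the given length and
    // returns the GC generation the runtime reports for it.
    public static int GenerationOfNewString(int charCount)
    {
        var s = new string('x', charCount);
        return GC.GetGeneration(s);
    }

    public static void Main()
    {
        // About 200 bytes: an ordinary small-object allocation.
        Console.WriteLine(GenerationOfNewString(100));

        // About 100,000 bytes: above the large-object threshold, so the
        // object is reported as GC.MaxGeneration (generation 2).
        Console.WriteLine(GenerationOfNewString(50_000));
    }
}
```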

Is there any way to pass a char[] to XmlSerializer without converting it to a string? It accepts a Stream or a TextReader, but I cannot find a way to construct one from a char[].

I am imagining something like this (except that C# has no CharArrayStream or CharArrayReader):

public MyEntity DeserializeXmlDocument(char [] buffer, int contentLength) {
    using (var stream = new CharArrayStream(buffer, contentLength))
    {
        return _xmlSerializer.Deserialize(stream) as MyEntity;
    }
}

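As further information: we are at the point of profiling existing code and have identified this as a pain point, so this is not a case of "premature optimization" or an "XY problem".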

I adapted the code linked by @György Kőszeg into a CharArrayStream class. So far this works in my tests:

public class CharArrayStream : Stream
{
    private readonly char[] str;
    private readonly int n;

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => n;
    public override long Position { get; set; } // TODO: bounds check

    public CharArrayStream(char[] str, int n)
    {
        this.str = str;
        this.n = n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin:
                Position = offset;
                break;
            case SeekOrigin.Current:
                Position += offset;
                break;
            case SeekOrigin.End:
                Position = Length + offset; // offset is added to the end position (usually <= 0)
                break;
        }

        return Position;
    }

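    // Keeps only the low byte of each char, so this is only safe for single-byte (e.g. ASCII) content.
    private byte this[int i] => (byte)str[i];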

    public override int Read(byte[] buffer, int offset, int count)
    {
        // TODO: bounds check
        var len = Math.Min(count, Length - Position);
        for (int i = 0; i < len; i++)
        {
            buffer[offset++] = this[(int)(Position++)];
        }
        return (int)len;
    }

    public override int ReadByte() => Position >= Length ? -1 : this[(int)Position++];
    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override string ToString() => throw new NotSupportedException();
}

I can use it like this:

public MyEntity DeserializeXmlDocument(char [] buffer, int contentLength) {
    using (var stream = new CharArrayStream(buffer, contentLength))
    {
        return _xmlSerializer.Deserialize(stream) as MyEntity;
    }
}

Thanks, @György Kőszeg!

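It is fairly straightforward to subclass TextReader to read from an array of chars or an equivalent. Here is a version that takes a ReadOnlyMemory<char>, which can represent a section of either a string or a char[] character array: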

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class CharMemoryReader : TextReader
{
    private ReadOnlyMemory<char> chars;
    private int position;

    public CharMemoryReader(ReadOnlyMemory<char> chars)
    {
        this.chars = chars;
        this.position = 0;
    }

    void CheckClosed()
    {
        if (position < 0)
            throw new ObjectDisposedException(null, string.Format("{0} is closed.", ToString()));
    }

    public override void Close() => Dispose(true);

    protected override void Dispose(bool disposing)
    {
        chars = ReadOnlyMemory<char>.Empty;
        position = -1;
        base.Dispose(disposing);
    }

    public override int Peek()
    {
        CheckClosed();
        return position >= chars.Length ? -1 : chars.Span[position];
    }

    public override int Read()
    {
        CheckClosed();
        return position >= chars.Length ? -1 : chars.Span[position++];
    }

    public override int Read(char[] buffer, int index, int count)
    {
        CheckClosed();
        if (buffer == null)
            throw new ArgumentNullException(nameof(buffer));
        if (index < 0)
            throw new ArgumentOutOfRangeException(nameof(index));
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (buffer.Length - index < count)
            throw new ArgumentException("buffer.Length - index < count");

        return Read(buffer.AsSpan().Slice(index, count));
    }

    public override int Read(Span<char> buffer)
    {
        CheckClosed();

        var nRead = chars.Length - position;
        if (nRead > 0)
        {
            if (nRead > buffer.Length)
                nRead = buffer.Length;
            chars.Span.Slice(position, nRead).CopyTo(buffer);
            position += nRead;
        }
        return nRead;
    }

    public override string ReadToEnd()
    {
        CheckClosed();
        var s = position == 0 ? chars.ToString() : chars.Slice(position, chars.Length - position).ToString();
        position = chars.Length;
        return s;
    }

    public override string ReadLine()
    {
        CheckClosed();
        var span = chars.Span;
        var i = position;
        for( ; i < span.Length; i++)
        {
            var ch = span[i];
            if (ch == '\r' || ch == '\n')
            {
                var result = span.Slice(position, i - position).ToString();
                position = i + 1;
                if (ch == '\r' && position < span.Length && span[position] == '\n')
                    position++;
                return result;
            }
        }
        if (i > position)
        {
            var result = span.Slice(position, i - position).ToString();
            position = i;
            return result;
        }
        return null;
    }

    public override int ReadBlock(char[] buffer, int index, int count) => Read(buffer, index, count);
    public override int ReadBlock(Span<char> buffer) => Read(buffer);

    public override Task<String> ReadLineAsync() => Task.FromResult(ReadLine());
    public override Task<String> ReadToEndAsync() => Task.FromResult(ReadToEnd());
    public override Task<int> ReadBlockAsync(char[] buffer, int index, int count) => Task.FromResult(ReadBlock(buffer, index, count));
    public override Task<int> ReadAsync(char[] buffer, int index, int count) => Task.FromResult(Read(buffer, index, count));
    public override ValueTask<int> ReadBlockAsync(Memory<char> buffer, CancellationToken cancellationToken = default) =>
        cancellationToken.IsCancellationRequested ? new ValueTask<int>(Task.FromCanceled<int>(cancellationToken)) : new ValueTask<int>(ReadBlock(buffer.Span));
    public override ValueTask<int> ReadAsync(Memory<char> buffer, CancellationToken cancellationToken = default) =>
        cancellationToken.IsCancellationRequested ? new ValueTask<int>(Task.FromCanceled<int>(cancellationToken)) : new ValueTask<int>(Read(buffer.Span)); 
}

Then use it with one of the following extension methods:

public static partial class XmlSerializationHelper
{
    public static T LoadFromXml<T>(this char [] xml, int contentLength, XmlSerializer serial = null) => 
        new ReadOnlyMemory<char>(xml, 0, contentLength).LoadFromXml<T>(serial);

    public static T LoadFromXml<T>(this ReadOnlyMemory<char> xml, XmlSerializer serial = null)
    {
        serial = serial ?? new XmlSerializer(typeof(T));
        using (var reader = new CharMemoryReader(xml))
            return (T)serial.Deserialize(reader);
    }
}

For example:

var result = buffer.LoadFromXml<MyEntity>(contentLength, _xmlSerializer);
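
One consequence of this design is worth spelling out: wrapping the array in a ReadOnlyMemory<char> creates a view over the existing buffer rather than a copy, which is what avoids the extra allocation. The sketch below illustrates this under simple assumptions (the class name, the Wrap helper and the sample buffer are made up for the example):

```csharp
using System;

public static class MemoryViewDemo
{
    // Hypothetical helper: wraps the first contentLength chars of buffer
    // without copying them.
    public static ReadOnlyMemory<char> Wrap(char[] buffer, int contentLength) =>
        new ReadOnlyMemory<char>(buffer, 0, contentLength);

    public static void Main()
    {
        // A reused buffer is often larger than the document it currently holds.
        char[] buffer = "<a/>????".ToCharArray();
        ReadOnlyMemory<char> xml = Wrap(buffer, 4);

        Console.WriteLine(xml.Length);   // 4

        buffer[1] = 'b';                 // mutate the underlying array
        Console.WriteLine(xml.Span[1]);  // b (the view sees the change)
    }
}
```

Because the reader sees later changes to the array, the buffer should not be refilled or returned to a pool until deserialization has finished.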

Notes:

  • A char[] character array has basically the same contents as a UTF-16 encoded memory stream without a BOM, so one could create a custom Stream implementation resembling MemoryStream that represents each char as two bytes, as is done in this answer to How do I generate a stream from a string? by György Kőszeg. However, doing this entirely correctly looks a bit tricky, as getting all the async methods right seems nontrivial.

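    Having done so, XmlReader would still need to wrap the custom stream in a StreamReader that "decodes" the stream into a sequence of characters, correctly inferring the encoding in the process (which I have observed can sometimes be done incorrectly, e.g. when the encoding stated in the XML declaration does not match the actual encoding).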

    I chose to create a custom TextReader rather than a custom Stream to avoid the unnecessary decoding step, and because the async implementation seemed less burdensome.

  • Representing each char as a single byte by truncation (e.g. (byte)str[i]) will corrupt XML containing any multibyte characters.

  • I have not done any performance tuning on the above implementation.
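
To make the notes about truncation and the two-bytes-per-char representation concrete, here is a minimal sketch. The class and method names are made up for this example and this is not the code from the linked answer. It shows that casting a char to byte keeps only the low byte, while reinterpreting the chars as raw UTF-16 code units preserves them:

```csharp
using System;
using System.Runtime.InteropServices;

public static class CharToByteDemo
{
    // Keeps only the low byte of each char, as (byte)str[i] does.
    public static byte[] Truncate(ReadOnlySpan<char> chars)
    {
        var bytes = new byte[chars.Length];
        for (int i = 0; i < chars.Length; i++)
            bytes[i] = (byte)chars[i]; // high byte is discarded (default unchecked context)
        return bytes;
    }

    // Reinterprets the chars as raw UTF-16 code units: two bytes per char, no BOM.
    public static byte[] AsUtf16Bytes(ReadOnlySpan<char> chars) =>
        MemoryMarshal.AsBytes(chars).ToArray();

    public static void Main()
    {
        char[] text = { 'a', '\u4E2D' }; // U+4E2D is a CJK character

        byte[] truncated = Truncate(text);
        Console.WriteLine((char)truncated[1]); // - (0x2D: the original character is lost)

        byte[] utf16 = AsUtf16Bytes(text);
        Console.WriteLine(utf16.Length);       // 4

        // Casting the bytes back to chars recovers the original text.
        var roundTripped = new string(MemoryMarshal.Cast<byte, char>(utf16.AsSpan()));
        Console.WriteLine(roundTripped == "a\u4E2D"); // True
    }
}
```

The ToArray call here copies the data purely for demonstration; a real stream would copy bytes into the caller's buffer on each Read instead.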

Demo fiddle here.