JsonDocument incomplete parsing with larger payloads

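So basically, I have an HttpClient that attempts to obtain any form of JSON data from an endpoint. I previously used Newtonsoft.Json to do this easily, but after migrating everything to System.Text.Json (STJ), I started noticing incorrect parsing.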

Platforms tested: macOS & Linux (Google Kubernetes Engine)

Framework: .NET Core 3.1 LTS

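The code snippet below calls an API that returns a JSON array. I simply stream it, load it into a JsonDocument, and then try to inspect it. Nothing comes out as expected. The code is provided below, along with the step-debug variable results.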

using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Web;
using System.Xml;

namespace HttpCallDemo
{
    class Program
    {
        static async Task Main(string[] args)
        {
            using (var httpClient = new HttpClient())
            {
                // FLUSH
                httpClient.DefaultRequestHeaders.Clear();
                httpClient.MaxResponseContentBufferSize = 4096;
                string body = string.Empty, customMediaType = string.Empty; // For POST/PUT

                // Setup the url
                var uri = new UriBuilder("https://api-pub.bitfinex.com/v2/tickers?symbols=ALL");
                uri.Port = -1;

                // Pull in the payload
                var requestPayload = new HttpRequestMessage(HttpMethod.Get, uri.ToString());
                HttpResponseMessage responsePayload;

                responsePayload = await httpClient.SendAsync(requestPayload,
                    HttpCompletionOption.ResponseHeadersRead);

                var byteArr = await responsePayload.Content.ReadAsByteArrayAsync();
                if (byteArr.LongCount() > 4194304) // 4MB
                    return; // Too big.

                // Pull the content
                var contentFromBytes = Encoding.Default.GetString(byteArr);
                JsonDocument payload;

                switch (responsePayload.StatusCode)
                {
                    case HttpStatusCode.OK:
                        // Return the payload distinctively
                        payload = JsonDocument.Parse(contentFromBytes);

#if DEBUG
                        var testJsonRes = Encoding.UTF8.GetString(
                            Utf8Json.JsonSerializer.Serialize(payload.RootElement));
                        // var testRawRes = contentStream.read
                        var testJsonResEl = payload.RootElement.GetRawText();
#endif
                        break;
                    default:
                        throw new InvalidDataException("Invalid HTTP response.");
                }
            }
        }
    }
}

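Simply run the minimal code above and notice that the parsed payload differs from the original. I suspect something is off with STJ's options; it seems we have to tune or explicitly define its limits so that it can process the whole JSON payload.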

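Digging deeper in the debugger makes things even stranger. When HttpClient fetches the payload and I read it as a string, I get the entire JSON string as is. However, once we parse it into a JsonDocument and then call RootElement.Clone(), we end up with a JsonElement that appears to hold far less data and to carry an invalid JSON structure (shown below).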

ValueKind = Array : "[["tBTCUSD",11418,70.31212518,11419,161.93475693,258.02141213,0.0231,11418,2980.0289306,11438,11003],["tLTCUSD",58.919,2236.00823543,58.95,2884.6718013699997,1.258,0.0218,58.998,63147.48344762,59.261,56.334],["tLTCBTC",0.0051609,962.80334198,0.005166,1170.07399991,-0.000012,-0.0023,0.0051609,4178.13148459,0.0051852,0.0051],["tETHUSD",396.54,336.52151165,396.55,384.37623341,8.26964946,0.0213,396.50930256,69499.5382821,397.77,380.5],["tETHBTC",0.034731,166.67781664000003,0.034751,356.03450125999996,-0.000054,-0.0016,0.034747,5855.04978836,0.035109,0.0343],["tETCBTC",0.00063087,15536.813429530002,0.00063197,16238.600279749999,-0.00000838,-0.0131,0.00063085,73137.62192801,0.00064135,0.00062819],["tETCUSD",7.2059,9527.40221867,7.2176,8805.54677899,0.0517,0.0072,7.2203,49618.78868196,7.2263,7],["tRRTUSD",0.057476,33577.52064154,0.058614,20946.501210000002,0.023114,0.6511,0.058614,210741.23592011,0.06443,0.0355],["tZECUSD",88.131,821.28048322,88.332,880.37484662,5.925,0.0

Naturally, attempting to read its contents results in:

System.InvalidOperationException: Operation is not valid due to the current state of the object.
   at System.Text.Json.JsonElement.get_Item(Int32 index)
   at Nozomi.Preprocessing.Abstracts.BaseProcessingService`1.ProcessIdentifier(JsonElement jsonDoc, String identifier) in /Users/nicholaschen/Projects/nozomi/Nozomi.Infra.Preprocessing/Abstracts/BaseProcessingService.cs:line 255

There is evidence that the data coming in from the endpoint is 38 KB in size.

Update

Further testing with

    if (payload.RootElement.ValueKind.Equals(JsonValueKind.Array))
    {
        string testJsonArr;
        testJsonArr = Encoding.UTF8.GetString(
            Utf8Json.JsonSerializer.Serialize(
                payload.RootElement.EnumerateArray()));
    }

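shows that larger arrays of arrays (more than 9 elements, each with 11 elements) result in an incomplete JSON structure, which leads to the problem I am facing.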

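For those working with JsonDocument and JsonElement, note that the step-debug variables are not accurate. Inspecting these variables in the debugger at runtime is not recommended, because they do not display themselves in full.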

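@dbc has shown that re-serializing the deserialized data yields the complete dataset. I strongly suggest wrapping the serializer calls used for debugging in a DEBUG preprocessor directive so that these redundant lines do not execute outside of development.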
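A minimal sketch of that debugging approach follows. It assumes System.Text.Json's own JsonSerializer rather than Utf8Json, and the class name JsonDebugDump, the helper Dump, and the generated test data are hypothetical names made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.Json;

public static class JsonDebugDump
{
    // Hypothetical helper: re-serializes an element so the full JSON text
    // can be inspected as a plain string instead of the debugger preview.
    public static string Dump(JsonElement element)
    {
        return JsonSerializer.Serialize(element);
    }

    public static void Main()
    {
        // Made-up data: 50 rows of 11 numbers, large enough that a
        // debugger preview of the element would likely be cut short.
        string row = "[" + string.Join(",", Enumerable.Range(1, 11)) + "]";
        string json = "[" + string.Join(",", Enumerable.Repeat(row, 50)) + "]";

        using (JsonDocument doc = JsonDocument.Parse(json))
        {
#if DEBUG
            // Only compiled into debug builds.
            string full = Dump(doc.RootElement);
            Console.WriteLine("Re-serialized length: " + full.Length);
#endif
            Console.WriteLine("Rows parsed: " + doc.RootElement.GetArrayLength());
        }
    }
}
```

Calling JsonElement.GetRawText() is another way to get the text backing an element, and the question's code already uses it.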

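To interact with these entities, make sure you use .Clone() wherever possible to guard against disposal, and make sure you access RootElement and then traverse it before viewing its values in step-debug mode, because large values will not be displayed in full.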
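The Clone-then-traverse advice can be sketched roughly as below. The class JsonCloneExample and the helper ParseDetached are hypothetical names, and the two sample rows are shortened from the ticker output shown in the question.

```csharp
using System;
using System.Text.Json;

public static class JsonCloneExample
{
    // Hypothetical helper: parses JSON and returns a root element that
    // remains usable after the owning JsonDocument has been disposed.
    public static JsonElement ParseDetached(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // Clone() copies the data so it no longer depends on doc.
            return doc.RootElement.Clone();
        }
    }

    public static void Main()
    {
        string json =
            "[[\"tBTCUSD\",11418,70.31212518],[\"tLTCUSD\",58.919,2236.00823543]]";

        JsonElement root = ParseDetached(json);

        // Check the kind first: indexing a non-array element throws
        // InvalidOperationException.
        if (root.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement row in root.EnumerateArray())
            {
                // Read individual values instead of relying on the
                // debugger's view of the whole element.
                string symbol = row[0].GetString();
                decimal price = row[1].GetDecimal();
                Console.WriteLine(symbol + ": " + price);
            }
        }
    }
}
```

Traversing the element in code like this means each value is read from the parsed data itself, so what the debugger chooses to show for the full array no longer matters.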