JsonDocument 不完整的解析与更大的有效载荷
JsonDocument incomplete parsing with larger payloads
所以基本上,我有一个 HttpClient 试图从端点获取任何形式的 JSON 数据。我以前使用 Newtonsoft.Json 轻松实现此目的,但在将所有功能迁移到 STJ 后,我开始注意到解析不正确。
Platforms tested: macOS & Linux (Google Kubernetes Engine)
Framework: .NET Core 3.1 LTS
下面的代码截图显示了一个 API 和 returns 一个 JSON 数组。我只是流式传输它,将它加载到 JsonDocument 中,然后尝试查看它。没有像预期的那样出来。下面的代码与步骤调试 var 结果一起提供。
using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Web;
using System.Xml;
namespace HttpCallDemo
{
class Program
{
static async Task Main(string[] args)
{
using (var httpClient = new HttpClient())
{
// FLUSH
httpClient.DefaultRequestHeaders.Clear();
httpClient.MaxResponseContentBufferSize = 4096;
string body = string.Empty, customMediaType = string.Empty; // For POST/PUT
// Setup the url
var uri = new UriBuilder("https://api-pub.bitfinex.com/v2/tickers?symbols=ALL");
uri.Port = -1;
// Pull in the payload
var requestPayload = new HttpRequestMessage(HttpMethod.Get, uri.ToString());
HttpResponseMessage responsePayload;
responsePayload = await httpClient.SendAsync(requestPayload,
HttpCompletionOption.ResponseHeadersRead);
var byteArr = await responsePayload.Content.ReadAsByteArrayAsync();
if (byteArr.LongCount() > 4194304) // 4MB
return; // Too big.
// Pull the content
var contentFromBytes = Encoding.Default.GetString(byteArr);
JsonDocument payload;
switch (responsePayload.StatusCode)
{
case HttpStatusCode.OK:
// Return the payload distinctively
payload = JsonDocument.Parse(contentFromBytes);
#if DEBUG
var testJsonRes = Encoding.UTF8.GetString(
Utf8Json.JsonSerializer.Serialize(payload.RootElement));
// var testRawRes = contentStream.read
var testJsonResEl = payload.RootElement.GetRawText();
#endif
break;
default:
throw new InvalidDataException("Invalid HTTP response.");
}
}
}
}
}
简单执行上面的Minimal代码,注意解析后的payload和原来的不一样了吗?我确定 STJ 的选项有问题。似乎我们必须优化或明确定义其限制以允许它处理 JSON 有效负载。
深入了解调试内容会使事情变得更加奇怪。当 HttpClient 获取有效负载,将其读取为字符串时,它按原样给我整个 JSON 字符串。但是,一旦我们尝试将其解析为 JsonDocument 并进一步调用 RootElement.Clone(),我们最终会得到一个 JsonElement,它的数据要少得多,同时携带一个无效的 JSON 结构(如下所示)。
ValueKind = Array : "[["tBTCUSD",11418,70.31212518,11419,161.93475693,258.02141213,0.0231,11418,2980.0289306,11438,11003],["tLTCUSD",58.919,2236.00823543,58.95,2884.6718013699997,1.258,0.0218,58.998,63147.48344762,59.261,56.334],["tLTCBTC",0.0051609,962.80334198,0.005166,1170.07399991,-0.000012,-0.0023,0.0051609,4178.13148459,0.0051852,0.0051],["tETHUSD",396.54,336.52151165,396.55,384.37623341,8.26964946,0.0213,396.50930256,69499.5382821,397.77,380.5],["tETHBTC",0.034731,166.67781664000003,0.034751,356.03450125999996,-0.000054,-0.0016,0.034747,5855.04978836,0.035109,0.0343],["tETCBTC",0.00063087,15536.813429530002,0.00063197,16238.600279749999,-0.00000838,-0.0131,0.00063085,73137.62192801,0.00064135,0.00062819],["tETCUSD",7.2059,9527.40221867,7.2176,8805.54677899,0.0517,0.0072,7.2203,49618.78868196,7.2263,7],["tRRTUSD",0.057476,33577.52064154,0.058614,20946.501210000002,0.023114,0.6511,0.058614,210741.23592011,0.06443,0.0355],["tZECUSD",88.131,821.28048322,88.332,880.37484662,5.925,0.0
当然,尝试阅读其内容会导致:
System.InvalidOperationException: Operation is not valid due to the current state of the object.
at System.Text.Json.JsonElement.get_Item(Int32 index)
at Nozomi.Preprocessing.Abstracts.BaseProcessingService`1.ProcessIdentifier(JsonElement jsonDoc, String identifier) in /Users/nicholaschen/Projects/nozomi/Nozomi.Infra.Preprocessing/Abstracts/BaseProcessingService.cs:line 255
这里有证据表明从端点传入的数据有 38KB 的价值。
更新
进一步测试
if (payload.RootElement.ValueKind.Equals(JsonValueKind.Array))
{
string testJsonArr;
testJsonArr = Encoding.UTF8.GetString(
Utf8Json.JsonSerializer.Serialize(
payload.RootElement.EnumerateArray()));
}
显示更大的数组数组(超过 9 个元素,每个元素有 11 个元素)会导致不完整的 JSON 结构,从而导致我面临的问题。
所以基本上,我有一个 HttpClient 试图从端点获取任何形式的 JSON 数据。我以前使用 Newtonsoft.Json 轻松实现此目的,但在将所有功能迁移到 STJ 后,我开始注意到解析不正确。
Platforms tested: macOS & Linux (Google Kubernetes Engine)
Framework: .NET Core 3.1 LTS
下面的代码截图显示了一个 API 和 returns 一个 JSON 数组。我只是流式传输它,将它加载到 JsonDocument 中,然后尝试查看它。没有像预期的那样出来。下面的代码与步骤调试 var 结果一起提供。
using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Web;
using System.Xml;
namespace HttpCallDemo
{
class Program
{
static async Task Main(string[] args)
{
using (var httpClient = new HttpClient())
{
// FLUSH
httpClient.DefaultRequestHeaders.Clear();
httpClient.MaxResponseContentBufferSize = 4096;
string body = string.Empty, customMediaType = string.Empty; // For POST/PUT
// Setup the url
var uri = new UriBuilder("https://api-pub.bitfinex.com/v2/tickers?symbols=ALL");
uri.Port = -1;
// Pull in the payload
var requestPayload = new HttpRequestMessage(HttpMethod.Get, uri.ToString());
HttpResponseMessage responsePayload;
responsePayload = await httpClient.SendAsync(requestPayload,
HttpCompletionOption.ResponseHeadersRead);
var byteArr = await responsePayload.Content.ReadAsByteArrayAsync();
if (byteArr.LongCount() > 4194304) // 4MB
return; // Too big.
// Pull the content
var contentFromBytes = Encoding.Default.GetString(byteArr);
JsonDocument payload;
switch (responsePayload.StatusCode)
{
case HttpStatusCode.OK:
// Return the payload distinctively
payload = JsonDocument.Parse(contentFromBytes);
#if DEBUG
var testJsonRes = Encoding.UTF8.GetString(
Utf8Json.JsonSerializer.Serialize(payload.RootElement));
// var testRawRes = contentStream.read
var testJsonResEl = payload.RootElement.GetRawText();
#endif
break;
default:
throw new InvalidDataException("Invalid HTTP response.");
}
}
}
}
}
简单执行上面的Minimal代码,注意解析后的payload和原来的不一样了吗?我确定 STJ 的选项有问题。似乎我们必须优化或明确定义其限制以允许它处理 JSON 有效负载。
深入了解调试内容会使事情变得更加奇怪。当 HttpClient 获取有效负载,将其读取为字符串时,它按原样给我整个 JSON 字符串。但是,一旦我们尝试将其解析为 JsonDocument 并进一步调用 RootElement.Clone(),我们最终会得到一个 JsonElement,它的数据要少得多,同时携带一个无效的 JSON 结构(如下所示)。
ValueKind = Array : "[["tBTCUSD",11418,70.31212518,11419,161.93475693,258.02141213,0.0231,11418,2980.0289306,11438,11003],["tLTCUSD",58.919,2236.00823543,58.95,2884.6718013699997,1.258,0.0218,58.998,63147.48344762,59.261,56.334],["tLTCBTC",0.0051609,962.80334198,0.005166,1170.07399991,-0.000012,-0.0023,0.0051609,4178.13148459,0.0051852,0.0051],["tETHUSD",396.54,336.52151165,396.55,384.37623341,8.26964946,0.0213,396.50930256,69499.5382821,397.77,380.5],["tETHBTC",0.034731,166.67781664000003,0.034751,356.03450125999996,-0.000054,-0.0016,0.034747,5855.04978836,0.035109,0.0343],["tETCBTC",0.00063087,15536.813429530002,0.00063197,16238.600279749999,-0.00000838,-0.0131,0.00063085,73137.62192801,0.00064135,0.00062819],["tETCUSD",7.2059,9527.40221867,7.2176,8805.54677899,0.0517,0.0072,7.2203,49618.78868196,7.2263,7],["tRRTUSD",0.057476,33577.52064154,0.058614,20946.501210000002,0.023114,0.6511,0.058614,210741.23592011,0.06443,0.0355],["tZECUSD",88.131,821.28048322,88.332,880.37484662,5.925,0.0
当然,尝试阅读其内容会导致:
System.InvalidOperationException: Operation is not valid due to the current state of the object.
at System.Text.Json.JsonElement.get_Item(Int32 index)
at Nozomi.Preprocessing.Abstracts.BaseProcessingService`1.ProcessIdentifier(JsonElement jsonDoc, String identifier) in /Users/nicholaschen/Projects/nozomi/Nozomi.Infra.Preprocessing/Abstracts/BaseProcessingService.cs:line 255
这里有证据表明从端点传入的数据有 38KB 的价值。
更新
进一步测试
if (payload.RootElement.ValueKind.Equals(JsonValueKind.Array))
{
string testJsonArr;
testJsonArr = Encoding.UTF8.GetString(
Utf8Json.JsonSerializer.Serialize(
payload.RootElement.EnumerateArray()));
}
显示更大的数组数组(超过 9 个元素,每个元素有 11 个元素)会导致不完整的 JSON 结构,从而导致我面临的问题。