Issues parsing a 1GB json file using JSON.NET
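I have inherited an application whose input has scaled up from 50K location records to 1.1 million location records.
This has caused serious issues, because the entire file was previously deserialized into a single object.
For a production-like file with 1.1 million records, that object is about 1GB in size.
Because of large object GC issues, I want to keep each deserialized object below the 85K (85,000-byte) large object heap threshold.
I am trying to parse out one location object at a time and deserialize it, so that I can control the number of objects
being deserialized and, in turn, the size of each object. I am using the Json.Net library to do this.
Below is a sample of the JSON file that I receive as a stream into my application.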
{
"Locations": [{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
},
{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
}]
}
I need to be able to pull out the individual Location objects, so that I would be looking at the following:
{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
}
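I am trying to use the Json.NET JsonTextReader to accomplish this, but I cannot get the reader to hold an entire location in its buffer. The reader initially gets down as far as "RadioProtocols", which is midway through the object, and by the time the stream reaches the end of the object the reader has already discarded the start of it.
The code I am using to try to get this working is: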
var ser = new JsonSerializer();
using (var reader = new JsonTextReader(new StreamReader(stream)))
{
    reader.SupportMultipleContent = true;
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
        {
            do
            {
                reader.Read();
            } while (reader.TokenType != JsonToken.EndObject && reader.Depth == 2);
            var singleLocation = ser.Deserialize<Locations>(reader);
        }
    }
}
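Any information on this, or an alternative approach, would be greatly appreciated. As a side note, the way our customers send us the information cannot change at this time.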
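When the reader is positioned at the beginning of the object you want to deserialize (an entry in the Locations
array in your case), you can just call ser.Deserialize<T>(reader)
and it will work, advancing to the end of the object at that level and no further. Thus the following should iterate through the Location
objects in your file, loading each one separately: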
public static IEnumerable<T> DeserializeNestedItems<T>(TextReader textReader)
{
    var ser = new JsonSerializer();
    using (var reader = new JsonTextReader(textReader))
    {
        reader.SupportMultipleContent = true;
        while (reader.Read())
        {
            // Depth 2 is an object directly inside the root object's "Locations" array.
            if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
            {
                // Consumes exactly one array entry and leaves the reader on its EndObject token.
                var item = ser.Deserialize<T>(reader);
                yield return item;
            }
        }
    }
}
And an example of use, with your test string:
Debug.Assert(DeserializeNestedItems<Location>(new StringReader(json)).Count() == 2); // No assert.
var list = DeserializeNestedItems<Location>(new StringReader(json)).SelectMany(l => l.AccessPoints).Select(a => new { a.Latitude, a.Longitude }).ToList();
Debug.WriteLine(JsonConvert.SerializeObject(list, Formatting.Indented));
Outputs:
[
  {
    "Latitude": 40.59485,
    "Longitude": -73.96174
  },
  {
    "Latitude": 40.59485,
    "Longitude": -73.96174
  }
]
Note - the Location
class comes from posting your JSON to http://json2csharp.com/.
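Since the generated classes are not shown above, here is a minimal sketch of what they might look like, together with a compact copy of the iterator so the snippet compiles on its own. The property selection is an assumption: it is a hand-trimmed subset inferred from the sample JSON, not the actual json2csharp output, and the LocationStreaming class name is made up for illustration. By default Json.NET ignores JSON members that have no matching property, so a partial model like this should still load.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

// Hypothetical, trimmed-down models inferred from the sample JSON (assumption).
public class AccessPoint
{
    public string AccessPointId { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    public string Status { get; set; }
}

public class Location
{
    public string LocationId { get; set; }
    public string DisplayFlag { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    public int VenueGroup { get; set; }
    public List<AccessPoint> AccessPoints { get; set; }
}

public static class LocationStreaming
{
    // Compact copy of the iterator above: yields one array entry at a time.
    public static IEnumerable<T> DeserializeNestedItems<T>(TextReader textReader)
    {
        var ser = new JsonSerializer();
        using (var reader = new JsonTextReader(textReader))
        {
            while (reader.Read())
            {
                // Entries of the root object's array start at depth 2.
                if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
                {
                    yield return ser.Deserialize<T>(reader);
                }
            }
        }
    }
}
```

For the real 1GB input you would pass a StreamReader over the incoming stream instead of a StringReader, and consume the sequence with foreach rather than ToList(), so that only one Location is alive at a time.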
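Thanks for all the help. I have managed to get it doing what I want, which is deserializing the individual location objects.
If the item is loaded as a JObject, the full object is read in and can then be deserialized; this can be done in a loop to get the solution.
This is the code that was settled on: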
while (reader.Read())
{
    if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
    {
        location = JObject.Load(reader).ToObject<Location>();
        var lv = new LocationValidator(location, FootprintInfo.OperatorId, FootprintInfo.RoamingGroups, true);
        var vr = lv.IsValid();
        if (vr.Successful)
        {
            yield return location;
        }
        else
        {
            errors.Add(new Error(elNumber, location.LocationId, vr.Error.Field, vr.Error.Detail));
            if (errors.Count >= maxErrors)
            {
                yield break;
            }
        }
        ++elNumber;
    }
}
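The loop above relies on surrounding code that is not shown (reader, errors, elNumber, maxErrors, LocationValidator). As a rough, self-contained sketch of how it could sit inside an iterator method, the version below replaces LocationValidator with a hypothetical validate delegate that returns null for a valid item and an error message otherwise. ValidatingReader, ReadValid, the one-property Location model and the string-based error list are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical minimal model; the real class has many more properties.
public class Location
{
    public string LocationId { get; set; }
}

public static class ValidatingReader
{
    // validate is a stand-in for LocationValidator (assumption):
    // it returns null when the location is valid, otherwise an error message.
    public static IEnumerable<Location> ReadValid(
        TextReader input,
        Func<Location, string> validate,
        List<string> errors,
        int maxErrors)
    {
        using (var reader = new JsonTextReader(input))
        {
            var elNumber = 0;
            while (reader.Read())
            {
                if (reader.TokenType != JsonToken.StartObject || reader.Depth != 2)
                {
                    continue;
                }

                // Load exactly one array entry, then map it to the model.
                var location = JObject.Load(reader).ToObject<Location>();
                var error = validate(location);
                if (error == null)
                {
                    yield return location;
                }
                else
                {
                    errors.Add($"#{elNumber} {location.LocationId}: {error}");
                    if (errors.Count >= maxErrors)
                    {
                        yield break; // stop early once too many items have failed
                    }
                }
                ++elNumber;
            }
        }
    }
}
```

One design note: JObject.Load builds an intermediate token tree for each location before ToObject maps it to the model, whereas calling Deserialize<T> on a JsonSerializer with the reader skips that intermediate step. Either way only one location is materialized per iteration.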