Most efficient way to fix an invalid JSON
I'm stuck in an impossible situation. I have a JSON from outer space (they can't change it). Here is the JSON:
{
user:'180111',
title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n\'',
date:'2007/01/10 19:48:38',
"id":"3322121",
"previd":112211,
"body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \*/ :\/",
"from":"112221",
"username":"mikethunder",
"creationdate":"2007\/01\/10 14:04:49"
}
"It is nowhere near a valid JSON," I said. Their response was: "emmm! but Javascript can read it without complain":
<html>
<script type="text/javascript">
var obj = /* PUT THE JSON FROM UP THERE HERE (including its braces) */;
document.write(obj.title);
document.write("<br />");
document.write(obj.creationdate + " " + obj.date);
document.write("<br />");
document.write(obj.body);
document.write("<br />");
</script>
<body>
</body>
</html>
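The distinction they are pointing at: the payload is a valid JavaScript object literal (single-quoted strings and unquoted keys are legal there) but not valid JSON, which requires double-quoted keys and strings. A minimal JavaScript sketch; `tryJsonParse` and `evalAsObjectLiteral` are hypothetical helper names, and `new Function` executes the text as code, so it is only acceptable for input you trust.

```javascript
// Strict JSON parse: returns { ok, value } or { ok: false, error }.
function tryJsonParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, error: e.message };
  }
}

// Evaluate the text as a JavaScript expression (what the browser page does).
// WARNING: this runs arbitrary code; use only on trusted input.
function evalAsObjectLiteral(text) {
  // Parentheses make `{...}` an expression rather than a block statement.
  return new Function('return (' + text + ');')();
}
```

On the question's payload, `tryJsonParse` reports an error at the first unquoted key, while `evalAsObjectLiteral` returns an object whose `title`, `date` and `body` read exactly as the HTML page displays them.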
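The problem
I am supposed to read and parse this string with .NET (4), and it broke 3 of the 14 libraries listed in the C# section of Json.org (I didn't try the rest). To work around this, I wrote the following function to fix the single- and double-quote problems.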
public static string JSONBeautify(string InStr){
    bool inSingleQuote = false;
    bool inDoubleQuote = false;
    bool escaped = false;
    StringBuilder sb = new StringBuilder(InStr);
    sb = sb.Replace("`", "<°)))><"); // replace all instances of "grave accent" with "fish" so we can use that mark later.
                                     // Hopefully there is no "fish" in our JSON
    for (int i = 0; i < sb.Length; i++) {
        switch (sb[i]) {
            case '\\':
                if (!escaped)
                    escaped = true;
                else
                    escaped = false;
                break;
            case '\'':
                if (!inSingleQuote && !inDoubleQuote) {
                    sb[i] = '"'; // Change opening single-quote string markers to double quote
                    inSingleQuote = true;
                } else if (inSingleQuote && !escaped) {
                    sb[i] = '"'; // Change closing single-quote string markers to double quote
                    inSingleQuote = false;
                } else if (escaped) {
                    escaped = false;
                }
                break;
            case '"':
                if (!inSingleQuote && !inDoubleQuote) {
                    inDoubleQuote = true; // This is an opening double-quote string marker
                } else if (inSingleQuote && !escaped) {
                    sb[i] = '`'; // Change unescaped double quote to grave accent
                } else if (inDoubleQuote && !escaped) {
                    inDoubleQuote = false; // This is a closing double-quote string marker
                } else if (escaped) {
                    escaped = false;
                }
                break;
            default:
                escaped = false;
                break;
        }
    }
    return sb.ToString()
        .Replace("\\/", "/")     // Remove all instances of escaped / (\/). Hopefully no smileys in the string
        .Replace("`", "\\\"")    // Change all "grave accent"s to escaped double quote \"
        .Replace("<°)))><", "`") // Change all fishes back to "grave accent"
        .Replace("\\'", "'");    // Change all escaped single quotes to just a single quote
}
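Now JSONLint only complains about the property names, and I can parse the JSON with both the JSON.NET and SimpleJSON libraries.
The question
I'm sure my code is not the best way to fix the JSON above.
Are there any scenarios where my code might break? Is there a better way?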
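One alternative to swapping grave accents and fish markers: a single left-to-right scan that decodes each string literal and re-emits it with JSON.stringify, which handles all escaping in one place, and that also quotes bare identifier keys (the property-name problem JSONLint reports). A minimal sketch in JavaScript rather than C#; `normalizeToJson` is a hypothetical name, and it assumes the input contains no comments, trailing commas, template literals, hex/octal numbers, numeric keys, or \x escapes. It is not a complete JavaScript lexer.

```javascript
// Rewrite a JavaScript-style object literal into JSON text (sketch).
function normalizeToJson(src) {
  const simpleEscapes = { n: '\n', t: '\t', r: '\r', b: '\b', f: '\f', v: '\v' };
  let out = '';
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === '"' || ch === "'") {
      // Decode the whole string literal, whichever quote style it uses.
      const quote = ch;
      let value = '';
      i++; // skip the opening quote
      while (i < src.length && src[i] !== quote) {
        if (src[i] === '\\' && i + 1 < src.length) {
          const next = src[i + 1];
          if (Object.prototype.hasOwnProperty.call(simpleEscapes, next)) {
            value += simpleEscapes[next];
            i += 2;
          } else if (next === 'u') {
            value += String.fromCharCode(parseInt(src.slice(i + 2, i + 6), 16));
            i += 6;
          } else {
            value += next; // \' \" \\ \/ \* ... decode to the character itself
            i += 2;
          }
        } else {
          value += src[i];
          i++;
        }
      }
      i++; // skip the closing quote
      // Re-encode with JSON's rules: double quotes, escaped " and \, etc.
      out += JSON.stringify(value);
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Bare word: quote it if it is used as a key, else keep it (true/false/null).
      let j = i;
      while (j < src.length && /[A-Za-z0-9_$]/.test(src[j])) j++;
      let k = j;
      while (k < src.length && /\s/.test(src[k])) k++;
      const word = src.slice(i, j);
      out += src[k] === ':' ? JSON.stringify(word) : word;
      i = j;
    } else {
      out += ch; // punctuation, digits and whitespace pass through unchanged
      i++;
    }
  }
  return out;
}
```

Because string contents are decoded once and re-encoded by JSON.stringify, nothing inside a value is touched by global Replace calls, which is where the grave-accent/fish approach can misfire (for example on a literal `\/` or a backtick inside a value).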
You need to run it through JavaScript. Start up a JavaScript parser in .NET, give it the string as input, and use JavaScript's native JSON.stringify to convert it:
var obj = {
"user":'180111',
"title":'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n',
"date":'2007/01/10 19:48:38',
"id":"3322121",
"previd":"112211",
"body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \*/ :\/",
"from":"112221",
"username":"mikethunder",
"creationdate":"2007\/01\/10 14:04:49"
}
console.log(JSON.stringify(obj));
document.write(JSON.stringify(obj));
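If you can run Node.js (rather than embedding a JavaScript engine in .NET), the same evaluate-then-JSON.stringify idea can be sketched with Node's built-in `vm` module. `alienToJson` is a hypothetical name; `vm` is not a security sandbox, so this assumes the feed is trusted. The timeout only guards against accidental infinite loops.

```javascript
const vm = require('vm');

// Evaluate the object literal in a fresh V8 context, then re-serialize it as JSON.
function alienToJson(text, timeoutMs = 100) {
  // Parentheses force `{...}` to be parsed as an expression, not a block.
  const value = vm.runInNewContext('(' + text + ')', {}, { timeout: timeoutMs });
  return JSON.stringify(value);
}
```

The returned string is strict JSON, so any ordinary JSON library (including the .NET ones from the question) can consume it.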
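Keep in mind that the string (or object) you received is not valid JSON and cannot be parsed with a JSON library; it has to be converted to valid JSON first. It is, however, valid JavaScript.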
To complete this answer: you can use JavaScriptSerializer in .NET. For this solution you need the following namespaces:
- System.Net
- System.Web.Script.Serialization
var webClient = new WebClient();
string readHtml = webClient.DownloadString("uri to your source (extraterrestrial)");
var a = new JavaScriptSerializer();
Dictionary<string, object> results = a.Deserialize<Dictionary<string, object>>(readHtml);
How about this:
string AlienJSON = "your alien JSON";
JavaScriptSerializer js = new JavaScriptSerializer();
string ProperJSON = js.Serialize(js.DeserializeObject(AlienJSON));
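Or just use the object after deserializing it, instead of converting it back to a string and passing that to a JSON parser, which would only cause headaches.
As Mouser also mentioned, you need System.Web.Script.Serialization in your project, which is available by referencing System.Web.Extensions.dll; for that you need to change the Target framework in the project properties to .NET Framework 4.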
Edit
The trick to consuming the deserialized object is to use dynamic:
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic obj = js.DeserializeObject(AlienJSON);
For the JSON in your question, just use:
string body = obj["body"];
Or, if your JSON is an array:
if (obj is Array) {
    foreach (dynamic o in obj) {
        string body = o["body"]; // read from the current element, not obj[0]
        // ... do something with it
    }
}
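The same object-versus-array branching in JavaScript, where the deserialized value is a plain object or array rather than a dynamic. A minimal sketch; `collectBodies` is a hypothetical name and assumes each element carries a `body` property as in the question's JSON.

```javascript
// Return the "body" field of each record, whether the parsed payload
// is a single object or an array of objects.
function collectBodies(parsed) {
  const items = Array.isArray(parsed) ? parsed : [parsed];
  return items
    .filter(o => o !== null && typeof o === 'object') // skip non-object entries
    .map(o => o.body); // read from each element, not always items[0]
}
```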
Here is a function I made that can fix broken JSON:
function fixJSON(json){
    // Apply one callback, or an array of callbacks in order, to str.
    function bulkRegex(str, callback){
        if(callback && typeof callback === 'function'){
            return callback(str);
        }else if(callback && Array.isArray(callback)){
            for(let i = 0; i < callback.length; i++){
                if(callback[i] && typeof callback[i] === 'function'){
                    str = callback[i](str);
                }else{break;}
            }
            return str;
        }
        return str;
    }
    if(json && json !== ''){
        if(typeof json !== 'string'){
            try{
                json = JSON.stringify(json);
            }catch(e){return false;}
        }
        if(typeof json === 'string'){
            // Already valid JSON: return it without running the repair heuristics.
            try{
                return JSON.parse(json);
            }catch(e){ /* fall through to the repairs below */ }
            json = bulkRegex(json, [
                str => str.replace(/[\n\t]/gm, ''), // drop raw newlines and tabs
                str => str.replace(/,\}/gm, '}'),   // drop trailing commas before }
                str => str.replace(/,\]/gm, ']'),   // drop trailing commas before ]
                str => {
                    // First pass: clean up each key and re-wrap each value in double quotes.
                    str = str.split(/(?=[,\}\]])/g);
                    str = str.map(s => {
                        if(s && s.includes(':')){
                            let strP = s.split(/:(.+)/, 2);
                            strP[0] = strP[0].trim();
                            if(strP[0]){
                                let firstP = strP[0].split(/([,\{\[])/g);
                                firstP[firstP.length-1] = bulkRegex(firstP[firstP.length-1], p => p.replace(/[^A-Za-z0-9\-_]/, ''));
                                strP[0] = firstP.join('');
                            }
                            let part = (strP[1] || '').trim();
                            if((part.startsWith('"') && part.endsWith('"')) || (part.startsWith('\'') && part.endsWith('\'')) || (part.startsWith('`') && part.endsWith('`'))){
                                part = part.slice(1, -1);
                            }
                            part = bulkRegex(part, [
                                p => p.replace(/(["])/gm, '\\$1'), // escape double quotes
                                p => p.replace(/\\'/gm, '\''),     // \' is not a JSON escape
                                p => p.replace(/\\`/gm, '`'),      // neither is \`
                            ]);
                            strP[1] = ('"'+part+'"').trim();
                            s = strP.join(':');
                        }
                        return s;
                    });
                    return str.join('');
                },
                str => str.replace(/(['"])?([a-zA-Z0-9\-_]+)(['"])?:/g, '"$2":'), // quote keys
                str => {
                    // Second pass: strip stray unescaped quotes inside values that contain ':'.
                    str = str.split(/(?=[,\}\]])/g);
                    str = str.map(s => {
                        if(s && s.includes(':')){
                            let strP = s.split(/:(.+)/, 2);
                            strP[0] = strP[0].trim();
                            if(strP[1] && strP[1].includes('"') && strP[1].includes(':')){
                                let part = strP[1].trim();
                                if(part.startsWith('"') && part.endsWith('"')){
                                    part = part.slice(1, -1);
                                    part = bulkRegex(part, p => p.replace(/(?<!\\)"/gm, ''));
                                }
                                strP[1] = ('"'+part+'"').trim();
                            }
                            s = strP.join(':');
                        }
                        return s;
                    });
                    return str.join('');
                },
            ]);
            try{
                json = JSON.parse(json);
            }catch(e){return false;}
        }
        return json;
    }
    return false;
}