Optimizing JSON querying performance in JavaScript

I have a 10MB JSON file with the following structure (10k entries):

{
    entry_1: {
        description: "...",
        offset: "...",
        value: "...",
        fields: {
            field_1: {
                offset: "...",
                description: "...",
            },
            field_2: {
                offset: "...",
                description: "...",
            }
        }
    },
    entry_2:
    ...
    ...
    ...

}

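I want to implement an autocomplete input field that fetches suggestions from this file as quickly as possible while searching across multiple attributes. For example, finding all entry names, field names and descriptions that contain a given substring.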

Method 1:

I tried flattening the nesting into an array of strings:

"entry_1|description|offset|value|field_1|offset|description",
"entry_1|description|offset|value|field_2|offset|description",
"entry_2|..."

and doing a case-insensitive partial string match. A query took about 900ms.
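
A minimal sketch of what this flattening approach might look like. The sample values and the names `flatten` and `search` are assumptions made for illustration, not the code that was actually measured; lowercasing each row once at build time is one possible way to keep the per-query work down to a plain substring check.

```javascript
// Hypothetical sample in the shape described above (values are invented).
const DATA = {
  entry_1: {
    description: "Control register",
    offset: "0x00",
    value: "0x1",
    fields: {
      field_1: { offset: "0", description: "Enable bit" },
      field_2: { offset: "1", description: "Reset bit" },
    },
  },
  entry_2: {
    description: "Status register",
    offset: "0x04",
    value: "0x0",
    fields: {
      field_1: { offset: "0", description: "Ready flag" },
    },
  },
};

// Build one pipe-delimited string per (entry, field) pair.
// The text is lowercased once here, so a query only lowercases the search term.
// Note: an entry without any fields produces no rows in this sketch.
function flatten(data) {
  const rows = [];
  for (const [entryName, entry] of Object.entries(data)) {
    for (const [fieldName, field] of Object.entries(entry.fields || {})) {
      const text = [
        entryName, entry.description, entry.offset, entry.value,
        fieldName, field.offset, field.description,
      ].join("|");
      rows.push({ entryName, fieldName, text: text.toLowerCase() });
    }
  }
  return rows;
}

// Case-insensitive partial match over the pre-lowercased rows.
function search(rows, term) {
  const needle = term.toLowerCase();
  return rows.filter(row => row.text.includes(needle));
}

const rows = flatten(DATA);
console.log(search(rows, "RESET").map(r => `${r.entryName}.${r.fieldName}`));
```

Because the entry-level text is repeated on every field row, a match on an entry description returns one row per field of that entry.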

Method 2:

I tried XPath-based JSON querying (using defiant.js).

  var snapshot = Defiant.getSnapshot(DATA);
  var found = JSON.search(snapshot, '//*[contains(fields, "substring")]');

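The query took about 600ms (and that is for a single attribute only, fields).

Are there other options that would get me under 100ms? I control the file format, so I can convert it to XML or any other format; the only requirement is speed.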

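Since you are trying to search for substrings of values, using IndexedDB as suggested is not a good idea. You could instead flatten the values of the fields to text, where the fields are separated by :: and each key in the object is one line in the text: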

{
  key1:{
    one:"one",
    two:"two",
    three:"three"
  },
  key2:{
    one:"one 2",
    two:"two 2",
    three:"three 2"
  }
}

would become:

key1::one::two::three
key2::one 2::two 2::three 2

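Then use a regular expression to search the text after the keyN:: part and store all the keys that match. Then map all those keys back to the objects. So if key1 is your only match, you return [data.key1].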
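
On the two-key object above, those steps can be sketched roughly as follows. The names `text`, `escapeRegExp` and `findKeys` are illustrative assumptions; the sketch also assumes that keys and values contain neither "::" nor newlines.

```javascript
const data = {
  key1: { one: "one", two: "two", three: "three" },
  key2: { one: "one 2", two: "two 2", three: "three 2" },
};

// One line per key: "key::value::value::value"
const text = Object.entries(data)
  .map(([key, obj]) => [key, ...Object.values(obj)].join("::"))
  .join("\n");

// Escape regexp metacharacters so the search term is matched literally.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Return the keys of all lines whose values contain the term (case insensitive).
function findKeys(term) {
  // Capture the key at the start of a line, then require the term
  // somewhere after the first "::" on that same line.
  const re = new RegExp(`^([^:\\n]*)::.*${escapeRegExp(term)}`, "gim");
  return Array.from(text.matchAll(re), m => m[1]);
}

// Map the matched keys back to the objects.
const matches = findKeys("TWO 2").map(key => data[key]);
console.log(matches);
```

Requiring the "::" separator before the term keeps a search for "key" from matching the key names themselves.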

Here is an example with sample data of 10000 keys (a search takes a few milliseconds on a laptop, but I have not tested it when throttling to mobile):

//array of words, used as value for data.rowN
const wordArray = ["actions","also","amd","analytics","and","angularjs","another","any","api","apis","application","applications","are","arrays","assertion","asynchronous","authentication","available","babel","beautiful","been","between","both","browser","build","building","but","calls","can","chakra","clean","client","clone","closure","code","coherent","collection","common","compiler","compiles","concept","cordova","could","created","creating","creation","currying","data","dates","definition","design","determined","developed","developers","development","difference","direct","dispatches","distinct","documentations","dynamic","easy","ecmascript","ecosystem","efficient","encapsulates","engine","engineered","engines","errors","eslint","eventually","extend","extension","falcor","fast","feature","featured","fetching","for","format","framework","fully","function","functional","functionality","functions","furthermore","game","glossary","graphics","grunt","hapi","has","having","help","helps","hoisting","host","how","html","http","hybrid","imperative","include","incomplete","individual","interact","interactive","interchange","interface","interpreter","into","its","javascript","jquery","jscs","json","kept","known","language","languages","library","lightweight","like","linked","loads","logic","majority","management","middleware","mobile","modular","module","moment","most","multi","multiple","mvc","native","neutral","new","newer","nightmare","node","not","number","object","objects","only","optimizer","oriented","outside","own","page","paradigm","part","patterns","personalization","plugins","popular","powerful","practical","private","problem","produce","programming","promise","pure","refresh","replace","representing","requests","resolved","resources","retaining","rhino","rich","run","rxjs","services","side","simple","software","specification","specifying","standardized","styles","such","support","supporting","syntax","text","that","the","their","they","toolkit","top","tracking",
"transformation","type","underlying","universal","until","use","used","user","using","value","vuejs","was","way","web","when","which","while","wide","will","with","within","without","writing","xml","yandex"];
//get random number
const rand = (min,max) =>
  Math.floor(
    (Math.random()*(max-min))+min
  )
;
//return object: {one:"one random word from wordArray",two:"one rand...",three:"one r..."}
const threeMembers = () =>
  ["one","two","three"].reduce(
    (acc,item)=>{
      acc[item] = wordArray[rand(0,wordArray.length)];
      return acc;
    }
    ,{}
  )
;
let i = -1;
const data = {};
//create data: {row0:threeMembers(),row1:threeMembers()...row9999:threeMembers()}
while(++i<10000){
  data[`row${i}`] = threeMembers();
}
//convert the data object to string "row0::word::word::word\nrow1::...\nrow9999..."
const dataText = Object.keys(data)
  .map(x=>`${x}::${data[x].one}::${data[x].two}::${data[x].three}`)
  .join("\n")
;
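//search for something (example: searching for "script" will match javascript and ecmascript)
//  escape regexp metacharacters so the search term is matched literally
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
//  i in the regexp flags "igm" means case insensitive
//  requiring "::" after the key restricts the match to the values, not the key itself
//return array of data[matched key]
window.searchFor = search => {
  const r = new RegExp(`^([^:\\n]*)::.*${escapeRegExp(search)}`,"igm")
  ,ret=[];
  var result = r.exec(dataText);
  while(result !== null){
    ret.push(result[1]);
    result = r.exec(dataText);
  }
  return ret.map(x=>data[x]);
};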
//example search for "script"
console.log(searchFor("script"));
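
The same idea could be adapted to the nested structure from the question. The following is only a sketch under assumptions: the sample values, the line layout, and the names `entries`, `toLine`, `entryText`, `escapeForRegExp` and `suggest` are all made up for illustration. The entry name is repeated after the first "::" so that entry names are searchable along with field names and descriptions, while offsets and values are left out.

```javascript
// Hypothetical data in the question's shape (values invented for illustration).
const entries = {
  entry_1: {
    description: "Control register", offset: "0x00", value: "0x1",
    fields: {
      field_1: { offset: "0", description: "Enable bit" },
      field_2: { offset: "1", description: "Reset bit" },
    },
  },
  entry_2: {
    description: "Status register", offset: "0x04", value: "0x0",
    fields: { ready: { offset: "0", description: "Ready flag" } },
  },
};

// One line per entry:
// "entryName::entryName::description::fieldName::fieldDescription::..."
const toLine = ([name, entry]) =>
  [
    name, name, entry.description,
    ...Object.entries(entry.fields || {}).flatMap(([f, v]) => [f, v.description]),
  ].join("::");

const entryText = Object.entries(entries).map(toLine).join("\n");

// Escape regexp metacharacters so the search term is matched literally.
const escapeForRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Return the names of entries whose searchable text contains the term.
function suggest(term) {
  const re = new RegExp(`^([^:\\n]*)::.*${escapeForRegExp(term)}`, "gim");
  return Array.from(entryText.matchAll(re), m => m[1]);
}

console.log(suggest("ready"));
console.log(suggest("ENTRY_1"));
```

This returns entry names only; looking up `entries[name]` afterwards gives the full object if the suggestion list needs to show which field matched.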