Find array of values in another array js?

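Hi, I need to read a text of nearly 300,000 words, determine the global frequency of each word from an input dictionary, and build an array of the results. I have a sentences file and a dictionary file that contains words and their frequencies. Here is my code: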

const sentenceFreq = [];
  let text = ""; // start from an empty string, since sentences are concatenated below
  for (const sentence of srcSentences) {
    // remove special characters
    const sentenceWithoutSpecial = sentence.srcLangContent
        .replace(/[`~!@#$%^&*„“()_|+\-=?;:'",.<>\{\}\[\]\\/]/gi, "");
    text = text + sentenceWithoutSpecial + " ";
  }
  const words = text.replace(/[.]/g, "").split(/\s/);
  // forEach rather than map(async ...): nothing is awaited and the mapped result was never used
  words.forEach((w) => {
    const frequency = eng.filter((x) => x.word.toLowerCase() === w.toLowerCase());
    if (frequency[0]) {
      sentenceFreq.push({[frequency[0].freq]: w});
    } else {
      sentenceFreq.push({0: w});
    }
  });

Here is the English dictionary:

let eng = [
    {word:"the",freq:23135851162},
    {word:"of",freq:13151942776},
    {word:"and",freq:12997637966},
    {word:"to",freq:12136980858},
    {word:"a",freq:9081174698},
    {word:"in",freq:8469404971}
....]

So if my text is "Today is beautiful day", the code should look up each word in the eng dictionary and return its frequency, so the result would be [{1334:"today"},{521:"is"},{678854:"beautiful"},{9754334:"day"}]

So the numbers 1334, 521, ... are the frequencies found in the eng dictionary.

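The problem is that this is too slow, because I have 300,000 words. Is there a more efficient way to take the array of words and find each one in the array of English words from the file? For example, if I have the array ['today', 'is', 'good', 'day'], can I look up all of those values in the eng array at once instead of scanning it in a loop for every word?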

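Instead of doing the lookup against an array of objects like [ {word1: "text", frequency: 4} ], try building a single object whose property names are the words and whose values are their frequencies. Then you can map your array of words to the final output: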

const myString = "Today is beautiful day. I like to walk and go in the forest.";
const cleanText = myString.replace(/[`~!@#$%^&*„“()_|+\-=?;:'",.<>\{\}\[\]\\/]/gi, "");

const eng = [
    {word:"the",freq:23135851162},
    {word:"of",freq:13151942776},
    {word:"and",freq:12997637966},
    {word:"to",freq:12136980858},
    {word:"a",freq:9081174698},
    {word:"in",freq:8469404971}
];
const myEng = eng.reduce((obj, {word, freq}) => { // reduce all the entries in the "eng" array to a single object
    obj[word.toLowerCase()] = freq; // assuming there are no duplicates, each word gets its own entry
    return obj; // return the object for the next iteration to use
  }, Object.create(null) // the initial "obj" value: an empty object with no prototype, so a word like "constructor" cannot match an inherited property
);
console.log("New object with fast lookup:\n", myEng);

const wordsArr = cleanText.split(/\s+/).filter(Boolean); // split on whitespace and drop empty strings
const out = wordsArr.map((word) => {
  const freq = myEng[word.toLowerCase()] || 0; // case-insensitive lookup, as in the question; 0 if the word is not in the dictionary
  return { [freq]: word }; // replace the word in the array with an object in the format { frequency: word }
});
console.log("Output:\n", out);

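This should speed things up considerably: the eng array is walked only once to build the lookup object, and after that each word costs a single property access (typically constant time) instead of a scan of the whole eng array.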
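The same idea can also be written with a Map, which avoids any question about inherited object properties. The following is only a minimal sketch: the helper names buildFrequencyMap and lookUpFrequencies are hypothetical, the eng array is just the six entries shown in the question, and the input words are made up for the example.

```javascript
// Sample dictionary: only the six entries shown in the question.
const eng = [
  { word: "the", freq: 23135851162 },
  { word: "of", freq: 13151942776 },
  { word: "and", freq: 12997637966 },
  { word: "to", freq: 12136980858 },
  { word: "a", freq: 9081174698 },
  { word: "in", freq: 8469404971 },
];

// Hypothetical helper: build the index once, in a single pass over the dictionary.
function buildFrequencyMap(dictionary) {
  const map = new Map();
  for (const { word, freq } of dictionary) {
    map.set(word.toLowerCase(), freq); // lowercase keys for case-insensitive lookups
  }
  return map;
}

// Hypothetical helper: one Map.get call per word instead of a scan of the array.
function lookUpFrequencies(words, freqMap) {
  return words.map((word) => {
    const freq = freqMap.get(word.toLowerCase()) ?? 0; // 0 when the word is not in the dictionary
    return { [freq]: word }; // same { frequency: word } shape as in the question
  });
}

const freqMap = buildFrequencyMap(eng);
const result = lookUpFrequencies(["Today", "is", "in", "the", "forest"], freqMap);
console.log(result);
```

With 300,000 words, the dictionary is traversed once to build the Map and each word then needs one lookup, rather than one full pass over eng per word. Words that are missing from this six-entry sample, such as "Today" and "forest", get a frequency of 0.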