Flexsearch export and import document index issue

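I'm trying to build an index with flexsearch and nodejs and store it on a local disk, because building it takes quite a bit of time. The export seems to work, but when I try to import the files again into a new document index, I get this error: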

TypeError: Cannot read property 'import' of undefined
at Q.t.import (/opt/hermetic/hermetic/server/node_modules/flexsearch/dist/flexsearch.bundle.js:33:330)
at Object.retrieveIndex (/opt/hermetic/hermetic/server/build/search.js:86:25)
at Object.search (/opt/hermetic/hermetic/server/build/search.js:96:32)
at init (/opt/hermetic/hermetic/server/build/server.js:270:27)

I'm running nodejs version 14 and flexsearch version 0.7.21. Below is the code I am using:

import fs from 'fs';
import Flexsearch from 'flexsearch';

const createIndex = async () => { 
    const { Document } = Flexsearch;
    const index = new Document({
      document: {
        id: 'id',
        tag: 'tag',
        store: true,
        index: [
          'record:a',
          'record:b',
          'tag',
        ],
      },
    });

    index.add({ id: 0, tag: 'category1', record: { a: '1 aaa', b: '0 bbb' } });
    index.add({ id: 1, tag: 'category1', record: { a: '1 aaa', b: '1 bbb' } });
    index.add({ id: 2, tag: 'category2', record: { a: '2 aaa', b: '2 bbb' } });
    index.add({ id: 3, tag: 'category2', record: { a: '2 aaa', b: '3 bbb' } });
    console.log('search', index.search('aaa'));

    await index.export((key, data) => fs.writeFile(`./search_index/${key}`, data, err => !!err && console.log(err)));
    return true;
}

const retrieveIndex = async () => { 
    const { Document } = Flexsearch;
    const index = new Document({
      document: {
        id: 'id',
        tag: 'tag',
        store: true,
        index: [
          'record:a',
          'record:b',
          'tag',
        ],
      },
    });

    const keys = fs
      .readdirSync('./search_index', { withFileTypes: true }, err => !!err && console.log(err))
      .filter(item => !item.isDirectory())
      .map(item => item.name);

    for (let i = 0, key; i < keys.length; i += 1) {
      key = keys[i];
      const data = fs.readFileSync(`./search_index/${key}`, 'utf8');
      index.import(key, data);
    }
    return index;
}

await createIndex();
const index = await retrieveIndex();

console.log('cached search', index.search('aaa'));

After further investigation, it turns out that this feature is currently not available for document-type search. See this issue on GitHub for more information.

I was also trying to find a way to export the index properly, and initially tried putting everything into a single file. While it worked, I didn't really like the solution.

That led me to your SO question. I went through your code and managed to work out why you were getting that error.

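Basically, the export is a synchronous operation, while you are also (somewhat arbitrarily) mixing in async code. To avoid the problem, remove all the async code and use only synchronous node.fs operations. In my solution I also create the document store only once and then populate it via retrieveIndex(), instead of calling new Document() in each function.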

I also added a .json extension, both to make sure node.fs reads the files correctly and for my own sanity - after all, the data is stored as JSON.

So thanks for giving me the idea of storing each key as a file, @Jamie Nicholls.

import fs from 'fs';
import { Document } from 'flexsearch'

const searchIndexPath = '/Users/user/Documents/linked/search-index/'

// Create the document index once and share it between the functions below.
let index = new Document({
  document: {
    id: 'date',
    index: ['content']
  },
  tokenize: 'forward'
})

const createIndex = () => { 
  
  index.add({ date: "2021-11-01", content: 'asdf asdf asd asd asd asd' })
  index.add({ date: "2021-11-02", content: 'fobar 334kkk' })
  index.add({ date: "2021-11-04", content: 'fobar 234 sffgfd' })

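  // Write each exported key synchronously to its own .json file;
  // keys exported without data are stored as empty files.
  index.export(
    (key, data) => fs.writeFileSync(`${searchIndexPath}${key}.json`, data !== undefined ? data : '')
  )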
}

createIndex()

const retrieveIndex = () => { 

  const keys = fs
    .readdirSync(searchIndexPath, { withFileTypes: true })
    .filter(item => !item.isDirectory())
    .map(item => item.name.slice(0, -5)) // drop the ".json" suffix to recover the key

  for (let i = 0, key; i < keys.length; i += 1) {
    key = keys[i]
    const data = fs.readFileSync(`${searchIndexPath}${key}.json`, 'utf8')
    index.import(key, data ?? null)
  }
}


const searchStuff = () => {
  retrieveIndex()  
  console.log('cached search', index.search('fo'))
}

searchStuff()
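
For reference, the same "one file per key" round trip can be pulled out into two small helpers. This is only a sketch: saveIndex and loadIndex are hypothetical names, not part of flexsearch, and they assume nothing more than an object with export(callback) and import(key, data). It uses path.basename to strip the extension instead of slice(0, -5), and it skips empty files on the assumption that an empty part carries nothing to import. It does not verify whether a real flexsearch index has finished calling the callback by the time export() returns, so treat that part as untested.

```javascript
// Hypothetical helpers (not part of flexsearch). They only assume an index
// object that exposes export(callback) and import(key, data).
const fs = require('fs');
const path = require('path');

const EXT = '.json';

// Write every exported part synchronously, one file per key.
function saveIndex(index, dir) {
  fs.mkdirSync(dir, { recursive: true });
  index.export((key, data) => {
    // Keys exported without data are stored as empty files.
    fs.writeFileSync(path.join(dir, `${key}${EXT}`), data !== undefined ? data : '');
  });
}

// Read every *.json file back and pass it to index.import under its key.
function loadIndex(index, dir) {
  const files = fs
    .readdirSync(dir, { withFileTypes: true })
    .filter(item => item.isFile() && item.name.endsWith(EXT));

  for (const item of files) {
    // Strip the extension to recover the original key, e.g. "content.map".
    const key = path.basename(item.name, EXT);
    const data = fs.readFileSync(path.join(dir, item.name), 'utf8');
    // Assumption: an empty file means the part had no data, so skip it.
    if (data !== '') index.import(key, data);
  }
}
```

With the index from the answer above, this would be called as saveIndex(index, searchIndexPath) after adding documents, and loadIndex(index, searchIndexPath) before searching.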