Using Lucene to search data differently for different users conditionally

Question

Consider the following entity, on which I need to perform a text search:

Sample{
    int ID, //Unique ID
    string Name,//Searchable field
    string Description //Searchable field
}

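I have several such entities, which are shared by all users, but each user can associate different tags, notes, etc. with any of them. For simplicity, assume a user can add tags to a Sample entity.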

UserSampleData{
    int ID, //Sample ID
    int UserID, //For condition
    string tags //Searchable field
}

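When a user performs a search, I want to look for the given string in the Name and Description fields and in the tags that the current user has associated with that sample. I am new to Lucene indexing and don't know how to design the index or the queries for this case. The results need to be sorted by relevance to the search query. I have thought of the following approaches, but I feel there may be a better solution: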

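1. Query the two entities, Samples and UserSampleData, separately and merge the two result sets somehow. For results that appear in both, the match scores would need to be combined, possibly by averaging.
2. Flatten the data by merging the two entities, which yields multiple entries for the same ID.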
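
To make the first option concrete, here is a minimal plain-Java sketch of the merging step. It assumes each search has already been reduced to a map from sample ID to score; `ScoreMerge`, `Hit` and `merge` are hypothetical names for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): merges two per-ID score maps.
final class ScoreMerge {

    // One merged hit: a sample ID and its combined score.
    static final class Hit {
        final int sampleId;
        final double score;

        Hit(int sampleId, double score) {
            this.sampleId = sampleId;
            this.score = score;
        }
    }

    // sampleScores: sample ID -> score from the Samples search (Name, Description).
    // userScores:   sample ID -> score from the UserSampleData search for one user (tags).
    static List<Hit> merge(Map<Integer, Double> sampleScores, Map<Integer, Double> userScores) {
        Map<Integer, Double> combined = new HashMap<>(sampleScores);
        for (Map.Entry<Integer, Double> e : userScores.entrySet()) {
            // IDs found by both searches get the average; others keep their single score.
            combined.merge(e.getKey(), e.getValue(), (a, b) -> (a + b) / 2.0);
        }
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : combined.entrySet()) {
            hits.add(new Hit(e.getKey(), e.getValue()));
        }
        // Highest score first; ties are broken by ID so the order is deterministic.
        hits.sort((x, y) -> {
            int byScore = Double.compare(y.score, x.score);
            return byScore != 0 ? byScore : Integer.compare(x.sampleId, y.sampleId);
        });
        return hits;
    }
}
```

Averaging is only one possible choice. Scores from two separate queries are not necessarily on the same scale, so summing or taking the maximum may rank results more sensibly, depending on the data.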

Answer 1

You can use Lucene's JoinUtil class, but you have to rename the second "ID" field, the one on the UserSampleData documents, to SAMPLE_ID (or any name other than "ID"). Here is an example:

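  import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.queryparser.classic.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.search.join.JoinUtil;
  import org.apache.lucene.search.join.ScoreMode;
  import org.apache.lucene.util.Version;

  IndexReader r = DirectoryReader.open(dir); // dir: the Directory that holds your index
  final Version version = Version.LUCENE_47; // Your lucene version
  final IndexSearcher searcher = new IndexSearcher(r);

  // fromQuery matches UserSampleData documents, so join from their SAMPLE_ID to the Sample ID
  final String fromField = "SAMPLE_ID";
  final boolean multipleValuesPerDocument = false;
  final String toField = "ID";
  String querystr = "UserID:xxxx AND yourQueryString"; // the userID condition and your query string

  Query fromQuery = new QueryParser(version, "tags", new WhitespaceAnalyzer(version)).parse(querystr); // unqualified terms search "tags"
  // ScoreMode.None drops the from-side scores; ScoreMode.Max or ScoreMode.Avg carries them over for ranking
  final Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, searcher, ScoreMode.None);

  final TopDocs topDocs = searcher.search(joinQuery, 10); // top 10 matching Sample documents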

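See the bug LUCENE-4824 (https://issues.apache.org/jira/browse/LUCENE-4824). I don't know whether it has been fixed in the current version of Lucene; if it has not, I think you will have to convert the type of the ID fields to String.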
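
Conceptually, a query-time join works in two steps: collect the join-key values from the documents matched by the "from" query, then return the documents whose "to" field holds one of those values. The sketch below illustrates this in plain Java with documents modeled as field maps. `JoinSketch` and `join` are hypothetical names, and this is an illustration of the idea, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java illustration of a query-time join; not Lucene's implementation.
final class JoinSketch {

    // Documents are modeled as simple field -> value maps.
    static List<Map<String, String>> join(
            List<Map<String, String>> docs,
            Predicate<Map<String, String>> fromQuery,
            String fromField,
            String toField) {
        // Step 1: collect the fromField values of every document matched by fromQuery.
        Set<String> keys = new HashSet<>();
        for (Map<String, String> doc : docs) {
            if (fromQuery.test(doc) && doc.containsKey(fromField)) {
                keys.add(doc.get(fromField));
            }
        }
        // Step 2: keep the documents whose toField value is one of the collected keys.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(toField);
            if (value != null && keys.contains(value)) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

In this model the "from" query must match the documents that carry the user condition (the UserSampleData side), and the result consists of documents from the other side (the Sample side).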

Answer 2

I think what you need is relational data, and handling relational data with Lucene is not straightforward. This is a useful blog post.
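
One way to sidestep the relational problem is the flattening approach mentioned in the question. The following is a rough plain-Java sketch, assuming one shared entry per sample plus one entry per (sample, user) pair; `Flatten`, `flatten` and `entry` are hypothetical names, and the field maps stand in for whatever documents would actually be indexed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: denormalizes Sample and UserSampleData into flat entries.
final class Flatten {

    static final class Sample {
        final int id;
        final String name;
        final String description;

        Sample(int id, String name, String description) {
            this.id = id;
            this.name = name;
            this.description = description;
        }
    }

    static final class UserSampleData {
        final int id;      // Sample ID
        final int userId;
        final String tags;

        UserSampleData(int id, int userId, String tags) {
            this.id = id;
            this.userId = userId;
            this.tags = tags;
        }
    }

    // Emits one shared entry per sample, followed by one entry per user who tagged it.
    static List<Map<String, String>> flatten(List<Sample> samples, List<UserSampleData> userData) {
        List<Map<String, String>> entries = new ArrayList<>();
        for (Sample s : samples) {
            entries.add(entry(s, null)); // shared entry: no UserID, no tags
            for (UserSampleData u : userData) {
                if (u.id == s.id) {
                    entries.add(entry(s, u));
                }
            }
        }
        return entries;
    }

    private static Map<String, String> entry(Sample s, UserSampleData u) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ID", Integer.toString(s.id));
        fields.put("Name", s.name);
        fields.put("Description", s.description);
        if (u != null) {
            fields.put("UserID", Integer.toString(u.userId));
            fields.put("tags", u.tags);
        }
        return fields;
    }
}
```

The cost of this layout is the one the question already points out: a search for one user can hit both the shared entry and that user's entry for the same ID, so results have to be deduplicated by ID, and every change to a sample has to be written to all of its entries.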

lucene

lucene.net

full-text-search

full-text-indexing