Lucene.net: No search results when using MultiFieldQueryParser
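I am using Lucene.Net version 4.8.0 with .NET Core 3.1. In the sample code below, I create a new index and add three documents to it. Each document contains the fields ProjectName, Customer, and Country.
When I parse the query "Germany", the search returns 2 hits: the documents that contain the word "Germany" in any of their fields.
However, when I parse the query "Country:Germany", the search returns 0 hits, even though there is clearly a document whose Country field has the value "Germany".
What am I doing wrong?
I also inspected my index with the Luke tool (https://github.com/DmitryKey/luke/releases/tag/4.8.0). In Luke, the same search works fine against the same index directory.
Here is my C# code: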
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;
using Directory = System.IO.Directory;

namespace LuceneTestApp
{
    class Program
    {
        static string CreateTestIndex()
        {
            string indexDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
            if (Directory.Exists(indexDir))
                throw new IOException("Random index directory already exists. Please try again.");
            using var dir = FSDirectory.Open(indexDir);
            using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
            var indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
            using var writer = new IndexWriter(dir, indexConfig);
            AddDocumentTo(writer, "AwesomeProject_1", "Volkswagen", "Germany");
            AddDocumentTo(writer, "AwesomeProject_2", "Ford", "USA");
            AddDocumentTo(writer, "AwesomeProject_3", "Audi Germany", "France");
            writer.Commit();
            return indexDir;
        }

        static void AddDocumentTo(IndexWriter writer, string projectName, string customer, string country)
        {
            var doc = new Document();
            doc.Add(new StringField("ProjectName", projectName, Field.Store.YES));
            doc.Add(new TextField("Customer", customer, Field.Store.YES));
            doc.Add(new TextField("Country", country, Field.Store.YES));
            writer.AddDocument(doc);
        }

        static IList<string> Search(string indexDir, string queryString)
        {
            using var dir = FSDirectory.Open(indexDir);
            using var reader = DirectoryReader.Open(dir);
            using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
            var searcher = new IndexSearcher(reader);
            string[] searchFields = {"ProjectName", "Customer", "Country"};
            var queryParser = new MultiFieldQueryParser(LuceneVersion.LUCENE_48, searchFields, analyzer);
            queryParser.DefaultOperator = Operator.AND;
            var query = queryParser.Parse(queryString.ToLowerInvariant());
            int maxNumHits = 10;
            var topDocs = searcher.Search(query, maxNumHits);
            return topDocs.ScoreDocs.Select(hit => $"Score {hit.Score,5:0.000} DocId {hit.Doc}").ToList();
        }

        static void Main(string[] args)
        {
            Console.WriteLine("=================");
            string indexDir = CreateTestIndex();
            IList<string> hitsOne = Search(indexDir, "Germany");
            IList<string> hitsTwo = Search(indexDir, "Country:Germany");
            Console.WriteLine($"Search one yields {hitsOne.Count} hits.");
            Console.WriteLine($"Search two yields {hitsTwo.Count} hits.");
            Console.WriteLine("=================\n\n");
        }
    }
}
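I found my mistake. Field names in Lucene are case-sensitive. In my code, the call queryString.ToLowerInvariant() lowercases the entire query string, turning "Country:Germany" into "country:germany". Nothing is found because no lowercase field named country exists in the index.
Solution: remove the call to ToLowerInvariant().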
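To illustrate the point about case-sensitive field names, here is a minimal sketch that indexes a single document and runs the same query with and without lowercasing. It is not the original program: the class name FieldNameCaseDemo, the helper CountHits, and the use of an in-memory RAMDirectory are assumptions made for brevity. It relies on StandardAnalyzer lowercasing terms during both indexing and query parsing, which is why the search term does not need manual lowercasing.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical demo class; not part of the original question.
class FieldNameCaseDemo
{
    // Hypothetical helper: parses the query string as given and returns the hit count.
    static int CountHits(IndexSearcher searcher, StandardAnalyzer analyzer, string queryString)
    {
        string[] searchFields = { "ProjectName", "Customer", "Country" };
        var parser = new MultiFieldQueryParser(LuceneVersion.LUCENE_48, searchFields, analyzer);
        parser.DefaultOperator = Operator.AND;

        // The query string is passed through unchanged, so field names keep their case.
        // The analyzer lowercases the term ("Germany" becomes "germany") on its own.
        Query query = parser.Parse(queryString);
        return searcher.Search(query, 10).TotalHits;
    }

    static void Main()
    {
        // In-memory index (an assumption for this sketch; the question uses FSDirectory).
        using var dir = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);

        using (var writer = new IndexWriter(dir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
        {
            var doc = new Document();
            doc.Add(new TextField("Country", "Germany", Field.Store.YES));
            writer.AddDocument(doc);
            writer.Commit();
        }

        using var reader = DirectoryReader.Open(dir);
        var searcher = new IndexSearcher(reader);

        // Field name matches the indexed field "Country": one hit is expected.
        Console.WriteLine(CountHits(searcher, analyzer, "Country:Germany"));

        // Same string after ToLowerInvariant(): the field "country" does not exist, so no hits are expected.
        Console.WriteLine(CountHits(searcher, analyzer, "Country:Germany".ToLowerInvariant()));
    }
}
```

If users may type field names in arbitrary case, one option is to normalize only the field-name part of the query before parsing, or to index under lowercase field names from the start. Both are design choices beyond what the original answer covers.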