OrientDB returns incorrect query results for a Lucene search

I'm having a problem using an OrientDB Lucene index. When I query with it, it returns an incomplete data set. Here is an example:

create class Foo extends V
create property Foo.text string
create index Foo.text_spanish on Foo(text) fulltext engine lucene metadata 
        { "analyzer": "org.apache.lucene.analysis.es.SpanishAnalyzer", 
          "index": "org.apache.lucene.analysis.es.SpanishAnalyzer", 
          "query": "org.apache.lucene.analysis.es.SpanishAnalyzer", 
          "allowLeadingWildcard": true             
}

insert into Foo (text) values ("axxx")
insert into Foo (text) values ("áxxx")
insert into Foo (text) values ("xxxa")
insert into Foo (text) values ("xxxá")
insert into Foo (text) values ("xxaxx")
insert into Foo (text) values ("xxáxx")

Now when I run this query:

select from Foo where text lucene "*a*"

I get:

xxáxx
xxaxx
xxxa
axxx

It misses:

áxxx
xxxá

If I run this:

select from Foo where text lucene "*á*"

I get:

áxxx
xxxá

and it misses the rest. Even in this case it should at least return xxáxx. What am I doing wrong?

By default, OrientDB supports all the analyzers listed here. However, some characters fall outside "Basic Latin" (á is one of them), and they are only handled when you create a custom analyzer that uses one of the supported filters, such as ASCIIFoldingFilter.

After creating and compiling the analyzer class, copy its .jar into OrientDB's lib directory, then create the index using the custom analyzer.
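To make the folding idea concrete, here is a minimal sketch of what accent folding does to the six sample values. It is not Lucene's ASCIIFoldingFilter and not a custom analyzer; it only approximates the effect with the JDK's java.text.Normalizer, and the class name AccentFolding and the method fold are hypothetical names chosen for this illustration.

```java
import java.text.Normalizer;

// Illustration only: approximates accent folding with the JDK,
// it does NOT use or reproduce Lucene's ASCIIFoldingFilter.
final class AccentFolding {

    // Decompose accented letters (NFD), then drop the combining marks,
    // so that "\u00e1" (a with acute accent) becomes plain "a".
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The six values inserted into Foo; \u00e1 is the accented a.
        String[] values = {
            "axxx", "\u00e1xxx", "xxxa", "xxx\u00e1", "xxaxx", "xx\u00e1xx"
        };
        for (String value : values) {
            String folded = fold(value);
            // After folding, every value contains a plain 'a'.
            System.out.println(value + " -> " + folded
                    + " (contains 'a': " + folded.contains("a") + ")");
        }
    }
}
```

If both the indexed text and the query term go through the same kind of folding, a and á end up as the same character, which is the behaviour the question expects from `*a*`.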

In the meantime, a quick workaround is:

SELECT FROM Foo WHERE text LUCENE "*a*" OR text LUCENE "*á*";
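
If several accent variants have to be covered, the same OR pattern can be generated rather than typed by hand. The following is only a sketch: WorkaroundQuery and build are hypothetical names, and the patterns are assumed to be trusted literals, since nothing is escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the OR-based workaround query as a string.
final class WorkaroundQuery {

    // Produces: SELECT FROM <className> WHERE <field> LUCENE "p1" OR <field> LUCENE "p2" ...
    // Assumption: patterns are trusted literals; no quoting or escaping is done here.
    static String build(String className, String field, List<String> patterns) {
        List<String> clauses = new ArrayList<>();
        for (String pattern : patterns) {
            clauses.add(field + " LUCENE \"" + pattern + "\"");
        }
        return "SELECT FROM " + className + " WHERE " + String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // \u00e1 is the accented a from the question.
        System.out.println(build("Foo", "text", Arrays.asList("*a*", "*\u00e1*")));
    }
}
```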