How to do case insensitive sorting using Hibernate Lucene Search?
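I can get results with the code below, but they are not sorted correctly: the lowercase values are listed first, then the uppercase ones.
Result obtained: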
upper
test
UPPER
Test
Expected result:
upper
UPPER
Test
test
Within a pair, the order may be uppercase (T) first, then lowercase (t).
The code below is for reference.
Prada - entity class:
@Entity
@Table(name = "Prada")
@XmlRootElement
@Indexed
@AnalyzerDef(name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    })
public class Prada implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    @Basic(optional = false)
    @Column(name = "ID")
    private Long id;

    // Indexed twice: once under the default field name, once as a dedicated sort field.
    @Fields({
        @Field(index = Index.YES, store = Store.NO),
        @Field(name = "PradaName_for_sort", index = Index.YES,
               analyzer = @Analyzer(definition = "customanalyzer"))
    })
    @Column(name = "NAME", length = 100)
    private String name;

    public Prada() {
    }

    public Prada(Long id) {
        this.id = id;
    }

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "com.Prac.Prada[ id=" + id + " ]";
    }
}
I found this AnalyzerDef solution somewhere, but it does not work for me. Can anyone provide a solution?
Main code:
FullTextEntityManager ftem = Search.getFullTextEntityManager(factory.createEntityManager());
QueryBuilder qb = ftem.getSearchFactory().buildQueryBuilder().forEntity(Prada.class).get();
org.apache.lucene.search.Query query = qb.all().getQuery();
FullTextQuery fullTextQuery = ftem.createFullTextQuery(query, Prada.class);
// Sort on the dedicated sort field; the third argument (true) reverses the order.
fullTextQuery.setSort(new Sort(new SortField("PradaName_for_sort", SortField.STRING, true)));
fullTextQuery.setFirstResult(0).setMaxResults(150);
int size = fullTextQuery.getResultSize();
List<Prada> result = fullTextQuery.getResultList();
for (Prada user : result) {
    logger.info("Prada Name:" + user.getName());
}
Below are the Hibernate and Lucene versions in use (which I cannot change):
<hibernate.version>4.2.8.Final</hibernate.version>
<hibernate.search.version>4.3.0.Final</hibernate.search.version>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-entitymanager</artifactId>
    <version>4.2.8.Final</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.6.2</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers</artifactId>
    <version>3.6.2</version>
</dependency>
Updated code:
@AnalyzerDef(name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        // Backslashes are doubled so that the Java string literals compile.
        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
            @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
            @Parameter(name = "replacement", value = " "),
            @Parameter(name = "replace", value = "all")
        }),
        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
            @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
            @Parameter(name = "replacement", value = ""),
            @Parameter(name = "replace", value = "all")
        }),
        @TokenFilterDef(factory = TrimFilterFactory.class)
    }
)
public class Prada implements Serializable {

    @Fields({
        @Field(index = Index.YES, store = Store.YES),
        @Field(name = "PradaName_for_sort", index = Index.YES,
               analyzer = @Analyzer(definition = "customanalyzer"))
    })
    @Column(name = "NAME", length = 100)
    private String name;
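Never use a tokenizer that splits the text into several tokens on a field used for sorting. You need to use KeywordTokenizer to make sure the value is kept as a single token.
Here is the analyzer we used for sorting at my previous company: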
@AnalyzerDef(name = "TEXT_SORT",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        // Backslashes are doubled so that the Java string literals compile.
        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
            @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
            @Parameter(name = "replacement", value = " "),
            @Parameter(name = "replace", value = "all")
        }),
        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
            @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
            @Parameter(name = "replacement", value = ""),
            @Parameter(name = "replace", value = "all")
        }),
        @TokenFilterDef(factory = TrimFilterFactory.class)
    }
)
It was written for the latest version of Hibernate Search, so you will need to adapt it. Obviously, you need an s/ASCIIFoldingFilterFactory/ISOLatin1AccentFilterFactory/, but I'm not sure whether PatternReplaceFilterFactory already exists in 3.6.2.
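To make the idea concrete, here is a rough plain-Java sketch of what such a sort analyzer does to each value: keep the whole string as one key, fold accents, lowercase, clean up punctuation, and trim. It does not use Lucene or Hibernate Search at all. The class and method names (SortKeySketch, sortKey, sortDescending) are made up for illustration, accent folding is only approximated with java.text.Normalizer, and the first replacement assumes the intent is to turn each of those punctuation characters into a space.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class SortKeySketch {

    // Hypothetical helper: approximates the filter chain on the whole value (no tokenizing).
    static String sortKey(String value) {
        // Rough accent folding: decompose, then drop combining marks.
        String folded = Normalizer.normalize(value, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
        String lowered = folded.toLowerCase(Locale.ROOT);
        // Assumption: each listed punctuation character becomes a space.
        String spaced = lowered.replaceAll("['\\-&.,()]", " ");
        // Drop everything that is not a digit, a letter, or a space.
        String cleaned = spaced.replaceAll("[^0-9\\p{L} ]", "");
        return cleaned.trim();
    }

    // Sorts by the normalized key in descending order; equal keys keep their input order.
    static List<String> sortDescending(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Comparator<String> byKey = Comparator.comparing(SortKeySketch::sortKey);
        copy.sort(byKey.reversed());
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Test", "upper", "test", "UPPER");

        // Descending sort on the raw strings: case sensitive.
        List<String> raw = new ArrayList<>(names);
        raw.sort(Collections.reverseOrder());
        System.out.println(raw);                   // [upper, test, UPPER, Test]

        // Descending sort on the normalized key: case insensitive.
        System.out.println(sortDescending(names)); // [upper, UPPER, Test, test]
    }
}
```

The raw descending sort gives the same order as the result reported in the question, while sorting on the normalized key groups upper/UPPER and Test/test together. The order inside each group is just the input order here, since the keys are equal.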