Best way to combine Hibernate Search queries on different indexes

We have the following situation.

Given the following two entities:

@Indexed
@Spatial(spatialMode = SpatialMode.HASH)
@Entity
@Table(name = "address")
public class Address {

    @Field
    @Basic
    @Column(name = "state")
    private String state;

    @Field
    @Basic
    @Column(name = "town_city")
    private String townCity;

    @Field
    @Longitude
    @Basic
    @Column(name = "x_coord")
    private Double xCoord;

    @Field
    @Latitude
    @Basic
    @Column(name = "y_coord")
    private Double yCoord;

}

@Indexed
@Entity
@Table(name = "person")
public class Person {

    @Field
    @Column(name = "weight")
    private Double weight;

    @Column(name = "age")
    private Integer age;


    @org.hibernate.annotations.Cache(usage = 
    org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE)
    @ManyToMany
    @Cascade({org.hibernate.annotations.CascadeType.SAVE_UPDATE})
    @JoinTable(name = "person_address",
        joinColumns = {@JoinColumn(name = "person_id")},
        inverseJoinColumns = {@JoinColumn(name = "address_id")})
    private Set<Address> addressSet = new HashSet<>();

}

Getters, setters and the remaining fields are omitted.

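As an example, we want our search results to return the people within a given age range who are located within a 5 km radius of a given location.

So we do this: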

    FullTextSession fullTextSession = Search.getFullTextSession(entityManagerFactory.unwrap(SessionFactory.class).openSession());
    this.queryBuilder = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity(Person.class)
            .overridesForField("identifiers.identifier_edge", "identifier_query_analyzer")
            .get();
    this.bool = queryBuilder.bool();

    LocalDateTime lowerLocalDateTime = localDateTime.withYear(localDateTime.getYear() - upperAge);
    lowerDate = Date.from(lowerLocalDateTime.atZone(ZoneId.systemDefault()).toInstant());

    LocalDateTime upperLocalDateTime = localDateTime.withYear(localDateTime.getYear() - lowerAge);
    upperDate = Date.from(upperLocalDateTime.atZone(ZoneId.systemDefault()).toInstant());
    bool.must(getQueryBuilder().range().onField("datesOfBirth.dateOfBirth").from(lowerDate).to(upperDate).createQuery());

This gives us the people in the relevant age range.
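The age filter works by turning the age range into a range of birth dates. The sketch below isolates that arithmetic with plain java.time so it can be checked on its own. It assumes whole-year ages with both bounds inclusive; BirthDateRange, forAges and ageOn are illustrative names, not part of the code above. Under that assumption the earliest birth date is upperAge + 1 years before today plus one day, almost a full year earlier than what withYear(getYear() - upperAge) produces, so the code above would leave out someone who is upperAge years and a few months old.

```java
import java.time.LocalDate;
import java.time.Period;

// Illustrative helper: turns an inclusive age range into an inclusive birth-date range.
final class BirthDateRange {
    final LocalDate earliest; // oldest match: turns upperAge + 1 the day after "today"
    final LocalDate latest;   // youngest match: turned lowerAge exactly "today"

    private BirthDateRange(LocalDate earliest, LocalDate latest) {
        this.earliest = earliest;
        this.latest = latest;
    }

    static BirthDateRange forAges(LocalDate today, int lowerAge, int upperAge) {
        if (lowerAge < 0 || upperAge < lowerAge) {
            throw new IllegalArgumentException("invalid age range");
        }
        LocalDate latest = today.minusYears(lowerAge);
        LocalDate earliest = today.minusYears(upperAge + 1L).plusDays(1);
        return new BirthDateRange(earliest, latest);
    }

    // Age in completed years on the given day
    static int ageOn(LocalDate birthDate, LocalDate day) {
        return Period.between(birthDate, day).getYears();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2024, 6, 15);
        BirthDateRange range = forAges(today, 20, 30);
        System.out.println(range.earliest + " .. " + range.latest); // 1993-06-16 .. 2004-06-15
    }
}
```

How these LocalDate bounds map onto the indexed datesOfBirth.dateOfBirth values (start of day, time zone) depends on how that field is stored, which is not shown here.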

We have a separate query that fetches the IDs of the addresses within a given radius of a given point:

public Set<Integer> getSpatialAddressResults(SpatialSearchCommand spatialSearchCommand) {

    FullTextSession fullTextSession = Search.getFullTextSession(entityManagerFactory.unwrap(SessionFactory.class).openSession());
    try {
        this.queryBuilder = fullTextSession.getSearchFactory()
                .buildQueryBuilder().forEntity(Address.class)
                .get();
        this.bool = queryBuilder.bool();

        Set<Integer> addressIdSet = new HashSet<>();

        // Match every address within the given radius (in km) of the given point
        bool.must(getQueryBuilder().spatial()
                .within(spatialSearchCommand.getRadius(), Unit.KM)
                .ofLatitude(spatialSearchCommand.getLat())
                .andLongitude(spatialSearchCommand.getLng())
                .createQuery());

        // Only the address ID is projected
        FullTextQuery fullTextQuery =
                fullTextSession.createFullTextQuery(bool.createQuery(), Address.class)
                        .setProjection("addressId")
                        .initializeObjectsWith(ObjectLookupMethod.SECOND_LEVEL_CACHE,
                                DatabaseRetrievalMethod.QUERY);

        List<?> results = fullTextQuery.list();
        for (Object result : results) {
            Object[] arrayResult = (Object[]) result;
            addressIdSet.add((Integer) arrayResult[0]);
        }

        // Sentinel ID (assumed to match no address): the caller still adds a
        // location clause, and therefore finds no one, when no address is in range
        if (addressIdSet.isEmpty()) {
            addressIdSet.add(-1);
        }

        return addressIdSet;
    } finally {
        // The session was opened here, so close it here
        fullTextSession.close();
    }
}
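Conceptually, within(radius, Unit.KM).ofLatitude(...).andLongitude(...) keeps the points whose great-circle distance to the centre is at most the radius. Hibernate Search evaluates this inside the index; the plain-Java sketch below only illustrates the distance predicate itself, assuming the haversine formula and a mean Earth radius of 6371 km. RadiusCheck and its methods are hypothetical names.

```java
// Plain-Java illustration of a "within radius" check using the haversine formula.
// Not Hibernate Search code; the Earth radius below is an assumed mean value.
final class RadiusCheck {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in decimal degrees
    static double distanceKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when (lat, lng) lies within radiusKm of the centre point
    static boolean within(double radiusKm, double centreLat, double centreLng,
                          double lat, double lng) {
        return distanceKm(centreLat, centreLng, lat, lng) <= radiusKm;
    }

    public static void main(String[] args) {
        // One degree of latitude is roughly 111.19 km
        System.out.println(distanceKm(0, 0, 1, 0));
    }
}
```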

We use it as follows (in reality this is done in separate classes, but for simplicity I have only shown the relevant code):

Set<Integer> localAddressIds = getSpatialAddressResults(new SpatialSearchCommand(userSearchPreference.getRadius(), userSearchPreference.getLat(), userSearchPreference.getLng()));

if (localAddressIds.size() > 0) {
    BooleanJunction<BooleanJunction> localSquQueryBool = getQueryBuilder().bool();

    // One or two SHOULD clauses per address ID: this is what grows with the number of addresses
    for (Integer localAddressId : localAddressIds) {
        localSquQueryBool.should(getQueryBuilder().keyword().onField("currentLocation.address.indexId").matching(localAddressId).createQuery());

        if (!personSearchCommand.getCurrentOnly()) {
            localSquQueryBool.should(getQueryBuilder().keyword().onField("locations.address.indexId").matching(localAddressId).createQuery());
        }
    }

    bool.must(localSquQueryBool.createQuery());
}

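The problem is that a large number of addresses may be returned, which causes BooleanQuery.TooManyClauses: maxClauseCount is set to 1024. Each address ID adds one should clause (two when getCurrentOnly() is false), so a few hundred nearby addresses are enough to exceed Lucene's default limit of 1024 clauses per boolean query.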

So the real question is: what is the best way to combine queries on two different indexed entities so as to avoid the above problem?

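Essentially, you are trying to implement a join. As you have seen, joins come with technical challenges and are not easy to solve on the client side.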

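In general, the recommended approach in Elasticsearch and Lucene is to avoid joins whenever possible. Instead, you denormalize your schema: in the document representing each person, you embed a copy of each of their addresses. You are then able to express all your constraints in a single query against the person index. This is done by annotating the addressSet property in Person with @IndexedEmbedded.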
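To make the idea concrete, here is a toy, in-memory model of what @IndexedEmbedded buys you. It is not Hibernate Search code: the records and the search method are hypothetical, and the nearby predicate stands in for the spatial query. Each person document carries a copy of its addresses' coordinates, so the age and location constraints are checked against the same document in one pass, with no intermediate list of address IDs and therefore nothing that grows with the number of matching addresses.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Toy model of denormalization: address data is copied into each person's document.
final class DenormalizedIndex {
    record Address(int id, double lat, double lng) {}
    record Person(int id, int age, List<Address> addressSet) {}

    // One document per person; the coordinate list plays the role of "addressSet.location"
    record PersonDocument(int personId, int age, List<double[]> addressSetLocation) {}

    // "Indexing": copy the embedded addresses' coordinates into the person document
    static PersonDocument toDocument(Person person) {
        List<double[]> locations = person.addressSet().stream()
                .map(a -> new double[] {a.lat(), a.lng()})
                .collect(Collectors.toList());
        return new PersonDocument(person.id(), person.age(), locations);
    }

    // "Querying": both constraints are evaluated against the same document, no join needed
    static List<Integer> search(List<PersonDocument> index, int minAge, int maxAge,
                                BiPredicate<Double, Double> nearby) {
        return index.stream()
                .filter(doc -> doc.age() >= minAge && doc.age() <= maxAge)
                .filter(doc -> doc.addressSetLocation().stream()
                        .anyMatch(loc -> nearby.test(loc[0], loc[1])))
                .map(PersonDocument::personId)
                .collect(Collectors.toList());
    }
}
```

The price of this design is that the copies must be refreshed whenever an address changes, which is the role of @ContainedIn.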

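Now, as you can imagine, this denormalization comes at a cost: whenever an address changes, Hibernate Search has to update the related persons. To that end, you need to add a collection of persons (a Set<Person> or List<Person> property) to the Address class and annotate it with @ContainedIn, so that Hibernate Search can fetch the persons to reindex whenever an address is modified.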
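Because personSet is the inverse side of the association (mappedBy = "addressSet"), JPA does not maintain it for you in memory: adding an address only through Person.addressSet leaves the loaded Address.personSet unchanged until the entity is reloaded. A common practice is to update both sides in helper methods, so the collection that @ContainedIn navigates reflects changes made in the current session. The sketch below shows that pattern in plain Java without the JPA annotations; addAddress and removeAddress are hypothetical helpers, not part of the question's code.

```java
import java.util.HashSet;
import java.util.Set;

// Keeping both sides of the Person/Address many-to-many in sync (annotations omitted).
final class Association {
    static final class Address {
        // Inverse side: the collection @ContainedIn would be placed on
        final Set<Person> personSet = new HashSet<>();
    }

    static final class Person {
        // Owning side: the collection carrying @ManyToMany/@JoinTable/@IndexedEmbedded
        final Set<Address> addressSet = new HashSet<>();

        // Hypothetical helper: updates the owning and the inverse side together
        void addAddress(Address address) {
            addressSet.add(address);
            address.personSet.add(this);
        }

        // Hypothetical helper: removes the link from both sides
        void removeAddress(Address address) {
            addressSet.remove(address);
            address.personSet.remove(this);
        }
    }
}
```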

In short, change your model to:

//@Indexed // No longer needed
@Spatial(spatialMode = SpatialMode.HASH, name = "location") // Give a name to the spatial field
@Entity
@Table(name = "address")
public class Address {
    // Add this
    @ManyToMany(mappedBy = "addressSet")
    @ContainedIn
    private Set<Person> personSet = new HashSet<>();

    @Field
    @Basic
    @Column(name = "state")
    private String state;

    @Field
    @Basic
    @Column(name = "town_city")
    private String townCity;

    //@Field// This is not necessary
    @Longitude
    @Basic
    @Column(name = "x_coord")
    private Double xCoord;

    //@Field// This is not necessary
    @Latitude
    @Basic
    @Column(name = "y_coord")
    private Double yCoord;

}
@Indexed
@Entity
@Table(name = "person")
public class Person {

    @Field
    @Column(name = "weight")
    private Double weight;

    @Column(name = "age")
    private Integer age;


    @org.hibernate.annotations.Cache(usage = 
    org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE)
    @ManyToMany
    @Cascade({org.hibernate.annotations.CascadeType.SAVE_UPDATE})
    @JoinTable(name = "person_address",
        joinColumns = {@JoinColumn(name = "person_id")},
        inverseJoinColumns = {@JoinColumn(name = "address_id")})
    @IndexedEmbedded // Add this
    private Set<Address> addressSet = new HashSet<>();

    @Transient
    @IndexedEmbedded // Also add this
    public Address getCurrentAddress() {
         // This was missing in your schema, I suppose it's a getter that picks the current address from addressSet?
    }

}
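The body of getCurrentAddress() is not shown. One possible implementation, assuming a hypothetical boolean "current" flag on Address (the question's model does not show how the current address is identified), is sketched below in plain Java without the JPA and Hibernate Search annotations.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: assumes Address carries a "current" flag, which the question does not show.
final class CurrentAddressSketch {
    static final class Address {
        final String townCity;
        final boolean current; // hypothetical flag marking the person's current address

        Address(String townCity, boolean current) {
            this.townCity = townCity;
            this.current = current;
        }
    }

    static final class Person {
        final Set<Address> addressSet = new HashSet<>();

        // Derived (transient) property: the address flagged as current, or null if none is
        Address getCurrentAddress() {
            return addressSet.stream()
                    .filter(a -> a.current)
                    .findFirst()
                    .orElse(null);
        }
    }
}
```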

Then rebuild the index. Your Person documents will now have two new fields: addressSet.location and currentAddress.location.

Then write your query like this:

    FullTextSession fullTextSession = Search.getFullTextSession(entityManagerFactory.unwrap(SessionFactory.class).openSession());
    this.queryBuilder = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity(Person.class)
            .overridesForField("identifiers.identifier_edge", "identifier_query_analyzer")
            .get();
    this.bool = queryBuilder.bool();

    LocalDateTime lowerLocalDateTime = localDateTime.withYear(localDateTime.getYear() - upperAge);
    lowerDate = Date.from(lowerLocalDateTime.atZone(ZoneId.systemDefault()).toInstant());

    LocalDateTime upperLocalDateTime = localDateTime.withYear(localDateTime.getYear() - lowerAge);
    upperDate = Date.from(upperLocalDateTime.atZone(ZoneId.systemDefault()).toInstant());
    bool.must(getQueryBuilder().range().onField("datesOfBirth.dateOfBirth").from(lowerDate).to(upperDate).createQuery());


SpatialSearchCommand spatialSearchCommand = new SpatialSearchCommand(userSearchPreference.getRadius(), userSearchPreference.getLat(), userSearchPreference.getLng());

// The magic happens below
BooleanJunction<BooleanJunction> localSquQueryBool = getQueryBuilder().bool();

localSquQueryBool.should(getQueryBuilder().spatial()
        .onField("currentAddress.location")
        .within(spatialSearchCommand.getRadius(), Unit.KM)
        .ofLatitude(spatialSearchCommand.getLat())
        .andLongitude(spatialSearchCommand.getLng())
        .createQuery());

if(!personSearchCommand.getCurrentOnly()) {
    localSquQueryBool.should(getQueryBuilder().spatial()
            .onField("addressSet.location")
            .within(spatialSearchCommand.getRadius(), Unit.KM)
            .ofLatitude(spatialSearchCommand.getLat())
            .andLongitude(spatialSearchCommand.getLng())
            .createQuery());
}

bool.must(localSquQueryBool.createQuery());