Handle multiple MySQL tables with Solr 5.1.0?

Question

My MySQL database has more than 30 tables. I recently used the DataImportHandler to import data from one of my tables into Solr 5.1.0. In my data-config.xml file I run the query

select * from table-name

But for my search I have to combine more than 10 tables to return correct search results.

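The options are:

1) Use a JOIN query in the MySQL database and import the joined data

or

2) Import the complete data of each table separately and JOIN the Solr cores.

What should I do to keep this optimized? Which approach is better?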

Answer 1

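If you have a single core, I suggest importing the tables into that single core and using joins. That is what I did on my Solr 4.9 with CakePHP and the Solr PHP client. For this, however, you must define the table structure and data types in data-config.xml and schema.xml, which I assume you have already done. In your data-config file you write the queries, or define a structure, that imports all the data from your ten tables accordingly.

Here are two of my tables as an example: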

<entity name="type_masters" pk="type_id"
        query="SELECT delete_status as type_masters_delete_status, type_updated, type_id, category_id, type_name
               FROM type_masters
               where type_id='${businessmasters.Business_Type}'"
        deltaQuery="select type_id from type_masters where type_updated > '${dih.last_index_time}'"
        parentDeltaQuery="select business_id from businessmasters where Business_Type=${type_masters.type_id}">
    <field column="type_id" name="id"/>
    <field column="category_id" name="category_id" indexed="true" stored="true"/>
    <field column="type_name" name="type_name" indexed="true" stored="true"/>
    <field column="type_updated" name="type_updated" indexed="true" stored="true"/>
    <field column="type_masters_delete_status" name="type_masters_delete_status" indexed="true" stored="true"/>

    <entity name="category_masters"
            query="SELECT delete_status as category_masters_delete_status, category_updated, category_id, category_name
                   FROM category_masters where category_id='${type_masters.category_id}'"
            deltaQuery="select category_id from category_masters where category_updated > '${dih.last_index_time}'"
            parentDeltaQuery="select type_id from type_masters where category_id=${category_masters.category_id}">
        <field column="category_id" name="id"/>
        <field column="category_name" name="category_name" indexed="true" stored="true"/>
        <field column="category_updated" name="category_updated" indexed="true" stored="true"/>
        <field column="category_masters_delete_status" name="category_masters_delete_status" indexed="true" stored="true"/>
    </entity><!-- category_masters -->

</entity><!-- type_masters -->
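To make the nesting above concrete: for every row of the outer entity, DIH runs the inner entity's query with `${type_masters.category_id}` replaced by that row's value, and the inner row's columns end up on the same Solr document. The sketch below is a rough Python emulation of that behaviour, not DIH itself. It assumes Python's built-in sqlite3 as a stand-in for MySQL, omits the outer businessmasters entity, trims the columns, and uses invented rows.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table/column names follow the
# data-config.xml above, the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category_masters (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE type_masters (type_id INTEGER PRIMARY KEY, category_id INTEGER, type_name TEXT);
    INSERT INTO category_masters VALUES (10, 'Restaurants'), (20, 'Hotels');
    INSERT INTO type_masters VALUES (1, 10, 'Pizza'), (2, 20, 'Hostel');
""")

def rows(connection, sql, params=()):
    """Run a query and return each row as a dict keyed by column name."""
    cursor = connection.execute(sql, params)
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

def build_documents(connection):
    documents = []
    # Outer entity: one document per type_masters row.
    for parent in rows(connection,
                       "SELECT type_id, category_id, type_name FROM type_masters ORDER BY type_id"):
        doc = dict(parent)
        # Inner entity: its query is parameterized by the parent's category_id,
        # like ${type_masters.category_id} in data-config.xml.
        for child in rows(connection,
                          "SELECT category_name FROM category_masters WHERE category_id = ?",
                          (parent["category_id"],)):
            # category_id is a primary key, so at most one child row matches;
            # this sketch simply copies its columns onto the parent document.
            doc.update(child)
        documents.append(doc)
    return documents

docs = build_documents(conn)
print(docs)
```

Note that this pattern issues one inner query per outer row; on large tables, a single JOIN in one entity query (option 1) avoids that per-row round trip.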

Answer 2

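1) Use a JOIN query in the MySQL database and import the joined data

Yes, this is possible in Solr with DIH. As usual with DIH, you configure your data-config.xml; there you can write a query with joins that fetches the data from all the tables you need. You can create a single core and put all the data into it, building documents from those fields (the document fields are declared in schema.xml).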
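A minimal sketch of option 1, assuming the type_masters and category_masters tables from Answer 1 (columns trimmed). It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere; the rows are invented. The point is that one JOIN query yields one flat row per Solr document, which is what a single DIH entity `query` would feed into the core.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category_masters (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE type_masters (type_id INTEGER PRIMARY KEY, category_id INTEGER, type_name TEXT);
    INSERT INTO category_masters VALUES (10, 'Restaurants'), (20, 'Hotels');
    INSERT INTO type_masters VALUES (1, 10, 'Pizza'), (2, 10, 'Sushi'), (3, 20, 'Hostel');
""")

# One denormalizing JOIN: every result row becomes one flat Solr document.
# Explicit aliases make the column names (and thus the Solr field names) predictable.
JOIN_QUERY = """
    SELECT t.type_id AS id, t.type_name AS type_name,
           c.category_id AS category_id, c.category_name AS category_name
    FROM type_masters t
    JOIN category_masters c ON c.category_id = t.category_id
    ORDER BY t.type_id
"""

def fetch_documents(connection):
    """Run the JOIN and return a list of dicts keyed by column alias."""
    cursor = connection.execute(JOIN_QUERY)
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

docs = fetch_documents(conn)
for doc in docs:
    print(doc)
```

In data-config.xml the same SELECT would go into the `query` attribute of a single entity, with one `<field column="..." name="..."/>` per alias.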

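The optimization point to consider here is which fields you search on and which fields you display in the results, so settle that first: which fields will you search on, and which do you need to display?

Mark the fields you need to search on as indexed="true" and all the rest as indexed="false". Mark the fields you need in the results as stored="true" and all the rest as stored="false".

Some fields may be needed for both, i.e. searching and display in the results. Mark those as both indexed="true" and stored="true".

For example, my documents have 15 fields, but only 4 are indexed because those are the only fields I want to search on. All the other fields are displayed in the results, so they are stored.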
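The rule above (index what you search, store what you display) can be sketched as a small helper that renders schema.xml `<field/>` lines. The field lists are hypothetical, and the type names `text_general` and `string` are assumed to be defined in your schema.

```python
from xml.sax.saxutils import quoteattr

def field_flags(name, searched, displayed):
    """Index a field if it is searched on; store it if it is displayed."""
    return {"indexed": name in searched, "stored": name in displayed}

def schema_field(name, field_type, searched, displayed):
    """Render one schema.xml <field/> line from the rule above."""
    flags = field_flags(name, searched, displayed)
    return "<field name={} type={} indexed=\"{}\" stored=\"{}\"/>".format(
        quoteattr(name),
        quoteattr(field_type),
        str(flags["indexed"]).lower(),
        str(flags["stored"]).lower(),
    )

# Hypothetical field lists for illustration only.
SEARCHED = {"type_name", "category_name"}
DISPLAYED = {"type_name", "category_id"}
FIELD_TYPES = {"type_name": "text_general", "category_name": "text_general", "category_id": "string"}

for field_name, field_type in FIELD_TYPES.items():
    print(schema_field(field_name, field_type, SEARCHED, DISPLAYED))
# <field name="type_name" type="text_general" indexed="true" stored="true"/>
# <field name="category_name" type="text_general" indexed="true" stored="false"/>
# <field name="category_id" type="string" indexed="false" stored="true"/>
```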

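Now, to your second question:

2) Import the complete data of each table separately and JOIN the Solr cores. Yes, this has been possible in Solr since Solr 4.0.

See the detailed examples at this link: https://wiki.apache.org/solr/Join

But also consider its limitations:
Fields or other properties of the documents being joined "from" are not available for use in processing the resulting set of "to" documents (i.e., you cannot return fields from the "from" documents as if they were a multivalued field on the "to" documents).

So weigh these points before you make a final decision.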

Suppose you have two cores:

core brands with fields {id, name}
core products with fields {id, name, brand_id}

data in core BRANDS: {1, Apple}, {2, Samsung}, {3, HTC}

data in core PRODUCTS: {1, iPhone, 1}, {2, iPad, 1}, {3, Galaxy S3, 2}, {4, Galaxy Note, 2}, {5, One X, 3}

You can build a query like this:

http://example.com:8999/solr/brands/select?q=*:*&fq={!join from=brand_id to=id fromIndex=products}name:iPad

and the result will be: {id: "1", name: "Apple"}
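As a rough illustration (not Solr's actual implementation), the join above can be emulated in plain Python over the sample data: run the nested query on the "from" core, collect the `brand_id` values, then keep the "to" documents whose `id` is among them. Exact string equality stands in for Solr's `name:iPad` query. The second part builds the same request URL with the local-params syntax percent-encoded, as an HTTP client would typically need; host and port are the ones from the example.

```python
from urllib.parse import urlencode

# The two example cores, as plain dicts (ids as strings, as Solr returns them).
BRANDS = [
    {"id": "1", "name": "Apple"},
    {"id": "2", "name": "Samsung"},
    {"id": "3", "name": "HTC"},
]
PRODUCTS = [
    {"id": "1", "name": "iPhone", "brand_id": "1"},
    {"id": "2", "name": "iPad", "brand_id": "1"},
    {"id": "3", "name": "Galaxy S3", "brand_id": "2"},
    {"id": "4", "name": "Galaxy Note", "brand_id": "2"},
    {"id": "5", "name": "One X", "brand_id": "3"},
]

def join(from_docs, from_field, to_docs, to_field, from_query):
    """Emulate {!join from=... to=... fromIndex=...} over in-memory lists."""
    # 1. Run the nested query against the "from" core (products).
    matched = [doc for doc in from_docs if from_query(doc)]
    # 2. Collect the values of the "from" field (brand_id).
    keys = {doc[from_field] for doc in matched}
    # 3. Keep "to" docs whose "to" field (id) is one of those values.
    #    Only the "to" documents' own fields come back, never product fields.
    return [doc for doc in to_docs if doc[to_field] in keys]

result = join(PRODUCTS, "brand_id", BRANDS, "id", lambda doc: doc["name"] == "iPad")
print(result)  # [{'id': '1', 'name': 'Apple'}]

# The same request, with braces, '!', '=' and spaces percent-encoded.
params = {"q": "*:*", "fq": "{!join from=brand_id to=id fromIndex=products}name:iPad"}
url = "http://example.com:8999/solr/brands/select?" + urlencode(params)
print(url)
```

Step 3 also shows the first limitation in practice: the result carries only brand fields, so nothing from the matching iPad document is available on the returned Apple document.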

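In a DistributedSearch environment, you cannot join across cores on multiple nodes. However, if you have a custom sharding approach, you can join across cores on the same node.
The Join query produces constant scores for all documents that match; scores computed by the nested query for the "from" documents are not available for scoring the "to" documents.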

With the points above in mind, I hope you can decide which approach to take.
