Solr delta-import erases index
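I'm having a problem with Solr delta imports from a MySQL database. Full imports work fine. When I run a delta import, it imports the changed records (as expected), but it wipes the rest of the index, so only the updated records remain in the index. There are no errors in the log. Am I missing something in my configuration? I'm running Solr 5.4 on an Ubuntu server and using the Admin UI.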
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/ibnet" user="xxxx" password="xxxxx" />
  <document>
    <entity name="profile" pk="profile.id" query="
        SELECT
          profile.id AS id,
          profile.profile_status AS profile_status,
          /* Other fields */
          linkedProfile.org_name AS linked_org_name,
          linkedProfile.org_city AS linked_org_city,
          linkedProfile.org_st_prov_reg AS linked_org_st_prov_reg,
          linkedProfile.org_country AS linked_org_country
        FROM profile AS profile
        LEFT JOIN profile AS linkedProfile ON linkedProfile.id = profile.linked_id"
      deltaImportQuery="
        SELECT
          profile.id AS id,
          profile.profile_status AS profile_status,
          /* Other fields */
          linkedProfile.org_name AS linked_org_name,
          linkedProfile.org_city AS linked_org_city,
          linkedProfile.org_st_prov_reg AS linked_org_st_prov_reg,
          linkedProfile.org_country AS linked_org_country
        FROM profile AS profile
        LEFT JOIN profile AS linkedProfile ON linkedProfile.id = profile.linked_id
        WHERE profile.id = '${dih.delta.id}'"
      deltaQuery="SELECT profile.id FROM profile WHERE last_modified > '${dih.last_index_time}'"
      onError="skip" >
    </entity>
  </document>
</dataConfig>
Edit: I changed dih.delta.id to dataimporter.delta.id, and did the same for last_index_time, but it made no difference.
Here is the response:
{
"responseHeader": {
"status": 0,
"QTime": 0
},
"initArgs": [
"defaults",
[
"config",
"data-config.xml"
]
],
"command": "status",
"status": "idle",
"importResponse": "",
"statusMessages": {
"Total Requests made to DataSource": "4",
"Total Rows Fetched": "6",
"Total Documents Processed": "3",
"Total Documents Skipped": "0",
"Delta Dump started": "2016-05-01 02:38:03",
"Identifying Delta": "2016-05-01 02:38:03",
"Deltas Obtained": "2016-05-01 02:38:03",
"Building documents": "2016-05-01 02:38:03",
"Total Changed Documents": "3",
"": "Indexing completed. Added/Updated: 3 documents. Deleted 0 documents.",
"Committed": "2016-05-01 02:38:03",
"Time taken": "0:0:0.317"
}
}
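In the Solr Admin UI, under your core -> Dataimport, there is a Clean option. If it is checked, the index is cleaned before the import runs, for delta imports as well as full imports. Uncheck it before running a delta import.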
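A toy model in Python may make the symptom easier to see. This is a sketch of the behaviour described above, not Solr's implementation: a plain dict stands in for the index, the function names and the core name `mycore` are hypothetical, and the base URL `http://localhost:8983/solr` (Solr's default port) is assumed. The URL builder sets `clean=false` explicitly, which is the request-level equivalent of unchecking the box when you trigger the import outside the Admin UI.

```python
from urllib.parse import urlencode


def delta_import(index, changed_rows, clean):
    """Toy model of a DIH delta-import (hypothetical helper, not Solr code).

    index: dict of document id -> document, standing in for the Solr core
    changed_rows: dict of id -> row, i.e. what deltaImportQuery returns for
                  the ids found by deltaQuery
    clean: mirrors the Admin UI "Clean" checkbox / the clean request parameter
    """
    if clean:
        index.clear()  # clean=true empties the index before importing
    for doc_id, row in changed_rows.items():
        index[doc_id] = row  # add or overwrite by unique key; other docs are untouched
    return index


def delta_import_url(base_url, core, clean=False):
    """Build a delta-import request with an explicit clean flag (core name is hypothetical)."""
    params = {"command": "delta-import", "clean": str(clean).lower(), "commit": "true"}
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"


full = {1: "a", 2: "b", 3: "c", 4: "d"}  # index after a full import
changed = {2: "b2", 3: "c2"}            # rows picked up by deltaQuery
print(sorted(delta_import(dict(full), changed, clean=True)))   # [2, 3]: only updated docs survive
print(sorted(delta_import(dict(full), changed, clean=False)))  # [1, 2, 3, 4]
print(delta_import_url("http://localhost:8983/solr", "mycore"))
```

With `clean=True` the model reproduces exactly what the question reports: only the rows returned by the delta survive. Passing the flag explicitly on the request avoids depending on whatever default the UI or handler applies.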
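Another tip: Solr DIH always uses UTC for the import timestamp. What time zone is your database in? Convert the datetime column in the database to UTC first, and then compare it with dih.last_index_time; otherwise the deltaQuery can miss rows that changed or pick up rows that did not.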
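The mismatch can be illustrated with a small Python sketch, assuming `last_modified` is stored as local time in a database whose UTC offset is fixed (the `-4` hours below is a hypothetical example, and a fixed offset ignores DST) and that `last_index_time` uses the `YYYY-MM-DD HH:MM:SS` format seen in the status response above. `changed_since` is a hypothetical helper, not part of DIH.

```python
from datetime import datetime, timedelta, timezone

# Assumed format of dih.last_index_time, matching the timestamps in the status response.
FMT = "%Y-%m-%d %H:%M:%S"


def changed_since(last_modified, last_index_time_utc, db_utc_offset_hours):
    """Return True if a row stored in local DB time was modified after DIH's UTC index time."""
    db_tz = timezone(timedelta(hours=db_utc_offset_hours))  # fixed offset, no DST handling
    modified_utc = (
        datetime.strptime(last_modified, FMT).replace(tzinfo=db_tz).astimezone(timezone.utc)
    )
    indexed_utc = datetime.strptime(last_index_time_utc, FMT).replace(tzinfo=timezone.utc)
    return modified_utc > indexed_utc


last_index_time = "2016-05-01 02:38:03"  # UTC, as recorded by DIH
last_modified = "2016-04-30 23:00:00"    # hypothetical row edited at 23:00 local time (UTC-4)
print(last_modified > last_index_time)                    # False: naive comparison misses the row
print(changed_since(last_modified, last_index_time, -4))  # True: 03:00 UTC is after 02:38:03 UTC
```

In the real setup the same conversion would be done in the deltaQuery itself, so that MySQL compares UTC values with `${dih.last_index_time}`.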