Elasticsearch: Remove Duplicate Records from Index Documents
Here is my JDBC river command for fetching all records from the database:
localhost:9200/_river/my_update_river/_meta
{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/admin",
"user" : "root",
"password" : "",
"poll" : "6s",
"index" : "updateauto",
"type" : "users",
"schedule":"0/10 * * ? * *",
"strategy" : "simple",
"sql" : "select * from users"
}
}
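When I run this command, I have two problems:
- Duplicate records
- When I add a new record to the database, the indexed documents are not updated
I search with this query: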
{
"query":{
"filtered":{
"filter":{
"term":{"Name":"testing"}
}
}
}
}
Here is my result:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 37551,
"max_score" : 1.0,
"hits" : [ {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjnNHmMKBTPrby96Jg",
"_score" : 1.0,
"_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjnNHnMKBTPrby96Jk",
"_score" : 1.0,
"_source":{"ID":25,"Name":"r rafay ","Email":"r rafay","Password":"r rafay"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjngk0MKBTPrby96Ka",
"_score" : 1.0,
"_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjngk0MKBTPrby96Kf",
"_score" : 1.0,
"_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjnjA0MKBTPrby96Kh",
"_score" : 1.0,
"_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjnjA0MKBTPrby96Km",
"_score" : 1.0,
"_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjnZP0MKBTPrby96KD",
"_score" : 1.0,
"_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjnPe-MKBTPrby96Jq",
"_score" : 1.0,
"_source":{"ID":25,"Name":"r rafay ","Email":"r rafay","Password":"r rafay"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjnR7NMKBTPrby96Ju",
"_score" : 1.0,
"_source":{"ID":26,"Name":"New User","Email":"New","Password":"new"}
}, {
"_index" : "updateauto",
"_type" : "users",
"_id" : "AUvjnbuLMKBTPrby96KO",
"_score" : 1.0,
"_source":{"ID":26,"Name":"New User","Email":"New","Password":"new"}
} ]
}
}
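Grouping the hits in this response by the database key `_source.ID` makes the duplication explicit: every row (IDs 23 to 26) appears two or three times under different auto-generated `_id` values. The sketch below is a hypothetical helper (`find_duplicates` is not part of any Elasticsearch client) that works on the ten hits shown above; a real cleanup would have to page through all 37551 hits (for example with the scroll API) rather than just the first page.

```python
from collections import defaultdict


def find_duplicates(hits):
    """Group hits by the database primary key stored in _source["ID"].

    Returns a dict mapping every ID that occurs more than once to the list of
    Elasticsearch _id values holding a copy of that row, in hit order.
    """
    by_row_id = defaultdict(list)
    for hit in hits:
        by_row_id[hit["_source"]["ID"]].append(hit["_id"])
    return {row_id: ids for row_id, ids in by_row_id.items() if len(ids) > 1}


# The ten hits from the response above, trimmed to the two fields used here.
hits = [
    {"_id": "AUvjnNHmMKBTPrby96Jg", "_source": {"ID": 23}},
    {"_id": "AUvjnNHnMKBTPrby96Jk", "_source": {"ID": 25}},
    {"_id": "AUvjngk0MKBTPrby96Ka", "_source": {"ID": 23}},
    {"_id": "AUvjngk0MKBTPrby96Kf", "_source": {"ID": 24}},
    {"_id": "AUvjnjA0MKBTPrby96Kh", "_source": {"ID": 23}},
    {"_id": "AUvjnjA0MKBTPrby96Km", "_source": {"ID": 24}},
    {"_id": "AUvjnZP0MKBTPrby96KD", "_source": {"ID": 24}},
    {"_id": "AUvjnPe-MKBTPrby96Jq", "_source": {"ID": 25}},
    {"_id": "AUvjnR7NMKBTPrby96Ju", "_source": {"ID": 26}},
    {"_id": "AUvjnbuLMKBTPrby96KO", "_source": {"ID": 26}},
]

for row_id, ids in sorted(find_duplicates(hits).items()):
    # Keep the first copy; the remaining _id values are candidates for deletion.
    print(row_id, "keep:", ids[0], "delete:", ids[1:])
```

Keeping the first copy is an arbitrary choice here; since all copies of a row carry the same `_source`, any one of them can be kept.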
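I want results without duplicate records, and I want the index to update automatically.
I don't quite understand your second question, but here is what you need to do about the duplicates:
You need to specify the document ID in the river definition, like this:
"sql" : "select *, ID as _id from users"
This way, the river indexes each user under its own ID, so every scheduled run overwrites the existing document for that user instead of adding a new copy with an auto-generated ID.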
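To see why the `_id` column matters, here is a simplified model in Python, not the JDBC river's actual code: a plain dict stands in for the index, `uuid4` stands in for Elasticsearch's auto-generated IDs, and `run_river` is a hypothetical name for one scheduled run of the SQL statement.

```python
import uuid

# Two rows returned by "select * from users" on every scheduled run
# (values taken from the search result in the question).
rows = [
    {"ID": 23, "Name": "Abudul Rafay"},
    {"ID": 24, "Name": "rafay"},
]


def run_river(index, rows, use_row_id):
    """Simulate one scheduled run that indexes every row of the result set.

    use_row_id=False: each row gets a fresh random ID (plain "select *"),
    so every run appends new copies.
    use_row_id=True: the row's ID becomes the document _id ("ID as _id"),
    so a rerun overwrites the existing document instead.
    """
    for row in rows:
        doc_id = str(row["ID"]) if use_row_id else uuid.uuid4().hex
        index[doc_id] = dict(row)  # writing an existing key replaces the document


auto_index, keyed_index = {}, {}
for _ in range(3):  # three scheduled runs
    run_river(auto_index, rows, use_row_id=False)
    run_river(keyed_index, rows, use_row_id=True)

print(len(auto_index))   # two rows x three runs
print(len(keyed_index))  # one document per row
```

Note that this change only stops new copies from being created: documents that earlier runs already indexed under auto-generated IDs stay in the index until they are deleted or the index is rebuilt.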