Elasticsearch: Remove Duplicate Records from Index Documents

This is my JDBC river definition, which fetches all records from the database.

localhost:9200/_river/my_update_river/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/admin",
    "user" : "root",
    "password" : "",
    "poll" : "6s",
    "index" : "updateauto",
    "type" : "users",
    "schedule" : "0/10 * * ? * *",
    "strategy" : "simple",
    "sql" : "select * from users"
  }
}

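When I run this command, I have two problems:

  1. Duplicate records
  2. When I add a new record to the database, the indexed documents are not updated when I search with:

    { "query":{ "filtered":{ "filter":{ "term":{"Name":"testing"} } } } }

Here is my result.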

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 37551,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjnNHmMKBTPrby96Jg",
      "_score" : 1.0,
      "_source":{"ID":23,"Name":"Abudul  Rafay","Email":"a","Password":"afasd"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjnNHnMKBTPrby96Jk",
      "_score" : 1.0,
      "_source":{"ID":25,"Name":"r rafay ","Email":"r rafay","Password":"r rafay"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjngk0MKBTPrby96Ka",
      "_score" : 1.0,
      "_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjngk0MKBTPrby96Kf",
      "_score" : 1.0,
      "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjnjA0MKBTPrby96Kh",
      "_score" : 1.0,
      "_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjnjA0MKBTPrby96Km",
      "_score" : 1.0,
      "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjnZP0MKBTPrby96KD",
      "_score" : 1.0,
      "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjnPe-MKBTPrby96Jq",
      "_score" : 1.0,
      "_source":{"ID":25,"Name":"r rafay ","Email":"r rafay","Password":"r rafay"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjnR7NMKBTPrby96Ju",
      "_score" : 1.0,
      "_source":{"ID":26,"Name":"New User","Email":"New","Password":"new"}
    }, {
      "_index" : "updateauto",
      "_type" : "users",
      "_id" : "AUvjnbuLMKBTPrby96KO",
      "_score" : 1.0,
      "_source":{"ID":26,"Name":"New User","Email":"New","Password":"new"}
    } ]
  }
}

I want results that contain no duplicate records and that update automatically.

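I don't quite understand your second question, but for the duplicates, here is what you need to do:

You need to specify the document ID in the river definition, like this: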

"sql" : "select *, ID as _id from users"

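This way, the river identifies each user by its ID and writes only one document per user. Without an explicit _id, Elasticsearch generates a new ID for every row on every scheduled run, which is why the same rows pile up as duplicates.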
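
To illustrate why an explicit _id removes the duplicates, here is a minimal Python sketch. It does not talk to Elasticsearch or the JDBC river: ToyIndex and run_river are hypothetical names invented for this illustration, and the dictionary only imitates the index behavior of overwriting a document when the same _id is written again.

```python
import itertools


class ToyIndex:
    """Hypothetical in-memory stand-in for an index: maps _id -> _source."""

    def __init__(self):
        self.docs = {}
        self._auto = itertools.count(1)

    def index(self, source, doc_id=None):
        # No _id supplied: a fresh one is generated, so a new document
        # is created every time, even for an identical row.
        if doc_id is None:
            doc_id = f"auto-{next(self._auto)}"
        # Same _id supplied again: the stored document is overwritten.
        self.docs[doc_id] = source
        return doc_id


def run_river(index, rows, id_column=None):
    """Imitates one scheduled run: index every row the SQL statement returns."""
    for row in rows:
        doc_id = None if id_column is None else str(row[id_column])
        index.index(row, doc_id)


# Two sample rows taken from the search result in the question.
rows = [
    {"ID": 23, "Name": "Abudul Rafay"},
    {"ID": 24, "Name": "rafay"},
]

without_id = ToyIndex()  # like "select * from users"
with_id = ToyIndex()     # like "select *, ID as _id from users"

for _ in range(3):  # three scheduled runs over the same table
    run_river(without_id, rows)
    run_river(with_id, rows, id_column="ID")

print(len(without_id.docs))  # 6: every run adds both rows again
print(len(with_id.docs))     # 2: one document per user
```

Note that documents already indexed under auto-generated IDs stay in the index after the river definition is changed; they would have to be removed separately, for example by deleting the index and letting the river repopulate it.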