SQL Server database > Logstash > Elasticsearch: map result set records related to the same entity to the same ES document

Question

I have Question and Answer entities represented in a SQL Server database as 2 tables, Questions and Answers (see below). The relationship between them is OneToMany.

Questions table:

Id      Title
-------------------
 1      Question 1
 2      Question 2

Answers table:

Id    Answer        Question_Id
-------------------------------
1     answer 1      1
2     answer 2      1
3     answer 3      1
4     answer 4      2
5     answer 5      2

After moving the data through the Logstash pipeline, I would like to get ES documents with the following structure:

{
  "questionId": 1,
  "questionTitle": "Question 1",
  "questionAnswers": [
    {
      "answerId": 1,
      "answer": "answer 1"
    },
    {
      "answerId": 2,
      "answer": "answer 2"
    },
    {
      "answerId": 3,
      "answer": "answer 3"
    }
  ]
}

{
  "questionId": 2,
  "questionTitle": "Question 2",
  "questionAnswers": [
    {
      "answerId": 4,
      "answer": "answer 4"
    },
    {
      "answerId": 5,
      "answer": "answer 5"
    }
  ]
}
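The desired transformation is essentially a group-by on the question id. The Ruby sketch below shows that grouping outside Logstash (Ruby because Logstash's aggregate and ruby filters run Ruby code); `build_question_documents` is a hypothetical helper, and the rows are assumed to be hashes keyed by the view's column names.

```ruby
require "json"

# Hypothetical helper: groups flat (question, answer) rows into one nested
# document per question, matching the desired ES structure above.
def build_question_documents(rows)
  rows.group_by { |row| row["questionId"] }.map do |question_id, group|
    {
      "questionId" => question_id,
      "questionTitle" => group.first["questionTitle"],
      "questionAnswers" => group.map { |row| { "answerId" => row["answerId"], "answer" => row["answer"] } }
    }
  end
end

# Sample rows built from the Questions and Answers tables above.
rows = [
  { "questionId" => 1, "questionTitle" => "Question 1", "answerId" => 1, "answer" => "answer 1" },
  { "questionId" => 1, "questionTitle" => "Question 1", "answerId" => 2, "answer" => "answer 2" },
  { "questionId" => 1, "questionTitle" => "Question 1", "answerId" => 3, "answer" => "answer 3" },
  { "questionId" => 2, "questionTitle" => "Question 2", "answerId" => 4, "answer" => "answer 4" },
  { "questionId" => 2, "questionTitle" => "Question 2", "answerId" => 5, "answer" => "answer 5" }
]

docs = build_question_documents(rows)
puts JSON.pretty_generate(docs)
```

Ruby's `group_by` keeps groups in order of first appearance, so the documents come out in question order.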

The Logstash jdbc input plugin is set up to retrieve the data from the Question_Answers view.

input {
  jdbc {
    type => "Test_1"
    jdbc_connection_string => "jdbc:sqlserver://myinstance:1433"
    jdbc_user => "root"
    jdbc_password => "root"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_driver_library => "/home/abury/enu/mssql-jdbc-6.2.2.jre8.jar"
    schedule => "*/3 * * * *"
    statement => "SELECT * from Question_Answers"
  }
}
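One detail of the jdbc input affects every field reference downstream: its `lowercase_column_names` option defaults to `true`, so the view's columns reach the pipeline as `questionid`, `questiontitle`, `answerid` and `answer`. Field references are case-sensitive, so they must use those lowercased names unless the option is set to `false`. The Ruby sketch below only models that renaming; `to_event_fields` is a hypothetical name, not a plugin API.

```ruby
# Hypothetical model of the jdbc input's lowercase_column_names option
# (default true): every column name is downcased before it becomes an event field.
def to_event_fields(row, lowercase_column_names: true)
  return row.dup unless lowercase_column_names

  row.transform_keys(&:downcase)
end

row = { "questionId" => 1, "questionTitle" => "Question 1", "answerId" => 1, "answer" => "answer 1" }

fields = to_event_fields(row)
puts fields.keys.inspect # prints ["questionid", "questiontitle", "answerid", "answer"]
```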

The result set returned by the view looks like this:

questionId  questionTitle   answerId    answer
----------------------------------------------
1           Question 1      1           answer 1
1           Question 1      2           answer 2
1           Question 1      3           answer 3
2           Question 2      4           answer 4
2           Question 2      5           answer 5

The Elasticsearch output plugin is set up as follows:

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "question"
    document_id => "%{questionId}"
  }
}
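With this output and no aggregation, every row is indexed as its own event, and rows of the same question share a `document_id`. Because the elasticsearch output's default `action` is `index`, each later row replaces the earlier document, so the question 1 document ends up holding only `answer 3`. The Ruby sketch below models that last-write-wins behaviour with a plain Hash standing in for the index; for simplicity it treats `questionId` as present under that exact name and ignores the jdbc input's column-name lowercasing.

```ruby
# Simplified model: a Hash stands in for the "question" index, keyed by _id.
# Indexing a document whose _id already exists replaces it (last write wins).
rows = [
  { "questionId" => 1, "questionTitle" => "Question 1", "answerId" => 1, "answer" => "answer 1" },
  { "questionId" => 1, "questionTitle" => "Question 1", "answerId" => 2, "answer" => "answer 2" },
  { "questionId" => 1, "questionTitle" => "Question 1", "answerId" => 3, "answer" => "answer 3" },
  { "questionId" => 2, "questionTitle" => "Question 2", "answerId" => 4, "answer" => "answer 4" },
  { "questionId" => 2, "questionTitle" => "Question 2", "answerId" => 5, "answer" => "answer 5" }
]

index = {}
rows.each do |row|
  document_id = row["questionId"].to_s # document_id => "%{questionId}"
  index[document_id] = row              # replaces any previous document with this _id
end

puts index.size           # prints 2
puts index["1"]["answer"] # prints answer 3
```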

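Question: How can I set up Logstash to recognize the records related to the same question and build ES documents with the desired structure shown above? Is it possible to add some aggregation logic to the output.conf file to achieve this? Or do I need to rewrite my database view to return a single record per question: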

questionId  questionTitle   answerId    answer
---------------------------------------------------------------------
1           Question 1      1, 2, 3     answer 1, answer 2, answer 3

UPDATE: fixed a typo in the column names

Answer 1

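SELECT
    questionId,
    questionTitle,
    -- GROUP_CONCAT is MySQL syntax; SQL Server 2017+ provides STRING_AGG instead.
    -- WITHIN GROUP keeps both lists in the same answerId order.
    STRING_AGG(answerId, ',') WITHIN GROUP (ORDER BY answerId) AS answerIDs,
    STRING_AGG(answer, ',') WITHIN GROUP (ORDER BY answerId) AS answers
FROM Question_Answers
GROUP BY questionId, questionTitle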

By the way, you have a typo in your column name answereId; I guess you meant answerId.
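A query like this still delivers the answers to Elasticsearch as plain strings such as `1,2,3`, not as the nested `questionAnswers` array. Turning one such row into the desired structure means splitting both lists and zipping them together, which could for example be done in a Logstash `ruby` filter. The sketch below shows only that logic in plain Ruby; `nest_concatenated_row` is a hypothetical helper, the keys are the lowercased names the jdbc input produces by default, and it assumes the separator `,` never occurs inside an answer's text (otherwise a different separator would be needed).

```ruby
# Hypothetical helper: rebuilds the nested document from one row of the
# concatenating query. Assumes the separator never occurs inside an answer.
def nest_concatenated_row(row, separator: ",")
  ids = row["answerids"].split(separator).map { |id| Integer(id.strip) }
  texts = row["answers"].split(separator).map(&:strip)
  raise ArgumentError, "answer id and answer counts differ" unless ids.size == texts.size

  {
    "questionId" => row["questionid"],
    "questionTitle" => row["questiontitle"],
    "questionAnswers" => ids.zip(texts).map { |id, text| { "answerId" => id, "answer" => text } }
  }
end

row = { "questionid" => 1, "questiontitle" => "Question 1",
        "answerids" => "1,2,3", "answers" => "answer 1,answer 2,answer 3" }

doc = nest_concatenated_row(row)
puts doc["questionAnswers"].inspect
```

The length check is a cheap guard: if an answer did contain the separator, the two lists would no longer line up.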

Answer 2

I was able to get the desired Elasticsearch document structure by using the Logstash aggregate filter plugin (see Example 4):

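filter {
  aggregate {
    # The jdbc input lowercases column names by default (lowercase_column_names => true),
    # so the task_id must reference the lowercased field questionid.
    task_id => "%{questionid}"
    code => "
      map['questionId'] ||= event.get('questionid')
      map['questionTitle'] ||= event.get('questiontitle')

      map['questionAnswers'] ||= []
      map['questionAnswers'] << {'answerId' => event.get('answerid'), 'answer' => event.get('answer')}

      event.cancel()
    "
    # Push the previous question's map as a new event when a new questionid arrives.
    # This only works if rows come ordered by questionid (e.g. ORDER BY in the jdbc
    # statement) and the pipeline runs with a single worker (-w 1).
    push_previous_map_as_event => true
    # When the timeout is reached, the last question's map is pushed as an event too.
    timeout => 3
  }
}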
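This filter relies on the rows of one question arriving one after another. The Ruby sketch below is a simplified model of that behaviour, not the plugin's implementation: `aggregate_events` is a hypothetical name, events are plain hashes with the lowercased field names, and the timeout is replaced by a final flush. It also shows why the rows should be sorted: interleaved questions produce split maps, and since both maps share a `document_id`, the second would overwrite the first in Elasticsearch.

```ruby
# Simplified model of the aggregate filter with push_previous_map_as_event => true.
# Only one map is open at a time: when an event with a new task_id arrives,
# the previous map is emitted. The final flush stands in for the timeout.
def aggregate_events(events)
  emitted = []
  current_task_id = nil
  map = nil

  events.each do |event|
    task_id = event["questionid"] # task_id => "%{questionid}"
    if map.nil? || task_id != current_task_id
      emitted << map unless map.nil? # push previous map as event
      current_task_id = task_id
      map = {}
    end

    map["questionId"] ||= event["questionid"]
    map["questionTitle"] ||= event["questiontitle"]
    map["questionAnswers"] ||= []
    map["questionAnswers"] << { "answerId" => event["answerid"], "answer" => event["answer"] }
    # event.cancel(): the original row event is dropped; only maps are emitted
  end

  emitted << map unless map.nil? # timeout flush of the last map
  emitted
end

events = [
  { "questionid" => 1, "questiontitle" => "Question 1", "answerid" => 1, "answer" => "answer 1" },
  { "questionid" => 1, "questiontitle" => "Question 1", "answerid" => 2, "answer" => "answer 2" },
  { "questionid" => 1, "questiontitle" => "Question 1", "answerid" => 3, "answer" => "answer 3" },
  { "questionid" => 2, "questiontitle" => "Question 2", "answerid" => 4, "answer" => "answer 4" },
  { "questionid" => 2, "questiontitle" => "Question 2", "answerid" => 5, "answer" => "answer 5" }
]

docs = aggregate_events(events)
puts docs.size # prints 2

# Unordered input splits question 1 across two maps.
unordered = [events[0], events[3], events[1]]
puts aggregate_events(unordered).size # prints 3
```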


Tags: elasticsearch, logstash, logstash-configuration, logstash-jdbc