Elasticsearch, Logstash: document_id string does not get evaluated
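To prevent duplicate data when ingesting from Logstash, I added a document_id string to my Logstash conf, based on a peopleRowId column. However, it does not get evaluated. In my case I am trying to set the document id with document_id => "%{[document][projectsRowId]}", but for some reason this is not evaluated, and the id in Elasticsearch ends up as the literal string shown in the result below. I added ROW_NUMBER() OVER (ORDER BY a.created_at) as projectsRowId to the query to create a unique id.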
[
  {
    "_index" : "projectsv3",
    "_type" : "_doc",
    "_id" : "%{[document][projectsRowId]}",
    "_score" : 1.0,
    "_source" : {...single record}
  }
]
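I am not sure why the document id is not being evaluated. I am using Elasticsearch 7, and ECS is disabled. I have also tried other approaches, such as a fingerprint filter, and I also tried setting the document id with document_id => "%{projectsRowId}", but in all cases it is not evaluated.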
input {
  jdbc {
    jdbc_driver_library => "C:\ElasticStack\mysql-connector-java-8.0.24\mysql-connector-java-8.0.24.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # mysql jdbc connection string to our database, mydb
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/corrabla_sercweb"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "root"
    schedule => "* * * * *"
    clean_run => true
    # use_column_value => true
    # tracking_column => "%{[@metadata][fingerprint]}"
    # tracking_column_type => "numeric"
    # our query to fetch people details
    statement => "select ROW_NUMBER() OVER (
    ORDER BY a.created_at
    ) as projectsRowId , (a.created_at), tr.report_number as 'tech_report_number', tr.file_s3 as 'tech_report_file_name', tr.abstract as 'tech_report_abstract' , c.prefix as 'piPrefix' , c.first_name as 'piFirstName', c.middle_name as 'piMiddleName' ,c.last_name as 'piLastName', b.person_id, d.prefix as 'coPiPrefix' "
    # use_column_value => true
    # tracking_column => id
    # tracking_column_type => "numeric"
  }
}
output {
  elasticsearch {
    action => "create"
    hosts => "http://127.0.0.1:9200"
    index => "projectsv3"
    doc_as_upsert => true
    document_id => "%{[document][projectsRowId]}"
  }
}
By default, the jdbc input folds field names to lowercase, so your event will have a field called projectsrowid, not projectsRowId. If you set lowercase_column_names => false on the input, then document_id => "%{[projectsRowId]}" will work.
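As a rough sketch of that change, the fragment below shows only the two settings involved. Logstash leaves a %{...} reference as literal text when the named field does not exist on the event, which matches the _id seen in the question. The elided jdbc settings are assumed to stay as posted, and dropping the [document] prefix assumes the query columns land as top-level fields on the event.

```
input {
  jdbc {
    # ... driver, connection, schedule and statement settings as in the question ...
    # Keep the column-name casing returned by the query instead of lowercasing it
    lowercase_column_names => false
  }
}
output {
  elasticsearch {
    hosts => "http://127.0.0.1:9200"
    index => "projectsv3"
    # Assumes projectsRowId is a top-level event field, so no [document] prefix
    document_id => "%{[projectsRowId]}"
  }
}
```

If you would rather leave lowercase_column_names at its default, referencing the lowercased name with document_id => "%{[projectsrowid]}" should resolve to the same value.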