telegraf - exec plugin - aws ec2 ebs volume info - metric parsing error, reason: [missing fields] or Errors encountered: [ invalid number]

Machine: CentOS 7.2 / Ubuntu 14.04/16.xx

Telegraf version: 1.0.1

Python version: 2.7.5

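Telegraf supports an INPUT plugin called exec. First, see EXAMPLE 2 in its README. I can't use the JSON format, because with JSON only numeric values are used as metrics. According to the documentation: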

If using JSON, only numeric values are parsed and turned into floats. Booleans and strings will be ignored.

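So the idea is simple: in the exec plugin section you specify a script that prints some meaningful information, in JSON or in the influx data format. In my case it is influx, because some of my metrics have non-numeric values. You then capture and show that information on a nice dashboard, such as a Wavefront dashboard.

Basically, you can use these metrics, their tags, and their sources to find all kinds of information about memory, CPU, disk, network, and more. You can also create alerts from this information when something unwanted happens.

OK, I came up with this Python script: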

#!/usr/bin/python

# sudo pip install boto3 if you don't have it on your machine.
import boto3


def generate(key, value):
    """
    Creates a nicely formatted Key(Value) item for output
    """
    return '{}="{}"'.format(key, value)
    #return '{}={}'.format(key, value)


def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()

    for vol in volumes:
        # You don't need to wrap everything in `str` unless it is not a string
        # By default most things will come back as a string 
        # unless they are very obviously not (complex, date time, etc)
        # but since we are printing these (and formatting them into strings)
        # the cast to string will be implicit and we don't need to make it 
        # explicit


        # vol is already a fully returned volume you are essentially DOUBLING
        # your API calls when you do this
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('create_time', vol.create_time),
            generate('availability_zone', vol.availability_zone),
            generate('volume_id', vol.volume_id),
            generate('volume_type', vol.volume_type),
            generate('state', vol.state),
            generate('size', vol.size),
            generate('iops', vol.iops),
            generate('encrypted', vol.encrypted),
            generate('snapshot_id', vol.snapshot_id),
            generate('kms_key_id', vol.kms_key_id),
        ]

        for _ in vol.attachments:
            # Will get any attachments and since it is a list
            # we should write this to handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', _.get('InstanceId')),
                generate('InstanceVolumeState', _.get('State')),
                generate('DeleteOnTermination', _.get('DeleteOnTermination')),
                generate('Device', _.get('Device')),
            ])

        # only process when there are tags to process        
        if vol.tags:
            for _ in vol.tags:
                # Get all of the tags
                output_parts.extend([
                    generate(_.get('Key'), _.get('Value')),
                ])

        # output everything at once.. 
        print(','.join(output_parts))


if __name__ == '__main__':
    main()

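This script talks to AWS and prints every value it can find for each EC2 EBS volume (usually what you see in the AWS EC2 EBS volumes console). It formats that information into a CSV-like key=value format, which I redirect to a .csv log file. We don't want to run the Python script all the time (AWS API rate limits / cost).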
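To see what this first version writes without calling AWS, here is a minimal offline sketch. FakeVolume is a hypothetical stand-in for a boto3 Volume with only a few attributes, filled with values from the Format 1 sample line. The generate function is the one from the script above. Each line comes out as bare key="value" pairs joined by commas, with no measurement name in front and no separating space.

```python
# Offline sketch: FakeVolume is a hypothetical stand-in for a boto3 Volume,
# so the output format can be inspected without any AWS API calls.


def generate(key, value):
    """Same formatting as the first script: key="value"."""
    return '{}="{}"'.format(key, value)


class FakeVolume(object):
    # Only the attributes used below, copied from the Format 1 sample line.
    volume_id = 'vol-058e1d47dgh721121'
    volume_type = 'gp2'
    size = 8
    iops = 100


def first_version_line(vol):
    """Build one output line the way the first script does (subset of fields)."""
    output_parts = [
        generate('volume_id', vol.volume_id),
        generate('volume_type', vol.volume_type),
        generate('size', vol.size),
        generate('iops', vol.iops),
    ]
    return ','.join(output_parts)


if __name__ == '__main__':
    # No measurement name and no space anywhere: just comma-joined pairs.
    print(first_version_line(FakeVolume()))
```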

Once the .csv file is created, I wrote this small shell script, which I set in the [[inputs.exec]] section of Telegraf's exec plugin.

The shell script /tmp/aws-vol-info.sh configured in the Telegraf exec plugin is:

#!/bin/bash

cat /tmp/aws-vol-info.csv

The Telegraf config file created for the exec plugin (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf):

#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec

[[inputs.exec]]
  commands = ["/tmp/aws-vol-info.sh"]

  ## Timeout for each command to complete.
  timeout = "5s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  name_suffix = "_telegraf_execplugin"

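I tweaked the .py script (its generate function) to produce the following three output formats (.csv file). I wanted to test how Telegraf would handle this data before enabling the config file (/etc/telegraf/telegraf.d/catch-aws-ebs-info.conf) and restarting the telegraf service.

Format 1: (each value wrapped in " double quotes)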

create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"

Testing the Telegraf config directory gives the following error.

Command: $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:37:48 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:37:48Z E! Errors encountered: [ metric parsing error, reason: [invalid field format], buffer: [create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"], index: [372]]
[vagrant@myvagrant ~] $

Format 2: (no " double quotes at all)

create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app

Testing the exec plugin config gives the same kind of error:

2017/03/10 00:45:01 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:45:01Z E! Errors encountered: [ metric parsing error, reason: [invalid value], buffer: [create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [63]]

Format 3: (no " double quotes and no space characters in the values; every space is replaced with an _ character)

create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app

Still doesn't work; I get the same kind of error:

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:50:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:50:30Z E! Errors encountered: [ metric parsing error, reason: [missing fields], buffer: [create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [476]]
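This rough check shows why Format 3 is rejected. Line protocol separates measurement[,tags], the field set, and an optional timestamp with spaces that are neither backslash-escaped nor inside a quoted string. The Format 3 line contains no space at all, so there is no field set. The sketch below is a hypothetical lint helper (check_line and split_sections are names made up here), not Telegraf's parser. It only checks the section layout, not keys, values, or types.

```python
def split_sections(line):
    """Split on spaces that are neither backslash-escaped nor inside "..."."""
    sections, current = [], []
    in_quotes = escaped = False
    for ch in line:
        if escaped:
            # The character after a backslash is taken literally.
            current.append(ch)
            escaped = False
        elif ch == '\\':
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == ' ' and not in_quotes:
            sections.append(''.join(current))
            current = []
        else:
            current.append(ch)
    sections.append(''.join(current))
    return sections


def check_line(line):
    """Very rough layout check: measurement[,tags] fields [timestamp]."""
    sections = split_sections(line)
    if len(sections) < 2 or not sections[1]:
        return 'missing fields'
    if len(sections) > 3:
        return 'too many space-separated sections'
    if len(sections) == 3 and not sections[2].lstrip('-').isdigit():
        return 'timestamp must be an unquoted integer'
    return 'ok'


if __name__ == '__main__':
    format3 = ('create_time=2017-01-09_23:24:29.428000+00:00,'
               'volume_id=vol-058e1d47dgh721121,size=8,iops=100')
    print(check_line(format3))  # → missing fields
    print(check_line('weather,location=us-midwest temperature=82 1465839830100400200'))  # → ok
```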

Format 4: following the Influx line protocol as described on this page: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/

awsebs,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1

I get this error:

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 02:34:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T02:34:30Z E! Errors encountered: [ invalid number]

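How do I get rid of this error and get Telegraf working with the exec plugin (running the .sh script)?


Additional information

The Python script will run once or twice a day (via cron), and Telegraf will run every minute. Each run executes the exec plugin, which runs the .sh script, which reads the .csv file so that Telegraf can consume it in the influx data format.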

https://galaxy.ansible.com/wavefrontHQ/wavefront-ansible/

https://github.com/influxdata/telegraf/issues/2525

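It looks like the rules are strict; I should have read them more carefully.

The output of whatever program you use must match the INFLUX LINE PROTOCOL format shown below and follow all the RULES that come with it.

For example: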

weather,location=us-midwest temperature=82 1465839830100400200
  |    -------------------- --------------  |
  |             |             |             |
  |             |             |             |
+-----------+--------+-+---------+-+---------+
|measurement|,tag_set| |field_set| |timestamp|
+-----------+--------+-+---------+-+---------+
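The diagram maps directly onto string assembly. This is a minimal sketch with some assumptions: tags and fields are passed as ordered lists of (key, value) pairs, and their values are already formatted (numbers bare, strings double-quoted) and need no escaping. build_line is a hypothetical helper name.

```python
def build_line(measurement, tags, fields, timestamp=None):
    """Assemble measurement[,tag_set] field_set[ timestamp].

    tags and fields are lists of (key, value) pairs; values must already be
    formatted for line protocol. No escaping is done here.
    """
    if not fields:
        raise ValueError('line protocol needs at least one field')
    # Comma, no space, between the measurement and the tag set.
    head = ','.join([measurement] + ['{}={}'.format(k, v) for k, v in tags])
    # A single space between the tag set and the field set.
    line = head + ' ' + ','.join('{}={}'.format(k, v) for k, v in fields)
    if timestamp is not None:
        # Optional timestamp: a bare, unquoted integer after another space.
        line += ' {}'.format(timestamp)
    return line


if __name__ == '__main__':
    print(build_line('weather', [('location', 'us-midwest')],
                     [('temperature', 82)], 1465839830100400200))
```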

You can read more about what the measurement, tags, fields, and the optional timestamp are here: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/

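The important rules are:

1) There must be a , and no space between the measurement and the tag set.

2) There must be a space between the tag set and the field set.

3) Always use the backslash character \ to escape special characters in measurement names, tag keys, tag values, and field keys. Escape commas and spaces in measurement names; escape commas, equals signs, and spaces in tag keys, tag values, and field keys!

4) You cannot escape a \ with a \.

5) Line Protocol handles emojis with no problem :)

6) TAGS / the TAG set (tags, comma-separated) are OPTIONAL.

7) FIELDS / the FIELD set (fields, comma-separated): at least one field is required per line.

8) The TIMESTAMP (the last value shown in the format) is OPTIONAL.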



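9) The very important QUOTING rules are:

a) NEVER double- or single-quote the timestamp. That is not valid line protocol. Even if the number itself is valid, "123123131312313" or '1231313213131' will not work.

b) NEVER single-quote field values (even if they are strings!). That is not valid line protocol either, i.e. fieldname='giga' will not work.

c) Do NOT double- or single-quote measurement names, tag keys, tag values, or field keys. NOTE: this really does say TAG VALUES!!! So be careful.

d) Do NOT double-quote field values that are floats, integers, or booleans; otherwise InfluxDB will assume those values are strings.

e) DO double-quote field values that are strings.

f) And the MOST IMPORTANT one (this will save you from going bald): suppose in one line you set a FIELD value without double quotes, i.e. you treat it as an integer or float (for example fields anyone would call numeric, such as size or iops). Suppose that somewhere else in the file Telegraf reads/parses with the exec plugin, you set a non-integer value (i.e. a string) for the same field. Then you will get the error message: Errors encountered: [ invalid number.

So to fix it, the RULE is: if any possible FIELD value of a FIELD key can be a string, you MUST wrap it in " in every line. It doesn't matter that the field holds a value like 1, 200, or 1.5 in some lines (e.g. iops can be 15) and a string in others (e.g. iops can be None).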
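Rule f can be enforced mechanically: look at every row first, then quote a field key in all rows if it is non-numeric in any of them. This is a minimal sketch, assuming each row is a plain dict of field key to Python value (for example size and iops as collected from boto3). string_field_keys and render_fields are hypothetical helpers, and escaping is left out.

```python
def is_number(value):
    # bool is excluded: True/False are not numeric field values.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def string_field_keys(rows):
    """Field keys that hold a non-numeric value in at least one row."""
    keys = set()
    for row in rows:
        for key, value in row.items():
            if not is_number(value):
                keys.add(key)
    return keys


def render_fields(row, quoted_keys):
    """Render one row's field set, quoting every key in quoted_keys."""
    parts = []
    for key in sorted(row):
        template = '{}="{}"' if key in quoted_keys else '{}={}'
        parts.append(template.format(key, row[key]))
    return ','.join(parts)


if __name__ == '__main__':
    rows = [{'size': 8, 'iops': 100}, {'size': 500, 'iops': None}]
    quoted = string_field_keys(rows)  # iops is None in one row, so it is quoted everywhere
    for row in rows:
        print(render_fields(row, quoted))
```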

Error message: Errors encountered: [ invalid number

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 11:13:18 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T11:13:18Z E! Errors encountered: [ invalid number metric parsing error, reason: [invalid field format], buffer: [awsebsvol,host=myvagrant ], index: [25]]

So after all this learning, it's clear that I had missed the Influx line protocol format in the first place, and its RULES!!
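Rule 3 is where values such as the Name tag bite: "[company-2b-app90] secondary" contains a space, which is only legal in a tag value when escaped. Below is a minimal sketch of the escaping rules as the InfluxDB 1.x line protocol tutorial lists them. Measurement names escape commas and spaces. Tag keys, tag values, and field keys escape commas, equals signs, and spaces. String field values escape double quotes. The function names are hypothetical.

```python
def escape_measurement(name):
    # Measurement names: escape commas and spaces.
    return name.replace(',', '\\,').replace(' ', '\\ ')


def escape_tag_or_key(text):
    # Tag keys, tag values and field keys: escape commas, equals signs, spaces.
    return text.replace(',', '\\,').replace('=', '\\=').replace(' ', '\\ ')


def quote_string_field(value):
    # String field values: wrap in double quotes, escape embedded double quotes.
    return '"{}"'.format(value.replace('"', '\\"'))


if __name__ == '__main__':
    name = '[company-2b-app90] secondary'
    # Name used as a tag (escaped, unquoted); state used as a string field (quoted).
    print('awsebsvol,Name={} state={}'.format(
        escape_tag_or_key(name), quote_string_field('in-use')))
```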

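Now the output I want the Python script to generate should look like this (per the INFLUX LINE PROTOCOL). Alternatively, you can change only the .sh file and use sed "s/^/awsec2ebs,/" or sed "s/^/awsec2ebs,sourcehost=$(hostname) /". Note the space before the closing / in the second sed expression. You can then put " around whichever key=value pairs need it. I did change the .py file to not use " for the size and iops fields.

Anyway, if the output looks like this: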

awsec2ebs,volume_id=vol-058e1d47dgh721121 create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"

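In the final working solution above, I created a measurement named awsec2ebs and put a , between that measurement and the tag key volume_id. I used no ' or " quotes around the tag value. Then I put a space character between the tag set and the field set, following the rules. I only want one tag for now; otherwise you can add more tags, comma-separated.

Finally, run the command:

$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec

It works like a charm!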

2017/03/10 03:33:54 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
> awsec2ebs_telegraf_execplugin,volume_id=vol-058e1d47dgh721121,host=myvagrant volume_type="gp2",iops="100",kms_key_id="None",role="app",size="8",encrypted="False",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",Name="[company-2b-app90] secondary",snapshot_id="snap-06h1h1b91bh662avn",DeleteOnTermination="True",mirror="secondary",cluster="company",autoscale="true",high_availability="1",create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",state="in-use",Device="/dev/sda1",hostname="company-2b-app90-i-0jjb1boop26f42f50" 1489116835000000000
[vagrant@myvagrant ~] $ echo $?
0

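In the example above, size is the only field that is always a number, so it doesn't need to be wrapped in ", but that's up to you. Remember the MOST IMPORTANT rule above and the error it produces.

So the final Python file is: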
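Rules d and e can be written as a small per-value formatter. This is a minimal sketch, and format_field_value is a hypothetical name. It also adds the trailing i that line protocol uses to mark integers, which the rules above do not mention: a bare 8 is stored as a float. Per-value typing alone cannot satisfy the MOST IMPORTANT rule. A field like iops that is sometimes None still has to be quoted in every line.

```python
def format_field_value(value):
    """Render one Python value as a line-protocol field value."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return 'true' if value else 'false'
    if isinstance(value, int):
        return '{}i'.format(value)   # trailing i: store as an integer
    if isinstance(value, float):
        return repr(value)           # bare float
    # Anything else (str, None, datetime, ...) becomes a quoted string.
    return '"{}"'.format(str(value).replace('"', '\\"'))


if __name__ == '__main__':
    for v in (8, 1.5, True, 'gp2', None):
        print(format_field_value(v))
```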

#!/usr/bin/python

#Do `sudo pip install boto3` first
import boto3

def generate(key, value, qs, qe):
    """
    Creates a nicely formatted Key(Value) item for output
    """
    return '{}={}{}{}'.format(key, qs, value, qe)

def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()

    for vol in volumes:
        # You don't need to wrap everything in `str` unless it is not a string
        # By default most things will come back as a string
        # unless they are very obviously not (complex, date time, etc)
        # but since we are printing these (and formatting them into strings)
        # the cast to string will be implicit and we don't need to make it
        # explicit

        # vol is already a fully returned volume you are essentially DOUBLING
        # your API calls when you do this
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('volume_id', vol.volume_id, '"', '"'),
            generate('create_time', vol.create_time, '"', '"'),
            generate('availability_zone', vol.availability_zone, '"', '"'),
            generate('volume_type', vol.volume_type, '"', '"'),
            generate('state', vol.state, '"', '"'),
            generate('size', vol.size, '', ''),
            #The following vol.iops variable can be a number or None so you must wrap it with double quotes otherwise "invalid number" error will come.
            generate('iops', vol.iops, '"', '"'),
            generate('encrypted', vol.encrypted, '"', '"'),
            generate('snapshot_id', vol.snapshot_id, '"', '"'),
            generate('kms_key_id', vol.kms_key_id, '"', '"'),
        ]

        for _ in vol.attachments:
            # Will get any attachments and since it is a list
            # we should write this to handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', _.get('InstanceId'), '"', '"'),
                generate('InstanceVolumeState', _.get('State'), '"', '"'),
                generate('DeleteOnTermination', _.get('DeleteOnTermination'), '"', '"'),
                generate('Device', _.get('Device'), '"', '"'),
            ])

        # only process when there are tags to process
        if vol.tags:
            for _ in vol.tags:
                # Get all of the tags
                output_parts.extend([
                    generate(_.get('Key'), _.get('Value'), '"', '"'),
                ])

        # output everything at once..
        print(','.join(output_parts))

if __name__ == '__main__':
    main()
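Here is a quick offline check of the final generate function, using the same quote arguments the script passes for size and iops. The values are made up for illustration. Because the quote characters are fixed per field rather than chosen per value, iops is a string field in every line, whether boto3 returns a number or None.

```python
def generate(key, value, qs, qe):
    """Same as the final script: key, then value wrapped in qs/qe."""
    return '{}={}{}{}'.format(key, qs, value, qe)


if __name__ == '__main__':
    # size is always an integer, so it is left unquoted (numeric field).
    print(generate('size', 8, '', ''))
    # iops may be a number or None, so it is always quoted (string field).
    print(generate('iops', 100, '"', '"'))
    print(generate('iops', None, '"', '"'))
```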

The final aws-vol-info.sh is:

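#!/bin/bash

# Telegraf may not run this from the script's own directory, so read the CSV that sits next to this script.
HOST_TAG=$(hostname | head -1 | sed 's/[ \t][ \t]*/_/g')
sed "s/^/awsebsvol,host=${HOST_TAG} /" "$(dirname "$0")/aws-vol-info.csv"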
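If you would rather not depend on sed, the same prefixing can be done in Python. This is a minimal sketch, assuming the same awsebsvol measurement and host tag as the shell script; add_prefix is a hypothetical helper. Like the sed version, it only squeezes runs of spaces/tabs in the hostname to _ and does not escape commas or equals signs.

```python
import re
import socket


def add_prefix(line, measurement, host):
    """Prepend 'measurement,host=<host> ' to one field-set line."""
    host_tag = re.sub(r'[ \t]+', '_', host)  # runs of spaces/tabs -> _, as in the sed call
    return '{},host={} {}'.format(measurement, host_tag, line)


if __name__ == '__main__':
    sample = 'volume_id="vol-058e1d47dgh721121",size=8,iops="100"'
    print(add_prefix(sample, 'awsebsvol', socket.gethostname()))
```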

The final Telegraf exec plugin config file (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf; any name ending in .conf works) is:

#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec

[[inputs.exec]]
  commands = ["/some/valid/path/where/csvfileexists/aws-vol-info.sh"]

  ## Timeout for each command to complete.
  timeout = "5s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  name_suffix = "_telegraf_exec"

Run it: everything works now!

$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec