AWS CloudWatch alarms in Datadog
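While going through the Datadog AWS integration documentation, I found a mention that AWS alarms can be streamed into Datadog. The Alarm collection section says you can choose between two different methods of sending AWS CloudWatch alarms to the Datadog Event Stream.
However, there is no further explanation of how to do this or what has to be set up for it. Googling things like "Datadog aws alarm polling" returns vague descriptions of other features, but nothing about AWS CloudWatch alarms.
My question is: is this possible?
What I have tried so far is setting up the Datadog Lambda forwarder, which ships CloudWatch logs (and, I assumed, metrics and alarms too?) to DD. I allowed that lambda. I created some AWS metric filters and AWS alarms that trigger when a specific event occurs. I then ran some lambda code that throws an exception and causes the CloudWatch alarm to change its state.
I can clearly see the lambda logs in DD, but I cannot find anything related to my alarm in DD events. I assumed this was not a problem with the DD-AWS integration, since we use it across a large organization and it was configured long before I joined the company.
What am I doing wrong?
The CloudFormation script is below (I removed some parts, so it will not work as-is):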
Resources:
  DatadogForwarderLambda:
    Type: AWS::Lambda::Function
    Properties:
      Description: Pushes logs, metrics and traces from AWS to Datadog.
      Role: !GetAtt "DatadogForwarderLambdaRole.Arn"
      Handler: lambda_function.lambda_handler
      Code:
        S3Bucket: config-sandbox
        S3Key: 'aws-dd-forwarder-3.38.0.zip'
      MemorySize: 1024
      Runtime: python3.7
      Timeout: 120
      Tags:
        - Key: "dd_forwarder_version"
          Value: 3.38.0
      Environment:
        Variables:
          DD_ENHANCED_METRICS: "false"
          DD_API_KEY_SECRET_ARN:
            Ref: DdApiKeySecret
          DD_S3_BUCKET_NAME: config-sandbox
          DD_SITE: datadoghq.com
          DD_: datadoghq.com
          DD_TAGS_CACHE_TTL_SECONDS: 300
          DD_FETCH_LAMBDA_TAGS: true
          DD_USE_TCP: false
          DD_NO_SSL: false
          REDACT_IP: false
          REDACT_EMAIL: false
          DD_USE_PRIVATE_LINK: false
          DD_USE_VPC: false
      ReservedConcurrentExecutions: 100
  DatadogReadonlyPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: !Sub "DatadogReadonlyPolicy"
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - 'cloudwatch:Get*'
              - 'cloudwatch:List*'
              - 'cloudwatch:DescribeAlarmHistory'
              - 'cloudtrail:LookupEvents'
              - 'ec2:Describe'
              - 's3:GetObject'
              - 's3:PutObject'
              - 's3:DeleteObject'
              - 's3:ListBucket'
              - 'lambda:List*'
              - 'tag:GetResources'
              - 'tag:GetTagKeys'
              - 'tag:GetTagValues'
              - 'support:*'
            Resource: !GetAtt DatadogForwarderLambda.Arn
          - Effect: Allow
            Action:
              - secretsmanager:GetSecretValue
            Resource:
              - Ref: DdApiKeySecret
      Roles:
        - !Ref DatadogForwarderLambdaRole
  DatadogForwarderLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
              AWS:
                - Fn::Sub:
                    - "arn:aws:iam::${AccountId}:role/human-role/some-role-name"
                    - { AccountId: !Ref 'AWS::AccountId' }
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
      Path: /
      PermissionsBoundary:
        Fn::Join:
          - ''
          - - 'arn:aws:iam::'
            - Ref: AWS::AccountId
            - ':policy/some-organisation-permission-boundary'
      RoleName:
        Fn::Sub:
          - 'a${AIID}-dd-forwarder-lambda-${StackID}'
          - { StackID: !Select [4, !Split ["-", !Ref 'AWS::StackId']],
              AIID: !Ref AIID }
  IncomingQueueHasMessagesExceptionAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Incoming queue has unprocessed messages, new processing round can't be started
      AlarmName: !Sub "IncomingQueueHasMessagesExceptionAlarm"
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0 # no messages are allowed in queue if new round started
      EvaluationPeriods: 1
      Period: 10
      Namespace: dev-logs
      MetricName: QueueHasMessagesException
      Statistic: Sum
      TreatMissingData: missing
  IncomingQueueHasMessagesExceptionMetricFilter:
    Type: AWS::Logs::MetricFilter
    Properties:
      LogGroupName:
        !Sub '/aws/lambda/${SomeLambdaName}'
      FilterPattern: "QueueHasMessagesException"
      MetricTransformations:
        -
          MetricNamespace: dev-logs
          MetricName: QueueHasMessagesException
          MetricValue: 1
In the end, I found out that my AWS account was not fully integrated with DD.
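For anyone who wants alarm state changes pushed to Datadog rather than polled through the account integration, the other route is to have the alarm notify an SNS topic. The alarm in the template above defines no `AlarmActions`, so a state change is never published anywhere. The following is only a rough sketch, not a verified setup: `DatadogSnsIntakeUrl` and `AlarmNotificationTopic` are hypothetical names, and the actual HTTPS endpoint has to be taken from Datadog's Amazon SNS integration documentation.

```yaml
Parameters:
  # Hypothetical parameter: the HTTPS endpoint that accepts SNS notifications
  # for Datadog. It is passed in rather than hard-coded because such endpoints
  # usually embed an API key.
  DatadogSnsIntakeUrl:
    Type: String
    NoEcho: true

Resources:
  # Hypothetical topic that receives alarm state changes.
  AlarmNotificationTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: https
          Endpoint: !Ref DatadogSnsIntakeUrl

  # Same alarm as in the question, with notification actions added.
  IncomingQueueHasMessagesExceptionAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Incoming queue has unprocessed messages, new processing round can't be started
      AlarmName: IncomingQueueHasMessagesExceptionAlarm
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      EvaluationPeriods: 1
      Period: 10
      Namespace: dev-logs
      MetricName: QueueHasMessagesException
      Statistic: Sum
      TreatMissingData: missing
      AlarmActions:
        - !Ref AlarmNotificationTopic   # Ref on an SNS topic returns its ARN
      OKActions:
        - !Ref AlarmNotificationTopic
```

Subscribing the forwarder Lambda to the topic instead of an HTTPS endpoint should also be possible, but as far as I can tell that delivers the notifications as logs rather than as events.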