What is the best way to monitor stacks on AWS?

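I am currently working on a Laravel/PHP project that I manage with AWS.

Thanks to the CloudFormation service, I have deployed multiple instances, so I have a lot of log groups.

If an error shows up in one of these log groups, I have to search the error logs manually.

So, my need is:

When an API or PHP error message appears in one of my log groups, like this: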

[2021-11-24T13:03:48.075879+00:00] technical.ERROR: TYPE: Trzproject\Trzutils\Exceptions\InvalidJWTException   
MESSAGE: Provided JWT has since expired, as defined by the "exp" claim   
FILE: /var/task/vendor/trzproject/trzcore/src/Trzutils/JWT/JWTService.php   LINE: 64   TRACE:  stack trace disabled ____________________________________________________________________

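I would like to receive a message (Slack, email, etc.) telling me which log group the error is in.

I cannot create a Logs Insights query, because I have many clients and Logs Insights does not allow querying a large number of log groups.

Thanks in advance for your advice.

Edit: something like this: https://theithollow.com/2017/12/11/use-amazon-cloudwatch-logs-metric-filters-send-alerts/

but without creating x alarms for x log groups.

(Sorry for my English)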

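I finally did it; here is the solution that meets my needs.

I created a Lambda triggered by CloudWatch Logs. I attach each log group I want to monitor to it as a CloudWatch Logs event, together with a filter pattern so that only specific messages are forwarded.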

serverless.yml

functions:
  persistLogMessage:
    handler: lambda/PersistLogMessage.php
    timeout: 899 # 14min 59s
    events:
      - cloudwatchLog:
          logGroup: 'MyLogsGroup01'
          filter: '?ERROR ?WARN ?5xx'
      - cloudwatchLog:
          logGroup: 'MyLogsGroup02'
          filter: '?ERROR ?WARN ?5xx'
      ...
    layers:
      - arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:layer:php-73:1
    role: PersistLogMessageRole
    ...
resources:
  Conditions:
  Resources:
    #
    PersistLogMessageRole:
      Type: AWS::IAM::Role
      Properties:
        RoleName: ${opt:stage}-${opt:client}-PersistLogMessageRole
        AssumeRolePolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Principal:
                Service:
                  - lambda.amazonaws.com
              Action: sts:AssumeRole
        Policies:
          - PolicyName: PersistLogMessagePolicy
            PolicyDocument:
              Version: '2012-10-17'
              Statement:
                - Effect: "Allow"
                  Action:
                    - logs:CreateLogGroup
                    - logs:CreateLogStream
                    - logs:PutLogEvents
                    - logs:PutRetentionPolicy
                    - logs:DescribeLogStreams
                    - logs:DescribeLogGroups
                  Resource: "*"
                - Effect: Allow
                  Action:
                    - sns:Publish
                  Resource:
                    - "arn:aws:sns:eu-west-3:<account_id>:SnsTopic"

My Lambda:

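public function __invoke(array $events): void
{
    // CloudWatch Logs delivers the payload base64-encoded and gzip-compressed.
    $data = $events['awslogs']['data'];
    $this->logger->info('events', $events);
    $dataDecoded = base64_decode($data);
    $logMessage = zlib_decode($dataDecoded);
    /** @var stdClass $stdClassLogMessage */
    $stdClassLogMessage = json_decode($logMessage);

    // One invocation can carry several log events; only the first one is forwarded here.
    // The log group name is prepended so the alert says where the error came from.
    $params = [
        'Message' => $stdClassLogMessage->logGroup . "\n" . $stdClassLogMessage->logEvents[0]->message,
        'TopicArn' => 'arn:aws:sns:eu-west-3:<account_id>:SnsTopic'
    ];

    $this->snsClient->publish($params);
}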
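
The handler above forwards only the first log event of each batch. As a rough sketch of how every event could be reported along with its log group and log stream, the decoding step can be pulled into a standalone function. The name buildAlertMessages and the message layout are my own assumptions, not part of the original code; the field names (messageType, logGroup, logStream, logEvents) are the ones CloudWatch Logs uses in its subscription payload.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not in the original Lambda): decode a CloudWatch Logs
 * subscription event and build one alert text per log event.
 *
 * @param array $event Raw Lambda event, shaped like ['awslogs' => ['data' => '...']]
 * @return string[] One message per log event, prefixed with log group and log stream
 */
function buildAlertMessages(array $event): array
{
    // The payload is base64-encoded, then gzip-compressed JSON.
    $compressed = base64_decode($event['awslogs']['data'], true);
    if ($compressed === false) {
        throw new InvalidArgumentException('awslogs.data is not valid base64');
    }

    $json = gzdecode($compressed);
    if ($json === false) {
        throw new InvalidArgumentException('awslogs.data is not valid gzip');
    }

    $payload = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // CloudWatch Logs can also send CONTROL_MESSAGE payloads; they carry no real log data.
    if (($payload['messageType'] ?? '') !== 'DATA_MESSAGE') {
        return [];
    }

    $messages = [];
    foreach ($payload['logEvents'] as $logEvent) {
        // Assumed layout: "[group / stream]" on the first line, then the raw log message.
        $messages[] = sprintf(
            "[%s / %s]\n%s",
            $payload['logGroup'],
            $payload['logStream'],
            $logEvent['message']
        );
    }

    return $messages;
}
```

Each returned string could then be passed as 'Message' to $this->snsClient->publish(), either one call per message or joined into a single notification, depending on how noisy the alerts are allowed to be.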