AWS CloudWatch Logs - Is it possible to export existing log data from it?

I have successfully pushed my application logs to AWS CloudWatch using the AWS CloudWatch Logs agent. However, the CloudWatch web console does not seem to provide a button to download/export the log data from it.

Any idea how to achieve this?

Apparently there is no out-of-the-box way to download CloudWatch Logs from the AWS Console. Perhaps you can write a script using the SDK / API to perform the CloudWatch Logs extraction.

The nice thing about CloudWatch Logs is that you can retain the logs indefinitely (never expire), unlike CloudWatch, which only retains data for 14 days. This means you can run the script at a monthly or quarterly frequency rather than on demand.
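As an aside, retention is configurable per log group. A minimal sketch using the AWS CLI's put-retention-policy (the log group name here is just an example):

aws logs put-retention-policy --log-group-name my-log-group --retention-in-days 90

Running delete-retention-policy on the same group reverts it to never-expire.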

For more information on the CloudWatch Logs API, see http://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/Welcome.html and http://awsdocs.s3.amazonaws.com/cloudwatchlogs/latest/cwl-api.pdf

The latest AWS CLI has a CloudWatch Logs command, which allows you to download the logs as JSON, a text file, or any other output the AWS CLI supports.

For example, to get the first 1MB (up to 10,000 log entries) from stream a in group A into a text file, run:

aws logs get-log-events \
   --log-group-name A --log-stream-name a \
   --output text > a.log

The command is currently limited to a response size of at most 1MB (up to 10,000 records per request), and if you have more you need to implement your own page-stepping mechanism using the --next-token parameter. I expect that in the future the CLI will also allow a full dump in a single command.
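Until then, here is a minimal pagination sketch, assuming jq is available for JSON parsing (group A and stream a as in the example above). get-log-events returns a nextForwardToken, and when the token it returns equals the one you passed in, there are no more events:

TOKEN=""
while true; do
  if [ -z "$TOKEN" ]; then
    RESP=$(aws logs get-log-events --log-group-name A \
      --log-stream-name a --start-from-head --output json)
  else
    RESP=$(aws logs get-log-events --log-group-name A \
      --log-stream-name a --start-from-head \
      --next-token "$TOKEN" --output json)
  fi
  # append this page's messages to the output file (requires jq)
  echo "$RESP" | jq -r '.events[].message' >> a.log
  NEXT=$(echo "$RESP" | jq -r '.nextForwardToken')
  # getting the same token back means the end of the stream was reached
  [ "$NEXT" = "$TOKEN" ] && break
  TOKEN="$NEXT"
done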

Update

Here is a small Bash script to list events from all streams in a particular group since a given point in time:

#!/bin/bash
function dumpstreams() {
  # List the group's streams ordered by last event time, skipping any
  # whose last event timestamp (st[4] of the text output) is older than
  # $starttime, then dump the events of each remaining stream.
  aws $AWSARGS logs describe-log-streams \
    --order-by LastEventTime --log-group-name $LOGGROUP \
    --output text | while read -a st; do
      [ "${st[4]}" -lt "$starttime" ] && continue
      # st[1] is the stream ARN; strip everything up to the last colon
      # to get the stream name
      stname="${st[1]}"
      echo ${stname##*:}
    done | while read stream; do
      aws $AWSARGS logs get-log-events \
        --start-from-head --start-time $starttime \
        --log-group-name $LOGGROUP --log-stream-name $stream --output text
    done
}

AWSARGS="--profile myprofile --region us-east-1"
LOGGROUP="some-log-group"
TAIL=
starttime=$(date --date "-1 week" +%s)000
nexttime=$(date +%s)000
dumpstreams
if [ -n "$TAIL" ]; then
  while true; do
    starttime=$nexttime
    nexttime=$(date +%s)000
    sleep 1
    dumpstreams
  done
fi

The last part, if you set TAIL, will keep fetching log events and will report new events as they come in (with some expected delay).

There is also a Python project called awslogs that allows fetching the logs: https://github.com/jorgebastida/awslogs

It provides things like:

List log groups:

$ awslogs groups

List streams for a given log group:

$ awslogs streams /var/log/syslog

Get the log records from all streams:

$ awslogs get /var/log/syslog

Get the log records from a specific stream:

$ awslogs get /var/log/syslog stream_A

And much more (filtering by time period, watching log streams, ...).
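For instance, time filtering and following a stream look roughly like this (flag names taken from the project's README and may vary between versions):

$ awslogs get /var/log/syslog --start='2h ago' --watch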

I think this tool might help you do what you want.

It looks like AWS has since added the ability to export an entire log group to S3.

You'll need to set up permissions on the S3 bucket to allow CloudWatch to write to it, by adding the following to your bucket policy, replacing the region with your region and the bucket name with your bucket name.

    {
        "Effect": "Allow",
        "Principal": {
            "Service": "logs.us-east-1.amazonaws.com"
        },
        "Action": "s3:GetBucketAcl",
        "Resource": "arn:aws:s3:::tsf-log-data"
    },
    {
        "Effect": "Allow",
        "Principal": {
            "Service": "logs.us-east-1.amazonaws.com"
        },
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::tsf-log-data/*",
        "Condition": {
            "StringEquals": {
                "s3:x-amz-acl": "bucket-owner-full-control"
            }
        }
    }

See Step 2 of this AWS doc for details.
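With the policy in place, the export itself can be started from the console or, as a sketch, with the CLI's create-export-task (--from and --to are milliseconds since the epoch; the group name and prefix here are illustrative):

aws logs create-export-task \
    --log-group-name my-log-group \
    --from $(($(date --date '-1 day' +%s) * 1000)) \
    --to $(($(date +%s) * 1000)) \
    --destination tsf-log-data \
    --destination-prefix my-log-group-export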

I would add a one-liner to get all the logs for a stream:

aws logs get-log-events --log-group-name my-log-group --log-stream-name my-log-stream | grep '"message":' | awk -F '"' '{ print $(NF-1) }' > my-log-group_my-log-stream.txt

Or in a slightly more readable format:

aws logs get-log-events \
    --log-group-name my-log-group \
    --log-stream-name my-log-stream \
    | grep '"message":' \
    | awk -F '"' '{ print $(NF-1) }' \
    > my-log-group_my-log-stream.txt

You can make a handy script out of it, admittedly less powerful than @Guss's but simple enough. I saved it as getLogs.sh and invoke it with ./getLogs.sh log-group log-stream
#!/bin/bash

if [[ "${#}" != 2 ]]
then
    echo "This script requires two arguments!"
    echo
    echo "Usage :"
    echo "[=12=] <log-group-name> <log-stream-name>"
    echo
    echo "Example :"
    echo "[=12=] my-log-group my-log-stream"

    exit 1
fi

OUTPUT_FILE="_.log"
aws logs get-log-events \
    --log-group-name ""\
    --log-stream-name "" \
    | grep '"message":' \
    | awk -F '"' '{ print $(NF-1) }' \
    > "${OUTPUT_FILE}"

echo "Logs stored in ${OUTPUT_FILE}"

You can now perform exports via the CloudWatch management console using the new CloudWatch Logs Insights page. Full documentation is here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_ExportQueryResults.html. I had already started ingesting my Apache logs into CloudWatch as JSON, so YMMV if you haven't set that up in advance.

Add Query to Dashboard or Export Query Results

After you run a query, you can add the query to a CloudWatch dashboard, or copy the results to the clipboard.

Queries added to dashboards automatically re-run every time you load the dashboard and every time that the dashboard refreshes. These queries count toward your limit of four concurrent CloudWatch Logs Insights queries.

To add query results to a dashboard

Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

In the navigation pane, choose Insights.

Choose one or more log groups and run a query.

Choose Add to dashboard.

Select the dashboard, or choose Create new to create a new dashboard for the query results.

Choose Add to dashboard.

To copy query results to the clipboard

Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

In the navigation pane, choose Insights.

Choose one or more log groups and run a query.

Choose Actions, Copy query results.

I found the AWS documentation to be complete and accurate: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3ExportTasks.html. It lays out the steps for exporting logs from CloudWatch to S3.

Adapted @Guss's answer for macOS. As I'm not really a bash person, I had to use Python to convert the dates to a human-readable form.

runawslog -1w gets the last week's logs, and so on:

runawslog() { sh awslogs.sh $1 | grep "EVENTS" | python parselogline.py; }

awslogs.sh:

#!/bin/bash
#set -x
function dumpstreams() {
  aws $AWSARGS logs describe-log-streams \
    --order-by LastEventTime --log-group-name $LOGGROUP \
    --output text | while read -a st; do 
      [ "${st[4]}" -lt "$starttime" ] && continue
      stname="${st[1]}"
      echo ${stname##*:}
    done | while read stream; do
      aws $AWSARGS logs get-log-events \
        --start-from-head --start-time $starttime \
        --log-group-name $LOGGROUP --log-stream-name $stream --output text
    done
}
AWSARGS=""
#AWSARGS="--profile myprofile --region us-east-1"
LOGGROUP="/aws/lambda/StockTrackFunc"
TAIL=
FROMDAT=$1
starttime=$(date -v ${FROMDAT} +%s)000
nexttime=$(date +%s)000
dumpstreams
if [ -n "$TAIL" ]; then
  while true; do
    starttime=$nexttime
    nexttime=$(date +%s)000
    sleep 1
    dumpstreams
  done
fi

parselogline.py:

import sys
import datetime
dat=sys.stdin.read()
for k in dat.split('\n'):
    d=k.split('\t')
    if len(d)<3:
        continue
    d[2]='\t'.join(d[2:])
    print( str(datetime.datetime.fromtimestamp(int(d[1])/1000)) + '\t' + d[2] )

Inspired by saputkin, I have created a Python script that downloads all the logs of a log group for a given time period.

The script itself: https://github.com/slavogri/aws-logs-downloader.git

In case there are multiple log streams for that period, multiple files will be created. Downloaded files are stored in the current directory and are named after the log streams that have log events in the given time period. (If the group name contains forward slashes, they will be replaced by underscores. Each file will be overwritten if it already exists.)

Prerequisite: you need to be logged in to your AWS profile. The script itself will use the AWS command line APIs on your behalf: "aws logs describe-log-streams" and "aws logs get-log-events"

Usage example: python aws-logs-downloader -g /ecs/my-cluster-test-my-app -t "2021-09-04 05:59:50 +00:00" -i 60

optional arguments:
   -h, --help         show this help message and exit
   -v, --version      show program's version number and exit
   -g , --log-group   (required) Log group name for which the log stream events needs to be downloaded
   -t , --end-time    (default: now) End date and time of the downloaded logs in format: %Y-%m-%d %H:%M:%S %z (example: 2021-09-04 05:59:50 +00:00)
   -i , --interval    (default: 30) Time period in minutes before the end-time. This will be used to calculate the time since which the logs will be downloaded.
   -p , --profile     (default: dev) The aws profile that is logged in, and on behalf of which the logs will be downloaded.
   -r , --region      (default: eu-central-1) The aws region from which the logs will be downloaded.

Let me know if it was useful to you. :)

After I had done it I learned that there is another option using Boto3: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html#CloudWatchLogs.Client.get_log_events

The command line API still seems to me like a good option.

export LOGGROUPNAME=[SOME_LOG_GROUP_NAME]; for LOGSTREAM in `aws --output text logs describe-log-streams --log-group-name ${LOGGROUPNAME} --query 'logStreams[*].logStreamName'`; do aws --output text logs get-log-events --log-group-name ${LOGGROUPNAME} --log-stream-name ${LOGSTREAM} >> ${LOGGROUPNAME}_output.txt; done
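The same loop, spread over multiple lines for readability:

export LOGGROUPNAME=[SOME_LOG_GROUP_NAME]
for LOGSTREAM in $(aws --output text logs describe-log-streams \
    --log-group-name "${LOGGROUPNAME}" \
    --query 'logStreams[*].logStreamName'); do
  # append each stream's events to a single file for the group
  aws --output text logs get-log-events \
    --log-group-name "${LOGGROUPNAME}" \
    --log-stream-name "${LOGSTREAM}" >> "${LOGGROUPNAME}_output.txt"
done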

I had a similar use case where I had to download all the streams for a given log group. See if this script helps.

#!/bin/bash

if [[ "${#}" != 1 ]]
then
    echo "This script requires two arguments!"
    echo
    echo "Usage :"
    echo "[=10=] <log-group-name>"

    exit 1
fi

streams=`aws logs describe-log-streams --log-group-name ""`


for stream in $(jq '.logStreams | keys | .[]' <<< "$streams"); do 
    record=$(jq -r ".logStreams[$stream]" <<< "$streams")
    streamName=$(jq -r ".logStreamName" <<< "$record")
    echo "Downloading ${streamName}";
    echo `aws logs get-log-events --log-group-name "" --log-stream-name "$streamName" --output json > "${stream}.log" `
    echo "Completed dowload:: ${streamName}";
done;

You have to pass the log group name as an argument.

E.g.: bash <name_of_the_bash_file>.sh <group_name>