How to deduplicate GCP logs from Logs Explorer?

I'm using the GCP Logs Explorer to look at the log messages coming from my pipeline. I need to debug an issue by looking at the logs for a specific event. The messages for this error are identical except for the event ID at the end.

So, for example, the error message is

Event ID does not exist: foo

I know I can build a query using the following syntax that will return the logs with this specific message structure:

resource.type="some_resource"
resource.labels.project_id="some_project"
resource.labels.job_id="some_id"
severity=WARNING
jsonPayload.message:"Event ID does not exist:"

The last line of that query returns every log whose message contains that string.

I end up with results like this:

Event ID does not exist: 1A
Event ID does not exist: 2A
Event ID does not exist: 2A
Event ID does not exist: 3A

So I'd like to deduplicate them and end up with just:

Event ID does not exist: 1A
Event ID does not exist: 2A
Event ID does not exist: 3A

However, I don't see support for this kind of deduplication in the language docs.

Because of the number of rows, I also can't download a delimited log file. Is there a way to deduplicate the rows?

To remove duplicate records using BigQuery, follow these steps:

  • Determine whether your dataset contains duplicates.
  • Create a SELECT query that aggregates the desired column using a GROUP BY clause.
  • Materialize the result to a new table with CREATE OR REPLACE TABLE [tablename] AS [SELECT STATEMENT], as in the sketch below.
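Applied to the messages in the question, a minimal sketch might look like the following. The project, dataset, and table names are hypothetical placeholders, and jsonPayload.message assumes the default schema Cloud Logging produces when exporting log entries to BigQuery:

-- Keep one row per distinct message, with a count of how often it repeated
CREATE OR REPLACE TABLE `some_project.pipeline_logs.deduped_warnings` AS
SELECT
  jsonPayload.message AS message,
  COUNT(*) AS occurrences
FROM `some_project.pipeline_logs.raw_logs`
WHERE jsonPayload.message LIKE 'Event ID does not exist:%'
GROUP BY message;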

You can check the full tutorial in this link.

To analyze a large number of logs, you can route them to BigQuery with Fluentd and analyze them there.

Fluentd has an output plugin that can use BigQuery as the destination for storing the collected logs. Using the plugin, you can load logs from many servers directly into BigQuery in near real time.

In this link you can find a complete tutorial on how to analyze logs using Fluentd and BigQuery.
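For orientation, a Fluentd match section for that output plugin (fluent-plugin-bigquery) might look roughly like the sketch below. Every key and value here is an assumption to adapt from the plugin's README, not a working configuration:

# Hypothetical sketch: send records tagged pipeline.** to BigQuery
<match pipeline.**>
  @type bigquery_insert
  auth_method json_key                        # assumes a service-account JSON key
  json_key /path/to/service-account-key.json  # placeholder path
  project some_project                        # placeholder project ID
  dataset pipeline_logs                       # placeholder dataset
  table raw_logs                              # placeholder table
  fetch_schema true                           # reuse the destination table's schema
</match>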

To route your logs to BigQuery, you first need to create a sink that points at BigQuery.

Sinks control how Cloud Logging routes logs. Using sinks, you can route some or all of your logs to supported destinations.

Sinks belong to a given Google Cloud resource: Cloud projects, billing accounts, folders, and organizations. When the resource receives a log entry, it routes the log entry according to the sinks contained by that resource. The log entry is sent to the destination associated with each matching sink.

You can route log entries from Cloud Logging to BigQuery using sinks. When you create a sink, you define a BigQuery dataset as the destination. Logging sends log entries that match the sink's rules to partitioned tables that are created for you in that BigQuery dataset.

1) In the Cloud console, go to the Logs Router page.

2) Select an existing Cloud project.

3) Select Create sink.

4) In the Sink details panel, enter the following details:

Sink name: Provide an identifier for the sink; note that after you create the sink, you can't rename the sink but you can delete it and create a new sink.

Sink description (optional): Describe the purpose or use case for the sink.

5) In the Sink destination panel, select the sink service and destination:

Select sink service: Select the service where you want your logs routed. Based on the service that you select, you can select from the following destinations:

BigQuery table: Select or create the particular dataset to receive the routed logs. You also have the option to use partitioned tables.

For example, if your sink destination is a BigQuery dataset, the sink destination would be the following:

bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID

Note that if you are routing logs between Cloud projects, you still need the appropriate destination permissions.

6) In the Choose logs to include in sink panel, do the following:

In the Build inclusion filter field, enter a filter expression that matches the log entries you want to include. If you don't set a filter, all logs from your selected resource are routed to the destination.

To verify you entered the correct filter, select Preview logs. This opens the Logs Explorer in a new tab with the filter prepopulated.
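For the use case in the question, the inclusion filter can simply reuse the query that already works in the Logs Explorer, so that only the matching warning entries are routed to BigQuery:

resource.type="some_resource"
resource.labels.project_id="some_project"
resource.labels.job_id="some_id"
severity=WARNING
jsonPayload.message:"Event ID does not exist:"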

7) (Optional) In the Choose logs to filter out of sink panel, do the following:

In the Exclusion filter name field, enter a name.

In the Build an exclusion filter field, enter a filter expression that matches the log entries you want to exclude. You can also use the sample function to select a portion of the log entries to exclude. You can create up to 50 exclusion filters per sink. Note that the length of a filter can't exceed 20,000 characters.
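For example, an exclusion filter like the following sketch would drop a random 90% of entries at INFO severity and below, keeping a 10% sample (the fraction is an arbitrary illustration):

severity<=INFO AND sample(insertId, 0.9)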

8) Select Create sink.
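The same sink can also be created from the command line with gcloud. A sketch with placeholder names, reusing the filter from the question:

gcloud logging sinks create pipeline-warnings-sink \
  bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID \
  --log-filter='severity=WARNING AND jsonPayload.message:"Event ID does not exist:"'

Either way, remember that the sink's writer identity still needs write access to the destination dataset (for example, the BigQuery Data Editor role).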

You can find more details about configuring and managing sinks here.

To see the details, formats, and rules that apply when routing log entries from Cloud Logging to BigQuery, follow this link.