Join 2 tables with multiple conditions

Question

Suppose I type index=endpoints into the search bar and press Enter. I get results like the following:

{
  "user": "Jack",
  "os_name": "Windows",
  "hostname": "Windows-JACK-01",
  "pid": "30219",
  "app": "/usr/bin/curl",
  "cmdline": "curl google.com",
  "epoch": "1503452096",
  "type": "processes"
}

. . .

{
  "hostname": "Windows-JACK-01",
  "pid": "30219",
  "app": "/usr/bin/curl",
  "epoch": "1503452096",
  "ip": "123.123.123.123",
  "port": "1234",
  "type": "sockets"
}

. . .

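There are two kinds of data under the same index: sockets and processes. I want to find a way to combine the related events of these two types so that I get richer records containing all of the information.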

+----------+-----+-----+---------+----+------+------+
| hostname | pid | app | os_name | ip | port | etc. |
+----------+-----+-----+---------+----+------+------+
| ...      | ... | ... | ...     | x  | y    | ...  |
+----------+-----+-----+---------+----+------+------+

The problem is that if I just do something like this:

index=endpoints type="processes"
| join left=L right=R WHERE L.pid=R.pid [ search index=endpoints type="sockets" ]

Most of the time I end up with a wrong mapping between app and pid, because a pid can be reassigned to any application once it becomes free.

I was thinking that adding more conditions would reduce the error rate. For example, instead of just L.pid=R.pid, maybe I could use L.pid=R.pid AND L.hostname=R.hostname AND ...

My naive approach was to add more conditions to the WHERE part:

index=endpoints type="processes"
| join left=L right=R WHERE (L.pid=R.pid AND L.x=R.x AND...)
  [ search index=endpoints type="sockets" ]

However, it doesn't seem to work that way. Any suggestions?

Answer 1

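Judging from your example query, I'd guess you are an experienced SQL user who is new to Splunk and hasn't yet read the manual for the join command. join does not accept a where clause, nor does it have left or right options. As a best practice, avoid join wherever possible, because it is very inefficient.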
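
If a join is still wanted, one way to express "multiple conditions" is to pass join a list of field names that must match on both sides. The following is only a sketch: it assumes hostname, pid, and epoch are extracted as fields on both event types (as in the sample events) and that epoch is identical on the matching process and socket events, which may not hold in real data.

```
index=endpoints type="processes"
| join type=inner hostname, pid, epoch
    [ search index=endpoints type="sockets" ]
```

By default join attaches only the first matching subsearch row to each main result (max=1). If one process can own several sockets, max=0 removes that limit.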

Try stats instead. Here stats is used for its grouping capability rather than for computing statistics.

index=endpoints (type="processes" OR type="sockets")
| stats values(*) as * by hostname, pid
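
To narrow the match further, which is what the extra join conditions in the question were meant to do, more fields can be added to the by clause. This is a sketch under the assumption that app and epoch appear on both the process and the socket events with identical values, as they do in the sample data. Events missing any of the by fields are dropped from the result.

```
index=endpoints (type="processes" OR type="sockets")
| stats values(*) as * by hostname, pid, app, epoch
```

Each output row then carries the process-only fields (user, os_name, cmdline) alongside the socket-only fields (ip, port) for the same hostname, pid, app, and epoch.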

splunk

splunk-query