Extract unpredictable data that has its own timestamp from a log file using a shell script

Question

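log.txt, shown below, is continuously updated with ID data, each entry carrying its own timestamp (detection_time). The IDs are unpredictable numbers anywhere from 0000 to 9999, and the same ID may appear in log.txt more than once.

My goal is to use a shell script to filter out the IDs that appear again in log.txt within 15 seconds of their first appearance. Can anyone help me with this?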

ID = 4231
detection_time = 1595556730 
ID = 3661
detection_time = 1595556731
ID = 2654
detection_time = 1595556732
ID = 3661
detection_time = 1595556733

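To be clearer: in the log.txt above, ID 3661 first appears at time 1595556731 and then appears again at 1595556733, which is 2 seconds after the first appearance. So it matches my condition, which is an ID that appears again within 15 seconds. I would like this ID 3661 to be filtered by my shell script.

The output after running the shell script would be ID = 3661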

My problem is that I don't know how to develop the algorithm in a shell script.

Here is what I tried, using ID_new and ID_previous variables, but ID_previous=$(ID_new) and detection_previous=$(detection_new) do not work:

input="/tmp/log.txt"
ID_previous=""
detection_previous=""
while IFS= read -r line
do
    ID_new=$(echo "$line" | grep "ID =" | awk -F " " '{print $3}')
    echo $ID_new
    detection_new=$(echo "$line" | grep "detection_time =" | awk -F " " '{print $3}')
    echo $detection_new
    ID_previous=$(ID_new)
    detection_previous=$(detection_new)
done < "$input"

Edit: the data in log.txt actually comes in sets, each consisting of ID, detection_time, Age and Height. Sorry for not mentioning this in the first place.

ID = 4231
detection_time = 1595556730 
Age = 25
Height = 182
ID = 3661
detection_time = 1595556731
Age = 24
Height = 182
ID = 2654
detection_time = 1595556732
Age = 22
Height = 184    
ID = 3661
detection_time = 1595556733
Age = 27
Height = 175
ID = 3852
detection_time = 1595556734
Age = 26
Height = 156
ID = 4231
detection_time = 1595556735 
Age = 24
Height = 184

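I have tried the Awk solution. The result is 4231 3661 2654 3852 4231, which is every ID in log.txt. The correct output should be 4231 and 3661.

From this, I think the Age and Height data may be affecting the Awk solution, because those lines are interleaved with the ID and detection_time lines that matter.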

Answer 1

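Assuming the timestamps in the log file are monotonically increasing, you only need a single pass with Awk. For each id, keep track of the latest time it was reported (use an associative array t where the key is the id and the value is the latest timestamp). If you see the same id again and the two timestamps are within 15 seconds of each other, report it.

For good measure, keep a second array p of the ids we have already reported, so we don't report them twice.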

awk '/^ID = / { id=$3; next }
    # Skip if this line is neither ID nor detection_time
    !/^detection_time = / { next }
    (id in t) && (t[id] >= $3-15) && !(p[id]) { print id; ++p[id]; next }
    { t[id] = $3 }' /tmp/log.txt
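The script above compares each occurrence with the latest stored timestamp for that id. The question's wording talks about the first appearance, so if that stricter reading is what you want, a variant might look like the sketch below. This is only an illustration under that assumption: the array name first, the temporary file, and the made-up ids and timestamps are all hypothetical.

```shell
# Hypothetical sample: 2222 repeats after 5 s; 1111 repeats 20 s and 30 s
# after its first appearance (but only 10 s after its previous one).
log=$(mktemp)
cat > "$log" <<'EOF'
ID = 1111
detection_time = 100
ID = 2222
detection_time = 105
ID = 2222
detection_time = 110
ID = 1111
detection_time = 120
ID = 1111
detection_time = 130
EOF

repeated=$(awk '/^ID = / { id = $3; next }
    # Ignore everything except detection_time lines
    !/^detection_time = / { next }
    # Remember only the first timestamp seen for each id
    !(id in first) { first[id] = $3; next }
    # Report once if this occurrence is within 15 s of the first one
    ($3 - first[id] <= 15) && !printed[id]++ { print id }' "$log")

printf '%s\n' "$repeated"
rm -f "$log"
```

With this sample only 2222 is printed, whereas comparing against the latest timestamp would also report 1111.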

If you really insist on doing this natively in Bash, I would refactor your attempt:

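# dtime maps id -> last timestamp seen; printed marks ids already reported
declare -A dtime printed
while read -r field _ value
do
    case $field in
     ID) id=$value;;
     detection_time)
      # An unseen id falls back to 0, far older than any epoch timestamp here
      if [[ ${dtime[$id]:-0} -ge $((value - 15)) ]]; then
          [[ -v printed["$id"] ]] || echo "$id"
          printed["$id"]=1
      fi
      dtime["$id"]=$value ;;
    esac
done < /tmp/log.txt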

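Notice how read -r can split a line on whitespace just as easily as Awk can, as long as you know how many fields to expect. But while read -r is typically an order of magnitude slower than Awk, and you have to admit that the Awk attempt is more succinct and elegant, as well as portable to older systems.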
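As a small illustration of that splitting, the sketch below feeds a single line to read. The sample line, including its trailing space, is just a stand-in copied from the question.

```shell
# One log line, including the trailing space seen in the question
line='detection_time = 1595556730 '

# field gets the first word, _ swallows the "=", value gets the rest
# (leading and trailing whitespace is trimmed from the last variable)
read -r field _ value <<EOF
$line
EOF

printf 'field=%s value=%s\n' "$field" "$value"
```

This prints field=detection_time value=1595556730.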

(Associative arrays were introduced in Bash 4.)

Tangentially, anything that looks like grep 'x' | awk '{ y }' can be refactored to awk '/x/ { y }'; see also useless use of grep.
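Applied to the extraction in the question, that refactoring might look like the following sketch. The temporary file and its three lines are hypothetical stand-ins so that the example runs on its own.

```shell
# Hypothetical three-line sample in the same format as log.txt
log=$(mktemp)
printf '%s\n' 'ID = 4231' 'detection_time = 1595556730' 'ID = 3661' > "$log"

# Two processes: grep selects the lines, awk prints the third field
piped=$(grep 'ID =' "$log" | awk '{ print $3 }')

# One process: the awk pattern does the selecting itself
single=$(awk '/ID =/ { print $3 }' "$log")

printf '%s\n' "$single"
rm -f "$log"
```

Both commands produce the same two ids, 4231 and 3661.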

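Also, notice that $(foo) attempts to run foo as a command. To simply refer to the value of the variable foo, the syntax is $foo (or, optionally, ${foo}, but the braces don't add any value here). Usually you will want to double-quote the expansion, "$foo"; see also When to wrap quotes around a shell variable.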
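A minimal sketch of the difference, reusing the variable names from the question; the msg variable is a hypothetical extra, added only to show what quoting changes.

```shell
ID_new=3661

# Command substitution would try to run a program called ID_new:
# ID_previous=$(ID_new)

# Parameter expansion copies the value of the variable
ID_previous="$ID_new"
echo "$ID_previous"

# Unquoted, the value is split on whitespace into separate words;
# quoted, it stays one word with its inner spacing intact.
msg='a  b'
unquoted=$(printf '%s|' $msg)
quoted=$(printf '%s|' "$msg")
echo "$unquoted"
echo "$quoted"
```

The last two lines print a|b| and a  b| respectively.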

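Your script would only remember a single earlier event; the associative array allows us to remember all the ID values we have seen previously (until we run out of memory).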

Nothing prevents us from using human-readable variable names in Awk either; feel free to substitute printed for p and dtime for t for complete parity with the Bash alternative.
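With those names substituted, the Awk script might look like the sketch below. The temporary file is only a stand-in filled with the sample from the question so that the example runs on its own; in practice you would point awk at /tmp/log.txt.

```shell
# Sample data copied from the edited question
log=$(mktemp)
cat > "$log" <<'EOF'
ID = 4231
detection_time = 1595556730
Age = 25
Height = 182
ID = 3661
detection_time = 1595556731
Age = 24
Height = 182
ID = 2654
detection_time = 1595556732
Age = 22
Height = 184
ID = 3661
detection_time = 1595556733
Age = 27
Height = 175
ID = 3852
detection_time = 1595556734
Age = 26
Height = 156
ID = 4231
detection_time = 1595556735
Age = 24
Height = 184
EOF

repeated=$(awk '/^ID = / { id = $3; next }
    # Age, Height and any other lines are skipped here
    !/^detection_time = / { next }
    # Seen before, within 15 s, and not yet reported: print it once
    (id in dtime) && (dtime[id] >= $3 - 15) && !printed[id] {
        print id; printed[id] = 1; next }
    # Otherwise remember this timestamp as the latest one for the id
    { dtime[id] = $3 }' "$log")

printf '%s\n' "$repeated"
rm -f "$log"
```

On this sample the output is 3661 followed by 4231, in the order in which each repeat is detected.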
