Bash: splitting a list of strings, each containing space-separated words, into a separate variable for each word
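I am trying to parse the Apache error log in order to grep the lines that correspond to the "offending" IPs found in the fail2ban log.
I am using a bash script.
First, I extract the offending IPs: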
offenders=$(grep -F "[apache-errors] Found" /var/log/fail2ban.log | awk '{print $8}' | sort | uniq)
Then, for each IP, I get its entries from fail2ban.log; there can be several entries, because an IP may have made several requests:
for ip in $offenders; do
    entries=$(grep -F "[apache-errors] Found $ip" /var/log/fail2ban.log | awk '{print $8" "$10" "$11}' | sort | uniq)
    declare _count_entries=$(echo "${entries[@]}" | wc -l)
    echo "Found $_count_entries error entries for IP $ip"
    for entry in "${entries[@]}"; do
        echo "$entry"
    done
done
This is what I get so far (IPs are anonymized):
[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55
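What I want to do now is extract the ip, date, and time parts of each line. I tried something like the following, but it does not work: it only prints the (ip,date,time) of the first entry: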
for ip in $offenders; do
    entries=$(grep -F "[apache-errors] Found $ip" /var/log/fail2ban.log | awk '{print $8" "$10" "$11}' | sort | uniq)
    for entry in "${entries[@]}"; do
        echo "$entry"
        _ip=($(echo "$entry" | cut -d ' ' -f1))
        _date=($(echo "$entry" | cut -d ' ' -f2))
        _time=($(echo "$entry" | cut -d ' ' -f3))
        echo "ip=$_ip , date=$_date , time=$_time"
    done
done
Output: for each IP, only the (ip,date,time) parts of the first entry are echoed:
[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
ip=10.10.0.29 , date=2021-12-20 , time=06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55
ip=10.30.0.186 , date=2022-01-02 , time=05:20:49
The desired output is:
[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
ip=10.10.0.29 , date=2021-12-20 , time=06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
ip=10.20.0.242 , date=2021-12-30 , time=12:03:55
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55
ip=10.30.0.186 , date=2022-01-02 , time=05:20:49
ip=10.30.0.186 , date=2022-01-02 , time=05:40:24
ip=10.30.0.186 , date=2022-01-02 , time=07:38:55
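So how can I do this in bash?
The end goal is to use the ip, date, and time parts to build a regex like the one below, because I want to grep from the error log exactly the lines that correspond to the findings in the fail2ban log: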
grep -P "^(\[$_date $_time)(.+\[client )($_ip).+$" /var/log/apache2/error.log
You could do it like this:
#!/bin/bash
# Print the header and the collected lines for one IP (does nothing for an empty IP).
print_errors() {
    local ip=$1
    [ -n "$ip" ] || return
    shift
    echo "[INFO] Found $# error entries for IP $ip"
    printf '%s\n' "$@"
}
prev_ip=
errors=()
# The input is sorted, so all entries for one IP are adjacent.
while read -r ip date time
do
    if [ "$prev_ip" != "$ip" ]
    then
        print_errors "$prev_ip" "${errors[@]}"
        prev_ip=$ip
        errors=()
    fi
    errors+=("ip=$ip , date=$date , time=$time")
done < <(
    grep -F "[apache-errors] Found" /var/log/fail2ban.log |
    awk '{print $8" "$10" "$11}' |
    sort
)
# Flush the last group.
print_errors "$prev_ip" "${errors[@]}"
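As background on why the loop in the question runs only once per IP: `entries=$(...)` stores one multi-line string, not an array, so `"${entries[@]}"` expands to a single word, and `$_ip` is just the first element of the array that `cut` fills. A minimal sketch of the difference, using two made-up sample lines in place of the real log (the `sample` variable and the `split_entries` name are only for illustration; `mapfile` assumes bash 4 or later):

```shell
#!/usr/bin/env bash
# Hypothetical sample standing in for the output of the grep | awk pipeline.
sample=$'10.20.0.242 2021-12-21 10:51:44\n10.20.0.242 2021-12-30 12:03:55'

# A plain string variable: "${entries[@]}" expands to ONE word,
# so the loop body runs once with both lines in $entry.
entries=$sample
count=0
for entry in "${entries[@]}"; do
    count=$((count + 1))
done
echo "string: $count iteration(s)"

# mapfile -t stores one array element per input line.
mapfile -t lines <<< "$sample"
echo "array: ${#lines[@]} element(s)"

# read splits each line on whitespace into separate variables.
split_entries() {
    local ip date time
    while read -r ip date time; do
        echo "ip=$ip , date=$date , time=$time"
    done
}
split_entries <<< "$sample"
```

Reading with `while read -r ip date time` avoids both the array and the three `cut` calls, which is why the script above is built around it.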
But bash is not really meant for this; it would be better to write the same grouping logic in awk (here I do the sorting outside of awk):
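grep -F "[apache-errors] Found" /var/log/fail2ban.log | sort -k 8,8 -k 10,11 |
awk '
    # n and i are local variables (declared as extra parameters).
    function print_errors(ip, arr,    n, i) {
        if (ip == "") return
        n = length(arr)
        print "[INFO] Found " n " error entries for IP " ip
        # An index loop keeps insertion order; "for (i in arr)" does not guarantee it.
        for (i = 1; i <= n; i++) print arr[i]
    }
    BEGIN { ip = "" }
    {
        if ($8 != ip) {
            print_errors(ip, arr)
            delete arr
            ip = $8
        }
        arr[length(arr)+1] = "ip="$8" , date="$10" , time="$11
    }
    END { print_errors(ip, arr) }
'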
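Or, better still, write the whole thing in a language that has multi-dimensional associative arrays and text-processing facilities.
An example in Ruby: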
#!/usr/bin/env ruby
ARGF.each_line.with_object(Hash.new{|h,k| h[k] = []}) do |line,hash|
  ip,date,time = line.split.values_at(7,9,10)
  hash[ip] << "ip=#{ip} , date=#{date} , time=#{time}"
end.each do |ip,arr|
  puts "[INFO] Found #{arr.count} error entries for IP #{ip}"
  puts arr.join("\n")
end
Sample output of the three programs above:
[INFO] Found 1 error entries for IP 10.10.0.129
ip=10.10.0.129 , date=2021-12-20 , time=06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
ip=10.20.0.242 , date=2021-12-30 , time=12:03:55
[INFO] Found 3 error entries for IP 10.30.0.186
ip=10.30.0.186 , date=2022-01-02 , time=05:20:49
ip=10.30.0.186 , date=2022-01-02 , time=05:40:24
ip=10.30.0.186 , date=2022-01-02 , time=07:38:55
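For the end goal stated in the question (finding the matching lines in the Apache error log), one possible sketch follows. It assumes, as the regex in the question implies, that each error-log line starts with `[date time` and contains `[client ip`; the real layout depends on the server's ErrorLogFormat, so treat that as an assumption. The function name `match_error_lines` and the sample log lines are made up for illustration. Plain substring matching is used instead of a regex so that the dots in the IP match only literal dots, and the character after the IP is checked so that 10.20.0.24 does not also match 10.20.0.242.

```shell
#!/usr/bin/env bash
# match_error_lines ERROR_LOG
# Reads "ip date time" triples on stdin and prints the lines of ERROR_LOG
# that begin with "[date time" and contain "[client ip" (assumed layout).
match_error_lines() {
    local error_log=$1 ip date time
    while read -r ip date time; do
        awk -v ts="[$date $time" -v cl="[client $ip" '
            # The timestamp must be at the start of the line; the IP must not
            # be followed by another digit (guards against prefix matches).
            index($0, ts) == 1 && (i = index($0, cl)) &&
                substr($0, i + length(cl), 1) !~ /[0-9]/
        ' "$error_log"
    done
}

# Demo on a temporary file holding invented log lines.
log=$(mktemp)
cat > "$log" <<'EOF'
[2021-12-21 10:51:44.123] [error] [client 10.20.0.242] sample message one
[2021-12-21 10:51:44.456] [error] [client 10.20.0.24] sample message two
[2021-12-30 12:03:55.789] [error] [client 10.20.0.242:51234] sample message three
EOF
printf '%s\n' '10.20.0.242 2021-12-21 10:51:44' '10.20.0.242 2021-12-30 12:03:55' |
    match_error_lines "$log"
rm -f "$log"
```

The triples can come straight from the `grep | awk | sort` pipeline used above. This scans the error log once per triple, which is fine for a handful of offenders but slow for large inputs.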