Line manipulation & sorting
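I can write Linux scripts, but I could use some advice. I know the question is a bit vague, so I'd appreciate any help you can offer!
The question below is for personal growth, since I'm writing some networking tools for fun/learning. No homework is involved (I'm a senior, and none of my classes require this stuff!).
I'm using tshark to get information about a packet capture. Here is what it looks like: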
rachel@Ubuntu-1:~/PCAP$ tshark -r LargeTorrent.pcap -q -z io,phs
===================================================================
Protocol Hierarchy Statistics
Filter:
eth frames:4309 bytes:3984321
ip frames:4119 bytes:3969006
icmp frames:1316 bytes:1308988
udp frames:1408 bytes:1350786
data frames:1368 bytes:1346228
dns frames:16 bytes:1176
nbns frames:14 bytes:1300
http frames:8 bytes:1596
nbdgm frames:2 bytes:486
smb frames:2 bytes:486
mailslot frames:2 bytes:486
browser frames:2 bytes:486
tcp frames:1395 bytes:1309232
data frames:1300 bytes:1294800
http frames:6 bytes:3763
data-text-lines frames:2 bytes:324
xml frames:2 bytes:3205
tcp.segments frames:1 bytes:787
nbss frames:34 bytes:5863
smb frames:17 bytes:3047
pipe frames:4 bytes:686
lanman frames:4 bytes:686
smb2 frames:13 bytes:2444
bittorrent frames:10 bytes:1709
tcp.segments frames:2 bytes:433
bittorrent frames:2 bytes:433
bittorrent frames:1 bytes:258
bittorrent frames:2 bytes:221
bittorrent frames:2 bytes:221
arp frames:146 bytes:8760
ipv6 frames:44 bytes:6555
udp frames:40 bytes:6211
dns frames:18 bytes:1711
dhcpv6 frames:14 bytes:2114
http frames:6 bytes:1014
data frames:2 bytes:1372
icmpv6 frames:4 bytes:344
===================================================================
What I would like it to look like:
rachel@Ubuntu-1:~/PCAP$ tshark -r LargeTorrent.pcap -q -z io,phs
===================================================================
Protocol Hierarchy Statistics
Filter:
Protocol Bytes
=====================================
eth 3984321
ip 3969006
icmp 1308988
udp 1350786
data 1346228
dns 1176
nbns 1300
http 1596
nbdgm 486
smb 486
mailslot 486
browser 486
tcp 1309232
data 1294800
http 3763
data-text-lines 324
xml 3205
tcp.segments 787
nbss 5863
smb 3047
pipe 686
lanman 686
smb2 2444
bittorrent 1709
tcp.segments 433
bittorrent 433
bittorrent 258
bittorrent 221
bittorrent 221
arp 8760
ipv6 6555
udp 6211
dns 1711
dhcpv6 2114
http 1014
data 1372
icmpv6 344
===================================================================
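Edit: I'm adding the original question so that the (great) answers provided make sense.
Originally, I only wanted to print the statistics for the "leaves", since eth, ip, etc. are all parents and their statistics aren't needed for my purposes. Also, rather than a god-awful block of text that uses nothing but whitespace to show the hierarchy, I wanted to drop all of the parents' statistics and show the parents as breadcrumbs in front of each child.
Example: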
eth frames:4309 bytes:3984321
ip frames:4119 bytes:3969006
icmp frames:1316 bytes:1308988
udp frames:1408 bytes:1350786
data frames:1368 bytes:1346228
dns frames:16 bytes:1176
should become
eth:ip:icmp - 1308988 bytes
eth:ip:udp:data - 1346228 bytes
eth:ip:udp:dns - 1176 bytes
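which preserves the hierarchy and avoids printing useless statistics.
Anyway, Etan's accepted answer solved this perfectly! For those at my level who aren't sure how to proceed after reading the answer, this will help you get it done:
- Save the given script as a filename.awk file
- Save the block of text to be manipulated as a filename.txt file
- Call awk -f filename.awk filename.txt
- Optionally redirect the output to a file (awk -f filename.awk filename.txt >> output.txt)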
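As a rough sketch of the steps above: the wrapper below (run_awk_report is a made-up name, not part of awk or tshark) runs a saved script over a saved report, and falls back to standard input when no report file is given, so the tshark output can be piped straight in without saving it first.

```shell
# Hypothetical wrapper around "awk -f script file".
#   $1  path to the saved awk script (e.g. filename.awk)
#   $2  optional path to the saved report (e.g. filename.txt)
run_awk_report() {
  script=$1
  input=${2:--}    # "-" tells awk to read standard input
  awk -f "$script" "$input"
}

# Possible uses (not run here):
#   run_awk_report filename.awk filename.txt
#   tshark -r LargeTorrent.pcap -q -z io,phs | run_awk_report filename.awk >> output.txt
```

Note that >> appends to output.txt on every run, while > would overwrite it.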
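I think the output you originally wanted can be achieved with this awk script. (I think this could probably be done more cleanly, but it seems to work just fine.)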
function entry() {
    # Don't want to print empty entries.
    if (ind[0]) {
        printf "%s", ind[0]
        for (i = 1; i <= ls; i++) {
            printf ":%s", ind[i]
        }
        # b holds "bytes:N"; split it into the label and the count.
        split(b, a, /:/)
        printf " - %s %s\n", a[2], a[1]
    }
}
# Found our data marker. Note that and print the current line.
$1 == "Filter:" {d=1; print; next}
# Print lines until we see our data marker.
!d {print; next}
# Print empty lines.
!NF {print; next}
# Save our trailing line for later.
/===/ {suf=$0; next}
{
    # Save our previous indentation level.
    ls = s
    # Find our new indentation level (by where the first field starts).
    s = (match($0, /[^[:space:]]/)-1) / 2
    # If the current line is at or below the last indent level print the last line.
    if (s <= ls) {
        entry()
    }
    # Save the current line's byte count.
    b=$NF
    # Save the current line's field name.
    ind[s] = $1
}
END {
    # Print a final line if we had one.
    # Note: entry() walks up to ls, the level of the line before the last one.
    entry()
    # Print the suffix line if we have one.
    if (suf) {
        print suf
    }
}
On the sample input it gives you this output.
===================================================================
Protocol Hierarchy Statistics
Filter:
eth:ip:icmp - 1308988 bytes
eth:ip:udp:data - 1346228 bytes
eth:ip:udp:dns - 1176 bytes
eth:ip:udp:nbns - 1300 bytes
eth:ip:udp:http - 1596 bytes
eth:ip:udp:nbdgm:smb:mailslot:browser - 486 bytes
eth:ip:tcp:data - 1294800 bytes
eth:ip:tcp:http:data-text-lines - 324 bytes
eth:ip:tcp:http:xml:tcp.segments - 787 bytes
eth:ip:tcp:nbss:smb:pipe:lanman - 686 bytes
eth:ip:tcp:nbss:smb2 - 2444 bytes
eth:ip:tcp:bittorrent:tcp.segments:bittorrent:bittorrent - 258 bytes
eth:ip:tcp:bittorrent:bittorrent:bittorrent - 221 bytes
eth:arp - 8760 bytes
eth:ipv6:udp:dns - 1711 bytes
eth:ipv6:udp:dhcpv6 - 2114 bytes
eth:ipv6:udp:http - 1014 bytes
eth:ipv6:udp:data - 1372 bytes
eth:ipv6:icmpv6:data - 344 bytes
===================================================================
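For comparison, here is a compact sketch of the same leaf-with-breadcrumbs idea, not the answer's script. It assumes each nesting level is indented by exactly two spaces and that every data line ends in a bytes:N field; phs_leaves is a made-up name. A line is only printed once the depth of the following line is known, so each line is emitted with its own depth, and the banner lines are simply skipped.

```shell
# Hypothetical helper: print only leaf protocols as "a:b:c - N bytes".
# Assumes two spaces of indentation per nesting level.
phs_leaves() {
  awk '
    # Print the stored line unless the next line is nested deeper,
    # in which case the stored line is a parent and is skipped.
    function flush(next_depth,    i, path) {
      if (!have || next_depth > depth) return
      path = name[0]
      for (i = 1; i <= depth; i++) path = path ":" name[i]
      printf "%s - %s bytes\n", path, bytes
    }
    # Only lines whose last field is bytes:N are data lines.
    $NF ~ /^bytes:[0-9]+$/ {
      d = (match($0, /[^ ]/) - 1) / 2   # depth from leading spaces
      flush(d)
      depth = d; name[d] = $1; have = 1
      bytes = $NF; sub(/^bytes:/, "", bytes)
    }
    # The last data line has nothing below it, so it is always a leaf.
    END { flush(0) }
  '
}

# Possible use (not run here):
#   tshark -r LargeTorrent.pcap -q -z io,phs | phs_leaves
```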
However, the output you want as shown in your edit is probably easier to handle with sed.
/Filter:/a \
Protocol Bytes \
=====================================
s/frames:[^ ]*//
s/ bytes:/bytes:/
s/bytes:\([^ ]*\)/\1/
This ends up with the output.
===================================================================
Protocol Hierarchy Statistics
Filter:
Protocol Bytes
=====================================
eth 3984321
ip 3969006
icmp 1308988
udp 1350786
data 1346228
dns 1176
nbns 1300
http 1596
nbdgm 486
smb 486
mailslot 486
browser 486
tcp 1309232
data 1294800
http 3763
data-text-lines 324
xml 3205
tcp.segments 787
nbss 5863
smb 3047
pipe 686
lanman 686
smb2 2444
bittorrent 1709
tcp.segments 433
bittorrent 433
bittorrent 258
bittorrent 221
bittorrent 221
arp 8760
ipv6 6555
udp 6211
dns 1711
dhcpv6 2114
http 1014
data 1372
icmpv6 344
===================================================================
A simple script with sed will also work.
$ printf "\n==========================================================\n"; printf "Protocol Hierarchy Statistics\nFilter:\n\n";printf "\nProtocol\t\t\t\t Bytes\n================================================\n" && sed -e 's/\(frames[:].*bytes[:]\)\(.*$\)/\2/' dat/tshark.txt | tail -n+4 | head -n-1 && printf "================================================\n"
Broken down into script form (where dat/tshark.txt is the name of the file containing the tshark output):
printf "\n==========================================================\n"
printf "Protocol Hierarchy Statistics\nFilter:\n\n"
printf "\nProtocol\t\t\t\t Bytes\n================================================\n"
sed -e 's/\(frames[:].*bytes[:]\)\(.*$\)/\2/' dat/tshark.txt | tail -n+4 | head -n-1
printf "================================================\n"
Output
==========================================================
Protocol Hierarchy Statistics
Filter:
Protocol Bytes
================================================
eth 3984321
ip 3969006
icmp 1308988
udp 1350786
data 1346228
dns 1176
nbns 1300
http 1596
nbdgm 486
smb 486
mailslot 486
browser 486
tcp 1309232
data 1294800
http 3763
data-text-lines 324
xml 3205
tcp.segments 787
nbss 5863
smb 3047
pipe 686
lanman 686
smb2 2444
bittorrent 1709
tcp.segments 433
bittorrent 433
bittorrent 258
bittorrent 221
bittorrent 221
arp 8760
ipv6 6555
udp 6211
dns 1711
dhcpv6 2114
http 1014
data 1372
icmpv6 344
================================================
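The tail and head stages above only drop the first three lines and the last line of the saved report. A sketch of doing the same trimming inside sed itself, which avoids head -n-1 (a negative count is a GNU extension); phs_body is a made-up name, and it keeps the one-liner's assumption that exactly three header lines and one trailer line surround the data.

```shell
# Hypothetical helper: strip the 3 header lines and the trailer line,
# then reduce each data line to "<protocol> <bytes>".
phs_body() {
  sed -e '1,3d' \
      -e '$d' \
      -e 's/frames:.*bytes://'
}

# Possible use (not run here):
#   phs_body < dat/tshark.txt
```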
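Formatting
Following your comment about how to align the bytes information given the variable length of the protocol tags, you can use printf to format the output as shown. Like Ethan, I started out working on your original question of merging the tags. My initial approach was to read the different levels into separate associative arrays, which could then be combined into what you originally specified. Doing so, I had to produce output lined up using printf. Here is my first attempt, using the first 4 levels of your tshark data: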
#!/bin/bash
declare -i ln=0
declare -A l1 l2 l3 l4
## read each line in the file and assign it to the associative array for its level
while read -r line; do
    # read strips the leading indentation, so deeper lines are shorter
    ln=${#line}    # base level on length of line read
    [ "$ln" -gt 66 ] && continue;
    [ "$ln" -eq 66 ] && { iface="${line%% *}"; l1[${iface}]="${line##* }"; }
    [ "$ln" -eq 64 ] && { proto="${iface}:${line%% *}"; l2[${proto}]="${line##* }"; }
    [ "$ln" -eq 62 ] && { ptype="${proto}:${line%% *}"; l3[${ptype}]="${line##* }"; }
    [ "$ln" -le 60 ] && { data="${ptype}:${line%% *}"; l4[${data}]="${line##* }"; }
done < "$1"
## output a summary of the file
printf "\n4-level deep summary of file '%s':\n\n" "$1"
for i in "${!l1[@]}"; do
    for j in "${!l2[@]}"; do
        printf " %-32s %s\n" "$j" "${l2[$j]}"
        for k in "${!l3[@]}"; do
            printf " %-32s %s\n" "$k" "${l3[$k]}"
            for l in "${!l4[@]}"; do
                [ "${l%:*}" == "$k" ] && printf " %-32s %s\n" "$l" "${l4[$l]}"
            done
        done
    done
done
It produces output such as:
eth:ip frames:4119 bytes:3969006
eth:ip:udp frames:1408 bytes:1350786
eth:ip:udp:data frames:1368 bytes:1346228
eth:ip:udp:nbdgm frames:2 bytes:486
eth:ip:udp:nbns frames:14 bytes:1300
You can look at the various printf statements in the code above to see how the alignment is handled. Let me know if you have any other questions.
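To make the alignment point concrete, here is a minimal sketch of the two-column layout from the question using awk's printf, with the indentation kept as part of the left-justified name column. The name phs_table and the widths 40 and 10 are arbitrary choices for illustration, and the sketch assumes every data line ends in a bytes:N field.

```shell
# Hypothetical helper: protocol name (with its indentation) left-justified
# in a 40-character column, byte count right-justified in 10 characters.
phs_table() {
  awk '
    $NF ~ /^bytes:[0-9]+$/ {
      # Leading spaces of the line, so the hierarchy stays visible.
      indent = substr($0, 1, match($0, /[^ ]/) - 1)
      bytes = $NF; sub(/^bytes:/, "", bytes)
      printf "%-40s %10s\n", (indent $1), bytes
    }
  '
}

# Possible use (not run here):
#   tshark -r LargeTorrent.pcap -q -z io,phs | phs_table
```

Because the width is fixed in the format string, a long tag such as data-text-lines lines up with short ones like ip as long as the indented name stays under 40 characters.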