CSV output file using command line for wireshark IO graph statistics

Question

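I use the Wireshark GUI to save the IO graph statistics as a CSV file containing bits per second. Is there a way to generate this CSV file with command-line tshark? I can generate the statistics on the command line as bytes per second, as follows: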

tshark -nr test.pcap -q -z io,stat,1,BYTES

How can I generate bits/second and save it to a CSV file?

Any help is appreciated.

Answer 1

I don't know of a way to do this using only tshark, but you can easily parse tshark's output into a CSV file:

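tshark -nr tmp.pcap -q -z io,stat,1,BYTES | grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" | awk -F '[ |]+' '{print $2","($5*8)}'

Explanation

grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" selects only the rows with actual data from the tshark output (i.e. second <> second | transmitted bytes).
awk -F '[ |]+' '{print $2","($5*8)}' splits each of those rows into 5 blocks, using [ |]+ as the separator, and prints block 2 (the second at which the interval starts) and block 5 (the transmitted bytes, multiplied by 8 to get bits), with a comma between them.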
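As a minimal sketch of the awk step that can be tried without a capture file: the rows below are invented sample data in the shape tshark prints (not real measurements), and filtering on `$3 == "<>"` inside awk is my own substitution for the grep step, so treat both as assumptions.

```shell
# Hypothetical sample in the shape of `tshark -q -z io,stat,1,BYTES` output.
# The numbers are invented for illustration only.
sample='| Interval | BYTES |
|   0 <>   1 |  1250 |
|   1 <>   2 |   500 |
|   2 <> Dur |    75 |'

# Split on runs of spaces and pipes. Field 2 is the interval start,
# field 3 is the literal "<>", field 5 is the byte count.
# Only rows whose third field is "<>" are data rows.
bits_csv=$(printf '%s\n' "$sample" \
  | awk -F '[ |]+' '$3 == "<>" && $5 ~ /^[0-9]+$/ { print $2 "," ($5 * 8) }')

printf '%s\n' "$bits_csv"
```

Because the filter only looks at the "<>" field, a final row whose end is printed as "Dur" is kept as well.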

Answer 2

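Another thing that may be good to know:

If you change the interval from 1 second to 0.5 seconds, you have to allow a . in the grep part by adding \. between the two digit groups \d.

Otherwise the result will be an empty *.csv file.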

grep -P "\d{1,2}\.{1}\d{1,2}\s+<>\s+\d{1,2}\.{1}\d{1,2}\s*\|\s+\d+"
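If you want one pattern that accepts both whole-second and fractional intervals, making the fractional part optional is one way to do it. This is only a sketch: the sample rows are invented, and I have written it with grep -E and POSIX character classes instead of -P, which is a variation on the pattern above rather than the same command.

```shell
# Hypothetical sample rows: two fractional intervals, one integer interval,
# and a header line that must not match.
sample='|  0.0 <>  0.5 |  100 |
|  0.5 <>  1.0 |  200 |
|   1 <>   2 |  300 |
| Interval | BYTES |'

# Number with an optional fractional part, written as an ERE.
num='[0-9]+(\.[0-9]+)?'
pattern="${num}[[:space:]]+<>[[:space:]]+${num}[[:space:]]*\\|[[:space:]]+[0-9]+"

matched=$(printf '%s\n' "$sample" | grep -E "$pattern")
count=$(printf '%s\n' "$sample" | grep -c -E "$pattern")

printf '%s\n' "$matched"
```

Like the original pattern, this requires a number after "<>", so a last row that ends in "Dur" would not be selected.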

Answer 3

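The answers in this thread gave me the key to solving a similar problem with tshark io stats, and I wanted to share the result and how it works. In my case, the task was to convert multi-column tshark io stat records whose data may contain decimal points. This answer converts multiple data columns to csv, adds basic headers, and accounts for decimals in the fields and a variable number of spaces.

The complete command string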

tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10" \
| grep -P "\d+\.?\d*\s+<>\s+|Interval +\|" \
| tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g' > somefile.csv

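Explanation

The command string has 3 main parts.

tshark creates a report with data in columns
Extract the desired lines with grep
Use tr and sed to convert the lines matched by grep into a csv-delimited file.

Part 1: tshark creates a report with data in columns

tshark is run with -z io,stat at a 30-second interval, counting frames and bytes with various filters.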

tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10"

Here is the output when run against my test pcap file:

=================================================================================================
| IO Statistics                                                                                 |
|                                                                                               |
| Duration: 179.179180 secs                                                                     |
| Interval:  30 secs                                                                            |
|                                                                                               |
| Col 1: Frames and bytes                                                                       |
|     2: FRAMES                                                                                 |
|     3: BYTES                                                                                  |
|     4: FRAMES()ip.src == 10.10.10.10                                                          |
|     5: BYTES()ip.src == 10.10.10.10                                                           |
|     6: FRAMES()ip.dst == 10.10.10.10                                                          |
|     7: BYTES()ip.dst == 10.10.10.10                                                           |
|-----------------------------------------------------------------------------------------------|
|            |1                   |2       |3          |4       |5         |6       |7          |
| Interval   | Frames |   Bytes   | FRAMES |   BYTES   | FRAMES |   BYTES  | FRAMES |   BYTES   |
|-----------------------------------------------------------------------------------------------|
|   0 <>  30 | 107813 | 120111352 | 107813 | 120111352 |  26682 | 15294257 |  80994 | 104808983 |
|  30 <>  60 | 122437 | 124508575 | 122437 | 124508575 |  49331 | 17080888 |  73017 | 107422509 |
|  60 <>  90 | 138999 | 135488315 | 138999 | 135488315 |  54829 | 22130920 |  84029 | 113348686 |
|  90 <> 120 | 158241 | 217781653 | 158241 | 217781653 |  42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 |  43709 | 18800647 |  67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 |  50754 | 22053280 |  72786 | 120574520 |
=================================================================================================

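Considerations

Looking at this output, we can see several items to consider:

Rows with data have a unique "space<>space" sequence in the Interval column that we can match on.
We want the header row, so we'll use the word "Interval", followed by spaces and then a "|" character.
The number of spaces in a column is variable, depending on the number of digits in each measurement.
The Interval column gives two times per row, the start and the stop of the interval, both counted from 0 at the first measurement. Either can be used, so we'll keep both and let the user decide.
When milliseconds are used, the Interval field will contain decimals.
Depending on the statistics requested, the data columns may contain decimals.
The "|" characters used as separators need to be escaped in any regex statement that covers them.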

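Part 2: Using grep

Extract the desired lines

Once tshark has produced its output, we use grep with a regular expression to extract the lines we want to keep.

grep -P "\d+\.?\d*\s+<>\s+|Interval +\|"

grep uses the "Digit(s)Space(s)<>Space(s)" character sequence in the Interval column to match the lines with data. It also uses an OR to pick up the header by matching the characters "Interval |".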

grep -P         # The "-P" flag turns on PCRE regex matching, which is not the same as egrep. With egrep, you will need to change the escaping.
 "\d+            # Match on 1 or more Digits. This is the 1st set of numbers in the Interval column.
 \.?             # 0 or 1 Periods. We need this to handle possible fractional seconds.
 \d*             # 0 or more Digits. To handle possible fractional seconds.
 \s+<>\s+        # 1 or more Spaces followed by the Characters "<>", then 1 or more Spaces.
 |               # Since this is not escaped, it is a regex OR
 Interval\s+\|"  # Match the String "Interval" followed by 1 or more Spaces and a literal "|".

From the tshark output, grep matches these lines:

| Interval   | Frames |   Bytes   | FRAMES |   BYTES   | FRAMES |   BYTES  | FRAMES |   BYTES   |
|   0 <>  30 | 107813 | 120111352 | 107813 | 120111352 |  26682 | 15294257 |  80994 | 104808983 |
|  30 <>  60 | 122437 | 124508575 | 122437 | 124508575 |  49331 | 17080888 |  73017 | 107422509 |
|  60 <>  90 | 138999 | 135488315 | 138999 | 135488315 |  54829 | 22130920 |  84029 | 113348686 |
|  90 <> 120 | 158241 | 217781653 | 158241 | 217781653 |  42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 |  43709 | 18800647 |  67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 |  50754 | 22053280 |  72786 | 120574520 |
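For anyone whose grep lacks -P, the same selection can probably be written as an extended regex. The sketch below is an assumption on my part (the \d and \s shorthands are replaced by POSIX classes), checked against a shortened copy of the report above with fewer columns.

```shell
# Shortened copy of the tshark report: summary lines, rulers, the header
# row and two data rows. Only the header and data rows should survive.
report='| Duration: 179.179180 secs         |
| Interval:  30 secs                |
|-----------------------------------|
| Interval   | Frames |   Bytes   |
|-----------------------------------|
|   0 <>  30 | 107813 | 120111352 |
| 150 <> Dur | 123736 | 142639416 |
==================================='

# ERE version of the PCRE pattern: digits, optional period, optional
# digits, spaces, "<>", spaces; OR "Interval", spaces, a literal "|".
ere='[0-9]+\.?[0-9]*[[:space:]]+<>[[:space:]]+|Interval +\|'

kept=$(printf '%s\n' "$report" | grep -E "$ere")
kept_count=$(printf '%s\n' "$report" | grep -c -E "$ere")

printf '%s\n' "$kept"
```

The "Interval:  30 secs" summary line is dropped because "Interval" is followed by a colon there, not by spaces and a "|".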

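Part 3: Use tr and sed to convert the lines matched by grep into a csv-delimited file.

tr and sed are used to convert the lines matched by grep into csv. tr does the bulk of the work, deleting spaces and changing "|" to ",". This is simpler and faster than using sed. sed, however, is used for some cleanup work.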

tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g'

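Here is how these commands perform the conversion. The first trick is to get rid of all the spaces. This means we don't have to account for them in any regex sequence, which makes the rest of the work simpler.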

| tr -d " "                 # Spaces are in the way, so delete them.
| tr "|" ","                # Change all "|" Characters to ",".
| sed -E 's/<>/,/;          # Change "<>" to "," splitting the Interval column.
  s/(^,|,$)//g;               # Delete leading and/or trailing "," on each line.
  s/Interval/Start,Stop/g'    # Each of the "Interval" columns needs a header, so change the text "Interval" into two words with a , separating them.
> somefile.csv              # Redirect the output into somefile.csv
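The annotated layout above is for reading only, since the comments sit inside the sed script. A runnable sketch of the same tr and sed chain, fed with three of the matched lines (shortened to two data columns, which is my simplification), looks like this:

```shell
# Three lines as grep would pass them on: the header and two data rows.
matched='| Interval   | Frames |   Bytes   |
|   0 <>  30 | 107813 | 120111352 |
| 150 <> Dur | 123736 | 142639416 |'

# Delete spaces, turn "|" into ",", split the Interval column at "<>",
# trim the leading and trailing commas, and rename the header.
csv=$(printf '%s\n' "$matched" \
  | tr -d ' ' \
  | tr '|' ',' \
  | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g')

printf '%s\n' "$csv"
```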

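Final result

Once this process is complete, we have a csv output that can be imported into your favorite csv tool or spreadsheet, or fed to a plotting program such as gnuplot.

$ cat somefile.csv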
Start,Stop,Frames,Bytes,FRAMES,BYTES,FRAMES,BYTES,FRAMES,BYTES
0,30,107813,120111352,107813,120111352,26682,15294257,80994,104808983
30,60,122437,124508575,122437,124508575,49331,17080888,73017,107422509
60,90,138999,135488315,138999,135488315,54829,22130920,84029,113348686
90,120,158241,217781653,158241,217781653,42103,15870237,115971,201901201
120,150,111708,131890800,111708,131890800,43709,18800647,67871,113082296
150,Dur,123736,142639416,123736,142639416,50754,22053280,72786,120574520
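To get back to the bits/second asked for in the question, a csv in this Start,Stop,... layout can be post-processed with awk. This is a rough sketch under my own assumptions: the input rows are invented round numbers, the byte count is taken from column 4, and the final "Dur" row is skipped because its length is not in the csv.

```shell
# Hypothetical csv in the Start,Stop,Frames,Bytes layout (invented values).
csv='Start,Stop,Frames,Bytes
0,30,10,3000
30,60,20,6000
60,Dur,5,1500'

# bits per second = bytes * 8 / (stop - start).
# Skip the header and any row whose Stop is not a number (e.g. "Dur").
bps=$(printf '%s\n' "$csv" \
  | awk -F ',' 'NR > 1 && $2 ~ /^[0-9.]+$/ {
      printf "%s,%d\n", $1, ($4 * 8) / ($2 - $1)
    }')

printf '%s\n' "$bps"
```

If the last interval matters, its length would have to come from the "Duration" line of the tshark report, which the grep step above discards.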
