How to group Wireshark TCP packets per flow using Python

Question

I captured TCP data in Wireshark and exported it to CSV. Now I am trying to group the TCP packets per flow using Python, but I am not sure how to do it.

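Rows belong to the same flow if their source, source port, destination and destination port match in either direction, i.e. A->B and B->A count as one flow.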

In the example below there are two flows:

Source          Src Port    Destination     Dest Port
10.129.200.119  49298       17.248.144.77   443 
10.129.200.119  49299       17.253.37.210   80

No. Time    Source  Src Port    Destination Dest Port   Protocol    Length  Flags
37  12.045906   10.129.200.119  49298   17.248.144.77   443 TCP 54  0x010
38  12.04922    17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
39  13.634783   10.129.200.119  49298   17.248.144.77   443 TLSv1.2 112 0x018
40  13.635868   10.129.200.119  49298   17.248.144.77   443 TLSv1.2 97  0x018
41  13.636239   10.129.200.119  49298   17.248.144.77   443 TCP 66  0x011
42  13.640724   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
43  13.640731   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x011
44  13.640732   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
45  13.640852   10.129.200.119  49298   17.248.144.77   443 TCP 66  0x011
47  14.472724   10.129.200.119  49299   17.253.37.210   80  TCP 78  0x0c2
48  14.478233   17.253.37.210   80  10.129.200.119  49299   TCP 74  0x052
50  14.478405   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
51  14.479316   10.129.200.119  49299   17.253.37.210   80  HTTP    361 0x018
52  14.483419   17.253.37.210   80  10.129.200.119  49299   TCP 66  0x010
53  14.483425   17.253.37.210   80  10.129.200.119  49299   TCP 1514    0x010
54  14.483427   17.253.37.210   80  10.129.200.119  49299   TCP 1514    0x010
55  14.48343    17.253.37.210   80  10.129.200.119  49299   OCSP    319 0x018
56  14.48355    10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
57  14.483551   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
58  14.486264   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x011
59  14.490827   17.253.37.210   80  10.129.200.119  49299   TCP 66  0x011
60  14.490914   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010

Answer 1

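I suggest exporting the data from Wireshark as JSON rather than CSV. The JSON export contains information that the CSV export lacks, and that information gives a better way to group TCP sessions. To create a JSON file from your pcap, go to: File -> Export Packet Dissections -> As JSON...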

After doing that, look at the field tcp.stream: every packet that belongs to the same TCP stream ("flow") has the same value in it.

You can then use this code to iterate over the packets and look for a specific tcp.stream value:

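import json

# Load the packet list exported from Wireshark as JSON.
with open('path_to_your_json.json') as json_file:
    packets = json.load(json_file)

count = 0
for packet in packets:
    layers = packet["_source"]["layers"]
    # Non-TCP packets have no "tcp" layer; tcp.stream is stored as a string.
    if "tcp" in layers and layers["tcp"]["tcp.stream"] == "11":
        count += 1
print(count)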

This code is only an example: it finds all TCP packets that belong to stream number 11 and counts them.
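To group every packet rather than count a single stream, the same loop can collect packets into a dictionary keyed by tcp.stream. The following is a minimal sketch, not part of the original answer. The helper name `group_by_stream` and the `sample` list are hypothetical; `sample` stands in for the result of `json.load` and assumes the `_source` / `layers` / `tcp` nesting used in the code above.

```python
from collections import defaultdict


def group_by_stream(packets):
    """Map each tcp.stream value to the list of packets that carry it."""
    streams = defaultdict(list)
    for packet in packets:
        layers = packet.get("_source", {}).get("layers", {})
        tcp = layers.get("tcp")
        # Skip packets without a TCP layer (for example UDP or ARP).
        if tcp is not None and "tcp.stream" in tcp:
            streams[tcp["tcp.stream"]].append(packet)
    return dict(streams)


# Hypothetical stand-in for json.load(json_file).
sample = [
    {"_source": {"layers": {"tcp": {"tcp.stream": "11"}}}},
    {"_source": {"layers": {"tcp": {"tcp.stream": "12"}}}},
    {"_source": {"layers": {"tcp": {"tcp.stream": "11"}}}},
    {"_source": {"layers": {"udp": {}}}},
]

for stream_id, members in group_by_stream(sample).items():
    print(stream_id, len(members))
```

With a real export, pass the list returned by `json.load` instead of `sample`.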

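To work efficiently and understand what you are doing, I recommend opening the JSON file in a text editor (such as Sublime Text) to see what it contains and how it is nested. I also recommend reading about JSON in Python: W3Schools, Python JSON.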
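If the file is too large to read comfortably in an editor, the key hierarchy can also be printed from Python. This is a small sketch under assumptions: `describe` is a hypothetical helper, and `sample_packet` is a hand-written stand-in for one entry of the export, not real capture output.

```python
def describe(node, depth=0, max_depth=3):
    """Return indented lines listing the keys of nested dicts, down to max_depth."""
    lines = []
    if isinstance(node, dict) and depth < max_depth:
        for key, value in node.items():
            lines.append("  " * depth + str(key))
            lines.extend(describe(value, depth + 1, max_depth))
    return lines


# Hypothetical, heavily trimmed packet entry.
sample_packet = {
    "_source": {
        "layers": {
            "ip": {"ip.src": "10.129.200.119"},
            "tcp": {"tcp.stream": "11"},
        }
    }
}

print("\n".join(describe(sample_packet, max_depth=4)))
```

Calling it on the first element of the loaded list gives a quick outline of which layers and fields are available.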

Answer 2

You can use pandas to do this, provided you rename the column Src Port to Src_Port and Dest Port to Dest_Port.

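Assuming that the combination ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol'] defines a 'flow' (I am by no means a domain expert), and that your data is in 'wireshark_dump.csv', you could proceed as follows: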

import pandas as pd

# The data is whitespace-separated; sep=r'\s+' replaces the deprecated delim_whitespace=True.
df = pd.read_csv('wireshark_dump.csv', sep=r'\s+')

flow_columns = ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol']
for flow, flow_data in df.groupby(flow_columns):
    print(flow)
    print(flow_data)

Note that, depending on how you process the data further, you may not want to iterate over the groupby groups, because that is slow.
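Grouping on the directed columns keeps A->B and B->A in separate groups, and including Protocol splits a connection further, because Wireshark's Protocol column can differ between packets of the same connection (TCP and TLSv1.2 in the question's data). One possible way to match the question's definition is to build a direction-independent key column and aggregate on it instead of iterating. The sketch below is an illustration, not the answer's original method. The `flow`, `packets` and `total_bytes` column names are made up for this example, and the inline sample repeats a few rows from the question with the renamed port columns.

```python
import io

import pandas as pd

# A few rows from the question, with the port columns already renamed.
sample = io.StringIO(
    "No. Time Source Src_Port Destination Dest_Port Protocol Length Flags\n"
    "37 12.045906 10.129.200.119 49298 17.248.144.77 443 TCP 54 0x010\n"
    "38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010\n"
    "39 13.634783 10.129.200.119 49298 17.248.144.77 443 TLSv1.2 112 0x018\n"
    "47 14.472724 10.129.200.119 49299 17.253.37.210 80 TCP 78 0x0c2\n"
    "48 14.478233 17.253.37.210 80 10.129.200.119 49299 TCP 74 0x052\n"
)
df = pd.read_csv(sample, sep=r"\s+")

# Build "ip:port" endpoint strings, then order each pair so that
# A->B and B->A produce the same key.
src = df["Source"] + ":" + df["Src_Port"].astype(str)
dst = df["Destination"] + ":" + df["Dest_Port"].astype(str)
swap = src > dst
df["flow"] = src.where(~swap, dst) + " <-> " + dst.where(~swap, src)

# Aggregate per flow without a Python-level loop over the groups.
summary = df.groupby("flow").agg(
    packets=("No.", "count"),
    total_bytes=("Length", "sum"),
)
print(summary)
```

The endpoints are ordered by plain string comparison, which is arbitrary but consistent; all that matters is that both directions yield the same key.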

Answer 3

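Maybe you can try pandas. The snippet below groups the rows by source IP address (column C).

I did not understand what you mean by a flow; I assumed it is defined by the source and destination IP pair. Note that the snippet groups on the source address only, so the two directions of a conversation end up in separate groups.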

import pandas as pd

# Read the whitespace-separated rows (no header line) into lists of tokens.
with open('data.txt') as f:
    data = [line.split() for line in f]

df = pd.DataFrame(data, columns=list("ABCDEFGHI"))
print(df)

# Column C holds the source IP address.
grouped_df = df.groupby('C', as_index=False)
for key, item in grouped_df:
    print(item, "\n\n")

This gives output like the following:

[8 rows x 9 columns]
    A          B               C      D  ...    F        G    H      I
0  37  12.045906  10.129.200.119  49298  ...  443      TCP   54  0x010
2  39  13.634783  10.129.200.119  49298  ...  443  TLSv1.2  112  0x018
3  40  13.635868  10.129.200.119  49298  ...  443  TLSv1.2   97  0x018
4  41  13.636239  10.129.200.119  49298  ...  443      TCP   66  0x011

[4 rows x 9 columns] 


    A          B              C    D               E      F    G   H      I
1  38   12.04922  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010
5  42  13.640724  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010
6  43  13.640731  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x011
7  44  13.640732  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010
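The output above shows one conversation split into two groups, one per direction. If both directions should count as a single flow, as the question defines it, one option that needs only the standard library is to key a dictionary on the unordered pair of endpoints. This is a sketch under the assumption that each line has the nine whitespace-separated columns shown in the question; the `lines` list is a hypothetical stand-in for the contents of data.txt.

```python
from collections import defaultdict

# Hypothetical stand-in for the lines of data.txt (rows taken from the question).
lines = [
    "37 12.045906 10.129.200.119 49298 17.248.144.77 443 TCP 54 0x010",
    "38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010",
    "47 14.472724 10.129.200.119 49299 17.253.37.210 80 TCP 78 0x0c2",
    "48 14.478233 17.253.37.210 80 10.129.200.119 49299 TCP 74 0x052",
]

flows = defaultdict(list)
for line in lines:
    tokens = line.split()
    # Tokens 2-5 are source, source port, destination, destination port.
    src = (tokens[2], int(tokens[3]))
    dst = (tokens[4], int(tokens[5]))
    # A frozenset ignores order, so both directions share one key.
    flows[frozenset((src, dst))].append(tokens[0])

for endpoints, packet_numbers in flows.items():
    print(sorted(endpoints), packet_numbers)
```

Each value is the list of packet numbers (the first column) that belong to that flow.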
