Deconstructing BPF filter in TCPdump

Question

I'm trying to deconstruct this TCPdump BPF-style filter and need some help:

'tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420'

Taken from here

Steps I took to better understand what is going on:

1. Let's convert 0x47455420 to ASCII
    ===> "GET " (G, E, T and a trailing space, 0x20)
    ===> tcp[((tcp[12:1] & 0xf0) >> 2):4] = "GET "
2. Examine the inner tcp filter: (tcp[12:1] & 0xf0)
    ===> 0xf0 == 0000 0000 1111 0000 ===> I suppose it is safe to discard the upper zeros, so I can write 1111 0000
    ===> tcp[12:1] == 08 (start at byte 13, i.e. the byte with index 12 in 0-based indexing, and take 1 byte, so only the 13th byte)
    ===> 08 == 0000 1000
    ===> 0000 1000 & 1111 0000 == 0000 0000 (bitwise AND: a result bit is 1 only if both input bits are 1)
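Steps 1 and 2 can be double-checked with a few lines of Python. This is a sketch added for illustration; it simply repeats the question's arithmetic, including the byte value 0x08 that the question read as tcp[12:1]:

```python
# Step 1: turn the 32-bit constant from the filter into its 4 ASCII bytes.
constant = 0x47455420
text = constant.to_bytes(4, "big").decode("ascii")
print(repr(text))  # 'GET ' (note the trailing space, 0x20)

# Step 2 as done in the question: mask the byte 0x08 with 0xf0.
masked = 0x08 & 0xF0
print(f"{masked:08b}")  # 00000000
```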

This is where I get confused. The explanation at the link I provided above says

multiply it by four ( (tcp[12:1] & 0xf0)>>2 ) which should give the tcp header length

which is impossible if the result is zero. Please:

help me find the mistake in my calculation (maybe I'm mixing up the TCP and IP headers?);
tell me whether my logic is correct.

Here is the packet:

19:10:30.091065 IP (tos 0x0, ttl 63, id 40127, offset 0, flags [DF], proto TCP (6), length 2786)
10.240.35.81.47856 > 172.17.13.201.8080: Flags [P.], cksum 0xf2ef (incorrect -> 0xb8f8), seq 2263020471:2263023205, ack 4187927811, win 28, options [nop,nop,TS val 1906863883 ecr 214445688], length 2734
0x0000:  1a17 8e8a a3a0 026d 627d 049c 0800 4500  .......mb}....E.
         0,1  2,3  ...  ...  ...  ...  12,13 ...                    <=== byte indexes
         1,2  3,4  ...  ...  ...  ...  13,14 ...                    <=== counting how many bytes
0x0010:  0ae2 9cbf 4000 3f06 ac3b 0af0 2351 ac11  ....@.?..;..#Q..  <=== the 0x0010 offset correctly identifies that the first two digits are the byte with index 16
         16,17 ... ...
0x0020:  0dc9 baf0 1f90 86e2 f3b7 f99e b503 8018  ................
0x0030:  001c f2ef 0000 0101 080a 71a8 6f0b 0cc8  ..........q.o...
0x0040:  2e78 4745 5420 2f69 636f 6e73 2f75 6e6b  .xGET./icons/unk
0x0050:  6e6f 776e 2e67 6966 2048 5454 502f 312e  nown.gif.HTTP/1.
0x0060:  310d 0a68 6f73 743a 2070 6870 2d6d 696e  1..host:.php-min

Answer 1

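tcp[12:1] is the byte at an offset of 12 bytes from the beginning of the TCP header; 12 is not an offset from the beginning of the packet, it's an offset from the beginning of the TCP header (it's tcp[12:1], not ether[12:1] or something like that). The "1" is the number of bytes. The 0x08 you looked at is the byte at offset 12 from the beginning of the packet, i.e. ether[12:1], which is the first byte of the Ethernet type field.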

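According to RFC 793, which is the specification for TCP, the byte at an offset of 12 bytes from the beginning of the TCP header has the data offset in its upper 4 bits and reserved bits in its lower 4 bits. The data offset is "The number of 32 bit words in the TCP Header", which "indicates where the data begins".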
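In code, reading the data offset out of that byte looks like this. It is a minimal sketch; the function name `tcp_header_length` is made up for illustration:

```python
def tcp_header_length(offset_byte: int) -> int:
    """Return the TCP header length in bytes, given byte 12 of the TCP header.

    Per RFC 793, the upper 4 bits are the data offset (a count of 32-bit
    words) and the lower 4 bits are reserved, so they are masked off.
    """
    data_offset_words = (offset_byte & 0xF0) >> 4
    return data_offset_words * 4


print(tcp_header_length(0x50))  # 20: data offset 5, a header with no options
print(tcp_header_length(0x80))  # 32: data offset 8, as in this question's packet
```

The reserved bits never affect the result, so 0x8F would also give 32.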

The data in the packet is shown as a sequence of byte pairs. It's easier to understand if it's presented as a sequence of single bytes, so:

0x0000:  1a 17 8e 8a a3 a0 02 6d 62 7d 04 9c 08 00 45 00
         eth dest          eth src           etype IP hdr

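The first 6 bytes of the packet are the Ethernet destination address.

The next 6 bytes of the packet are the Ethernet source address.

The 2 bytes after that are the Ethernet type value; it's big-endian, so its value is 0x0800, which is the Ethernet type value for IPv4.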
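The same split of the first 14 bytes can be done with Python's `struct` module. This sketch assumes a plain Ethernet II header with no VLAN tag; the `!` format prefix selects network (big-endian) byte order:

```python
import struct

# First 14 bytes of the capture, copied from the 0x0000 line of the dump.
frame_start = bytes.fromhex("1a17 8e8a a3a0 026d 627d 049c 0800")

# 6-byte destination, 6-byte source, 2-byte big-endian type field.
dst, src, ethertype = struct.unpack("!6s6sH", frame_start)


def mac(addr: bytes) -> str:
    """Format a 6-byte MAC address as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in addr)


print(mac(dst))              # 1a:17:8e:8a:a3:a0
print(mac(src))              # 02:6d:62:7d:04:9c
print(f"0x{ethertype:04x}")  # 0x0800 (IPv4)
```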

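The next 2 bytes are the first 2 bytes of the IPv4 header. According to RFC 791, which is the specification for IPv4, the first byte of the IPv4 header contains the IP version in its upper 4 bits and the header length in its lower 4 bits. The value of that byte is 0x45, so the IP version is 4 (as it should be for IPv4) and the header length is 5. The header length "is the length of the internet header in 32 bit words", so that's 5 32-bit words, or 20 bytes.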
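Splitting that first IPv4 header byte into its two nibbles is just a shift and a mask. A short illustration using the 0x45 from this packet:

```python
version_ihl = 0x45  # first byte of the IPv4 header in this packet

version = version_ihl >> 4         # upper 4 bits: IP version
ihl_words = version_ihl & 0x0F     # lower 4 bits: header length in 32-bit words
ihl_bytes = ihl_words * 4          # convert words to bytes

print(version, ihl_words, ihl_bytes)  # 4 5 20
```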

So, now, let's skip over the IPv4 header and go to the TCP header:

0x0020:  0d c9 ba f0 1f 90 86 e2 f3 b7 f9 9e b5 03 80 18
               TCP header                          12 13

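So byte 12 of the TCP header is 0x80. 0x80 & 0xf0 is 0x80, and 0x80 >> 2 is 0x20, or 32; that's consistent with the upper 4 bits of that byte being the data offset in 32-bit words, as 8 * 4 = 32. Shifting right by 2, rather than by 4, is what does the "multiply it by four": the mask clears the lower 4 bits, so (x & 0xf0) >> 2 is the same as ((x & 0xf0) >> 4) * 4, i.e. the data offset in words times 4, which is the TCP header length in bytes.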
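A quick exhaustive check that the shift-by-2 shortcut matches shifting by 4 and then multiplying by 4, for every possible byte value (an illustration added here, not part of the original answer):

```python
def header_len_two_steps(b: int) -> int:
    """Data offset in 32-bit words, converted to bytes."""
    return ((b & 0xF0) >> 4) * 4


def header_len_filter_style(b: int) -> int:
    """The shortcut used in the tcpdump filter."""
    return (b & 0xF0) >> 2


# Byte 12 of the TCP header in this packet.
print(header_len_two_steps(0x80), header_len_filter_style(0x80))  # 32 32

# The shortcut agrees for all 256 possible byte values.
print(all(header_len_two_steps(b) == header_len_filter_style(b) for b in range(256)))  # True
```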

So, for this packet, tcp[((tcp[12:1] & 0xf0) >> 2):4] is tcp[32:4], i.e. the 4 bytes at an offset of 32 from the beginning of the TCP header.
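The whole filter can be evaluated by hand against the captured bytes. The sketch below is a simplified model of what `tcp[offset:size]` means for this capture, not libpcap's actual implementation: it assumes an Ethernet II link layer and an IPv4 header, and it skips the checks the real filter compiler adds (for example, that the packet really is TCP and is not a non-first fragment). The helper names `tcp_start` and `tcp_load` are hypothetical.

```python
# The 112 captured bytes shown in the question's hex dump.
PACKET = bytes.fromhex(
    "1a17 8e8a a3a0 026d 627d 049c 0800 4500"
    "0ae2 9cbf 4000 3f06 ac3b 0af0 2351 ac11"
    "0dc9 baf0 1f90 86e2 f3b7 f99e b503 8018"
    "001c f2ef 0000 0101 080a 71a8 6f0b 0cc8"
    "2e78 4745 5420 2f69 636f 6e73 2f75 6e6b"
    "6e6f 776e 2e67 6966 2048 5454 502f 312e"
    "310d 0a68 6f73 743a 2070 6870 2d6d 696e"
)

ETH_HDR_LEN = 14  # assumption: Ethernet II header, no VLAN tag


def tcp_start(pkt: bytes) -> int:
    """Offset of the TCP header: Ethernet header + IPv4 header (IHL * 4)."""
    ihl_words = pkt[ETH_HDR_LEN] & 0x0F
    return ETH_HDR_LEN + ihl_words * 4


def tcp_load(pkt: bytes, offset: int, size: int) -> int:
    """Big-endian value of `size` bytes at `offset` into the TCP header."""
    start = tcp_start(pkt) + offset
    return int.from_bytes(pkt[start:start + size], "big")


print(hex(PACKET[12]))               # 0x8: ether[12:1], the byte the question used
print(hex(tcp_load(PACKET, 12, 1)))  # 0x80: tcp[12:1]
payload_offset = (tcp_load(PACKET, 12, 1) & 0xF0) >> 2
print(payload_offset)                # 32
print(tcp_load(PACKET, payload_offset, 4) == 0x47455420)  # True
```

The first two lines of output show exactly where the question went wrong: the same index 12 lands on different bytes depending on whether it is counted from the start of the frame or from the start of the TCP header.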

The bytes starting 32 bytes from the beginning of the TCP header are:

0x0040:  2e78 4745 5420 2f69 636f 6e73 2f75 6e6b
              ^

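That's the "GET " at the beginning of the HTTP request, which starts at the beginning of the TCP segment's data. The 4 bytes in question are "GET " (G, E, T and a space).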

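So, again, the 12 in tcp[12:1] is not an offset from the beginning of the packet, it's an offset from the beginning of the TCP header (it's tcp[12:1], not ether[12:1] or something like that).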

And, to answer your questions about the bytes of the packet and their contents:

0x0000:  1a 17 8e 8a a3 a0: Ethernet destination
         02 6d 62 7d 04 9c: Ethernet source
         08 00: Ethernet type/length field - 0x0800 = IPv4

So the first 14 (0x000e) bytes of the packet are the Ethernet header.

In this packet, the Ethernet type/length field is 0x0800, so the Ethernet payload, which immediately follows the Ethernet header, is an IPv4 packet, beginning with an IPv4 header:

         45: IPv4 version/header length
         00: IPv4 Type of Service/Differentiated Service
0x0010:  0a e2: IPv4 total length
         9c bf: IPv4 identification
         40 00: IPv4 flags/fragment offset
         3f: IPv4 time-to-live
         06: IPv4 (next) protocol - 6 = TCP
         ac 3b: IPv4 header checksum
         0a f0 23 51: IPv4 source address
         ac 11: first 2 bytes of IPv4 destination address
0x0020:  0d c9: second 2 bytes of IPv4 destination address

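The IPv4 header length is 5, so the IPv4 header is 20 bytes long. That's the minimum IPv4 header length; it can't be shorter, but it can be longer if there are IPv4 options after the fixed-length part of the header. In this case, there aren't any.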
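The 20-byte IPv4 header listed above can be decoded in one `struct.unpack` call. This sketch assumes there are no IPv4 options, which holds for this packet because the header length is 5; the printed values match the ones tcpdump shows in its summary line:

```python
import socket
import struct

# Bytes 14..33 of the capture: the IPv4 header.
ip_hdr = bytes.fromhex("4500 0ae2 9cbf 4000 3f06 ac3b 0af0 2351 ac11 0dc9")

(version_ihl, tos, total_length, ident, flags_frag,
 ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", ip_hdr)

print(version_ihl >> 4, (version_ihl & 0x0F) * 4)      # 4 20
print(total_length, ident, ttl, protocol)              # 2786 40127 63 6
print(bool(flags_frag & 0x4000), flags_frag & 0x1FFF)  # True 0 (DF set, fragment offset 0)
print(socket.inet_ntoa(src), socket.inet_ntoa(dst))    # 10.240.35.81 172.17.13.201
```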

As the protocol field has the value 6, the IPv4 payload is a TCP packet:

         ba f0: TCP source port (47856)
         1f 90: TCP destination port (8080)
         86 e2 f3 b7: TCP sequence number
         f9 9e b5 03: TCP acknowledgment number
         80: TCP data offset + reserved bits
         18: reserved bits + TCP flags
0x0030:  00 1c: TCP window
         f2 ef: TCP checksum
         00 00: TCP urgent pointer
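The same approach works for the 20-byte fixed part of the TCP header. In this sketch the flag names are RFC 793's six control bits, and the byte string is copied from the dump; the results agree with tcpdump's "Flags [P.]", port, sequence, acknowledgment and window values:

```python
import struct

# Bytes 34..53 of the capture: the fixed-length part of the TCP header.
tcp_fixed = bytes.fromhex("baf0 1f90 86e2 f3b7 f99e b503 8018 001c f2ef 0000")

(src_port, dst_port, seq, ack, offset_flags,
 window, checksum, urgent) = struct.unpack("!HHIIHHHH", tcp_fixed)

data_offset_words = offset_flags >> 12  # top 4 bits of the 16-bit field
FLAG_NAMES = [("FIN", 0x01), ("SYN", 0x02), ("RST", 0x04),
              ("PSH", 0x08), ("ACK", 0x10), ("URG", 0x20)]
flags = [name for name, bit in FLAG_NAMES if offset_flags & bit]

print(src_port, dst_port)                    # 47856 8080
print(seq, ack)                              # 2263020471 4187927811
print(data_offset_words * 4, flags, window)  # 32 ['PSH', 'ACK'] 28
```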

This is the 20-byte fixed-length part of the TCP header; however, the TCP header length is 32 bytes, so there are an additional 12 bytes of TCP options in the header:

         01: TCP No-Operation option
         01: TCP No-Operation option
         08 0a 71 a8 6f 0b 0c c8: first 8 bytes of TCP Timestamp option
0x0040:  2e 78: last 2 bytes of TCP Timestamp option

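The length of a TCP header must be a multiple of 32 bits, i.e. a multiple of 4 bytes; TCP options don't necessarily have a length that's a multiple of 4 (the TCP Timestamp option is 10 bytes long), so No-Operation options are used as padding.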
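A small option walker shows how the two NOP padding bytes and the 10-byte Timestamp option fill those 12 bytes. This sketch only special-cases End of Option List (kind 0) and No-Operation (kind 1) and returns every other option as a raw (kind, value) pair; `parse_tcp_options` is a hypothetical helper. The decoded timestamps match tcpdump's "TS val 1906863883 ecr 214445688":

```python
import struct

# Bytes 20..31 of the TCP header: the options area.
options = bytes.fromhex("0101 080a 71a8 6f0b 0cc8 2e78")


def parse_tcp_options(data: bytes) -> list:
    """Split a TCP options area into (kind, value) tuples."""
    result = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:          # End of Option List: stop parsing
            break
        if kind == 1:          # No-Operation: a single padding byte
            result.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(data) or data[i + 1] < 2:
            raise ValueError(f"malformed option at offset {i}")
        length = data[i + 1]   # length counts the kind and length bytes too
        result.append((kind, data[i + 2:i + length]))
        i += length
    return result


opts = parse_tcp_options(options)
print([kind for kind, _ in opts])                  # [1, 1, 8]
ts_val, ts_ecr = struct.unpack("!II", opts[2][1])  # kind 8 = Timestamps
print(ts_val, ts_ecr)                              # 1906863883 214445688
```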

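So those 32 bytes are the TCP header; what follows is the TCP payload. Apparently this is on an HTTP connection (the packet is being sent to port 8080, which is an alternate HTTP port), and this is the beginning of an HTTP GET request: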

         47 45 54 20 2f 69 63 6f 6e 73 2f 75 6e 6b
0x0050:  6e 6f 77 6e 2e 67 69 66 20 48 54 54 50 2f 31 2e
0x0060:  31 0d 0a 68 6f 73 74 3a 20 70 68 70 2d 6d 69 6e
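Decoding the visible part of the TCP payload as ASCII confirms the HTTP request line. The capture display is truncated, so only the first 46 payload bytes are available here:

```python
# TCP payload bytes visible in the dump (starting at packet offset 0x42).
payload = bytes.fromhex(
    "4745 5420 2f69 636f 6e73 2f75 6e6b"
    "6e6f 776e 2e67 6966 2048 5454 502f 312e"
    "310d 0a68 6f73 743a 2070 6870 2d6d 696e"
)

request_line, _, rest = payload.partition(b"\r\n")
method, path, version = request_line.decode("ascii").split(" ")
print(method, path, version)  # GET /icons/unknown.gif HTTP/1.1
print(rest)                   # b'host: php-min' (cut off by the capture display)
```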

So:

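because this was captured on an Ethernet or Wi-Fi network when not in monitor mode (or on some other kind of network that uses Ethernet headers, or where the adapter or driver supplies "fake Ethernet" headers, as is done for Wi-Fi), the packet begins with an Ethernet header;
because the Ethernet type value is 0x0800, that is followed by an IPv4 header;
because the IPv4 protocol value is 6, that is followed by a TCP header;
because one of the TCP port numbers is one commonly used for HTTP (8080), that is probably followed by some sort of HTTP data (this isn't guaranteed, though; TCP port numbers are more like hints).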
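The chain of reasoning in that list can be written as a small dissector. This is a sketch for this one capture: it assumes an Ethernet II link layer, handles only IPv4 and TCP, and treats port 8080 purely as a hint, as the answer says. `describe_layers` and `HTTP_ALT_PORT` are hypothetical names.

```python
# First 48 bytes of the capture (dump lines 0x0000 to 0x0020).
HEADERS = bytes.fromhex(
    "1a17 8e8a a3a0 026d 627d 049c 0800 4500"
    "0ae2 9cbf 4000 3f06 ac3b 0af0 2351 ac11"
    "0dc9 baf0 1f90 86e2 f3b7 f99e b503 8018"
)

HTTP_ALT_PORT = 8080  # a conventional alternate HTTP port; only a hint


def describe_layers(frame: bytes) -> list:
    """Walk Ethernet -> IPv4 -> TCP, then guess the application from ports."""
    layers = ["Ethernet"]  # assumed link-layer type for this capture
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype != 0x0800:
        return layers
    layers.append("IPv4")
    ip_start = 14
    ihl_bytes = (frame[ip_start] & 0x0F) * 4
    if frame[ip_start + 9] != 6:  # IPv4 protocol field: 6 = TCP
        return layers
    layers.append("TCP")
    tcp_start = ip_start + ihl_bytes
    sport = int.from_bytes(frame[tcp_start:tcp_start + 2], "big")
    dport = int.from_bytes(frame[tcp_start + 2:tcp_start + 4], "big")
    if HTTP_ALT_PORT in (sport, dport):
        layers.append("HTTP? (guessed from port)")
    return layers


print(describe_layers(HEADERS))
# ['Ethernet', 'IPv4', 'TCP', 'HTTP? (guessed from port)']
```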

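For ARP on the same network, you'd again have an Ethernet header (a destination of ff:ff:ff:ff:ff:ff is the Ethernet broadcast address, so the packet is being broadcast, as ARP requests usually are), with an Ethernet type of 0x0806, which is the Ethernet type value for ARP.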

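For ICMP on the same network, you'd again have an Ethernet header, and you'd also have an IPv4 header, so the Ethernet type would be 0x0800. The value in the protocol field of the IPv4 header would be 1, for ICMP.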
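For comparison, the fields that distinguish these cases are the Ethernet type and the IPv4 protocol field. In the sketch below, `classify` is a hypothetical helper, and the ARP and ICMP frames are made up: they contain only the header bytes the classifier reads (the MAC addresses are borrowed from this capture), not real captures. The TCP frame is the first 24 bytes of the question's packet.

```python
def classify(frame: bytes) -> str:
    """Rough classification by Ethernet type and IPv4 protocol number."""
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype == 0x0806:
        return "ARP"
    if ethertype == 0x0800:
        protocol = frame[14 + 9]  # IPv4 protocol field
        return {1: "ICMP", 6: "TCP"}.get(protocol, f"IPv4 protocol {protocol}")
    return f"ethertype 0x{ethertype:04x}"


BROADCAST = bytes.fromhex("ffff ffff ffff")
DST = bytes.fromhex("1a17 8e8a a3a0")
SRC = bytes.fromhex("026d 627d 049c")

# Hypothetical minimal frames (Ethernet header, plus 10 IPv4 header bytes for ICMP).
arp_frame = BROADCAST + SRC + bytes.fromhex("0806")
icmp_frame = DST + SRC + bytes.fromhex("0800") + bytes.fromhex("4500 0054 0000 4000 4001")
# Real bytes: the first 24 bytes of the question's capture.
tcp_frame = bytes.fromhex("1a17 8e8a a3a0 026d 627d 049c 0800 4500 0ae2 9cbf 4000 3f06")

print(classify(arp_frame), classify(icmp_frame), classify(tcp_frame))  # ARP ICMP TCP
```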

