Regex of bytes sequence with formatting

Question

Using a bytes regular expression works fine, as shown here:

In [48]: regexp_1 = re.compile(b"\xab.{3}")
In [49]: regexp_1.fullmatch(b"\xab\x66\x77\x88")
Out[49]: <re.Match object; span=(0, 4), match=b'\xabfw\x88'> # <----- good !

When I try to build the bytes pattern with formatting, following this post, it fails:

In [50]: byte = b"\xab"
In [51]: regexp_2 = re.compile(f"{byte}.{3}".encode())
In [52]: regexp_2.fullmatch(b"\xab\x66\x77\x88")
In [53]: # nothing found ... why ?

Answer 1

This happens because an f-string converts the given object to a string, and a bytes object converted to a string does not look the way you might expect:

>>> str(byte)
"b'\\xab'"

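So when you pass it through an f-string as you did, it gets ugly, and it stays that way when you encode it again: the result contains the characters b, ', \, x, a, b, ' rather than the single byte 0xAB.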

>>> f"{byte}.{3}"
"b'\\xab'.3"
>>> f"{byte}.{3}".encode()
b"b'\\xab'.3"

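Not to mention that {3} is evaluated as a replacement field and becomes 3. To prevent that you can use double braces ({{3}}), but that is not the main issue here.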
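
To make the failure concrete, here is a small sketch (the names pattern_text and pattern_bytes are made up for illustration). It shows that even with the braces doubled, the f-string route still produces the wrong pattern:

```python
import re

byte = b"\xab"

# Doubled braces keep the literal {3}, but {byte} still inserts the
# textual form of the bytes object rather than the raw byte.
pattern_text = f"{byte}.{{3}}"
print(pattern_text)  # b'\xab'.{3}

# Encoding turns that text into ASCII bytes; the leading b and the
# quotes are now part of the pattern.
pattern_bytes = pattern_text.encode()
print(pattern_bytes)  # b"b'\\xab'.{3}"

# The pattern now requires a literal b' before the 0xAB byte, so the
# original input no longer matches.
print(re.compile(pattern_bytes).fullmatch(b"\xab\x66\x77\x88"))  # None
```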

I suggest you concatenate the bytes objects instead.

regexp = re.compile(byte + b'.{3}')

# <re.Match object; span=(0, 4), match=b'\xabfw\x88'>
regexp.fullmatch(b"\xab\x66\x77\x88")
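
If the prefix or the repeat count varies, a small helper along these lines may be handy. This is only a sketch: build_pattern is a hypothetical name, not something provided by the re module. It assumes Python 3.5 or later, where bytes support %-formatting, and it uses re.escape so that a prefix such as b"." is matched literally.

```python
import re

def build_pattern(prefix, count):
    """Compile a bytes regex: literal prefix followed by `count` arbitrary bytes."""
    # re.escape keeps regex metacharacters in the prefix literal.
    # b".{%d}" % count fills in the repeat count without any str round-trip.
    return re.compile(re.escape(prefix) + b".{%d}" % count)

regexp = build_pattern(b"\xab", 3)
print(regexp.fullmatch(b"\xab\x66\x77\x88"))

# A metacharacter prefix is treated as a literal byte, not as "any byte".
print(build_pattern(b".", 1).fullmatch(b"ab"))  # None
```

Keep in mind that . does not match b"\n" (byte 0x0a) unless you pass re.DOTALL, which may matter when matching binary data.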

Tags: python, regex, byte