Regex to match only words (sometimes separated by dots)

Question

I have a list like this:

example.com=120.0.0.0
ben.example.com=120.0.0.0
+ben.example=120.0.0.0
+ben.example.com.np=120.0.0.0
ben=120.0.0.0
ben-example.com=120.0.0.0
ben43.example.com=120.0.0.0

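I only need to find the words (separated by dots): no IPs, no =, no +, and so on. Some FQDNs contain several dots, some contain none.

Is this possible?

If the script works properly when I run the regex, I want to get only these: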

ben.example.com.np
ben.example
ben.example.com
example.com
ben
ben43.example.com

I want to use a Python regex to parse the file into IPs and FQDNs, so that I can then check whether the IPs are valid for the domains.

Answer 1

This is simple:

import re
# text holds the whole input as a single string (for example, the file contents)
fqdns = re.findall(r"[a-zA-Z\.-]{2,}", text, flags=re.M)

which gives

['example.com', 'ben.example.com', 'ben.example', 'ben-example.com.np', 'ben']

regex101 example here

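The character class matches every character in the ranges a-z and A-Z, plus the dot . and the hyphen -. The quantifier {2,} requires at least 2 consecutive characters from the class, so the isolated dots inside the IPs are not matched.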
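As a quick check of that explanation, here is a minimal sketch that runs the same pattern over a few lines taken from the question. The `sample` string is an assumption made for illustration; it is not the input the answer was originally tested against.

```python
import re

# Assumed sample input: a few lines copied from the question.
sample = """example.com=120.0.0.0
+ben.example=120.0.0.0
ben-example.com=120.0.0.0
ben43.example.com=120.0.0.0"""

# Same character class as above: letters, dot, hyphen, two or more in a row.
pattern = re.compile(r"[a-zA-Z\.-]{2,}")

matches = pattern.findall(sample)
print(matches)
```

The IPs are skipped because their dots are always separated by digits, so no run of two class characters ever forms. The last line, however, comes back as two pieces ('ben' and '.example.com'), because the digits in ben43 are not part of the class.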

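Edit: After I wrote this answer, the parameters of the question changed slightly, since some of the hostnames contain digits. So instead of using re.findall() to collect every match in the (possibly multi-line) input, process the input line by line with a slightly modified regex and keep only the first match on each line. Use re.search() for this rather than re.match(): re.match() only matches at the very start of the string, so it returns None for lines that begin with +, and calling .group() on None raises AttributeError.

import re

# Letters, digits, dots, and hyphens; at least two in a row.
pattern = re.compile(r"[a-zA-Z0-9.-]{2,}")

with open("path/to/file", "r") as f:
    # Keep the first match per line; skip lines with no match.
    fqdns = [m.group() for m in map(pattern.search, f) if m]

re.search() returns the first match in the line, or None if the line contains no match. Because the class now includes digits, it would also match the IP, which is why only the first match per line is kept. .group() is how you access the matched string.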
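Since the question ultimately wants both the names and the IPs, one possible extension is to split each line into a (name, IP) pair. The sketch below is not part of the original answer: `parse_line` and `LINE_RE` are hypothetical names, and the `[+]name=ip` line format is assumed from the sample in the question. The IP is checked with the standard-library `ipaddress` module.

```python
import ipaddress
import re

# Assumed line format (inferred from the question's sample): [+]name=ip
LINE_RE = re.compile(r"^\+?([A-Za-z0-9.-]+)=(\S+)$")

def parse_line(line):
    """Return (name, ip) for a well-formed line, or None otherwise."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    name, raw_ip = m.groups()
    try:
        # Raises ValueError if raw_ip is not a valid IPv4/IPv6 address.
        ip = ipaddress.ip_address(raw_ip)
    except ValueError:
        return None
    return name, ip

lines = [
    "example.com=120.0.0.0",
    "+ben.example.com.np=120.0.0.0",
    "ben43.example.com=120.0.0.0",
    "not a valid line",
]

# Drop lines that do not fit the assumed format.
pairs = [p for p in map(parse_line, lines) if p is not None]
for name, ip in pairs:
    print(name, ip)
```

Anchoring the pattern on the = sign avoids having to tell names and IPs apart by character class alone, which is what makes the digit case awkward for a single findall().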

