How do I extract the domain from a string using a regular expression?

Question

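I have a string that represents user data.

What is the correct regular expression for extracting the domain from this string?

I know I have to find every two-character string that comes after the last "." following the "@".

However, I still haven't managed to get it to work.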
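
For comparison, the condition described above can also be written without a regex. This is a minimal sketch that is not from the thread. It assumes comma-separated rows in which the email is the only field containing "@", and `two_letter_suffixes` is a hypothetical helper name.

```python
def two_letter_suffixes(data: str) -> list[str]:
    """Return the 2-letter suffix after the last '.' of each email domain.

    Hypothetical helper: assumes comma-separated rows in which the email
    is the only field that contains an '@'.
    """
    suffixes = []
    for line in data.splitlines():
        for field in line.split(","):
            if "@" not in field:
                continue
            domain = field.rsplit("@", 1)[1]  # text after the last '@'
            if "." not in domain:
                continue
            suffix = domain.rsplit(".", 1)[1]  # text after the last '.'
            if len(suffix) == 2 and suffix.isalpha():
                suffixes.append(suffix)
    return suffixes


sample = ("001,Francisca,Dr Jhonaci,jhonadr@abc.com,32yearsold,120.238.225.0\n"
          "002,Lavenda,Bocina,lavenboci@banck.ac.uk,50yearsold,121.186.221.182\n"
          "003,Laura,Eglington,elinton@python.co.jp,26yearsold,36.55.173.63\n"
          "004,Timo,Baum,timobaum@tennis.co.cn,22yearsold,121.121.110.10")

print(two_letter_suffixes(sample))  # ['uk', 'jp', 'cn']
```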

Answer 1

import re

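# Capture two lowercase letters that follow a "." after the "@" and are
# directly followed by a comma; the greedy .+ backtracks from the end of
# the line, so the rightmost such "." is used.
regex = r"@.+\.([a-z]{2}),"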

your_string = ("001,Francisca,Dr Jhonaci,jhonadr@abc.com,32yearsold,120.238.225.0\n"
    "002,Lavenda,Bocina,lavenboci@banck.ac.uk,50yearsold,121.186.221.182\n"
    "003,Laura,Eglington,elinton@python.co.jp,26yearsold,36.55.173.63\n"
    "004,Timo,Baum,timobaum@tennis.co.cn,22yearsold,121.121.110.10")

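# re.MULTILINE has no effect here (the pattern has no ^ or $), but it is harmless.
matches = re.finditer(regex, your_string, re.MULTILINE)

for match in matches:
    result = match.group(1)  # the two-letter suffix
    print(result)  # uk, jp, cn (one per line)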
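
The pattern in this answer ends with a literal comma, so it assumes the email is never the last field on a line. A minimal sketch using a hypothetical row 005 (not from the question) in which the email is the last field:

```python
import re

regex = r"@.+\.([a-z]{2}),"

# Row 004 is from the question; row 005 is hypothetical and puts the
# email in the last field, so no comma follows the two-letter suffix.
rows = ("004,Timo,Baum,timobaum@tennis.co.cn,22yearsold,121.121.110.10\n"
        "005,Ann,Lee,ann@example.de")

# Only row 004 matches; "de" is missed because the pattern requires a ",".
print(re.findall(regex, rows))  # ['cn']
```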

Answer 2

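Anchor the match on the comma after the email address instead of on the last dot.

Use this regex:

@.+\.(\w+)(?<!com),

The capture group holds the suffix after the last dot; the negative lookbehind (?<!com) rejects suffixes ending in "com".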
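
Because (\w+) is not limited to two letters, this pattern accepts any suffix that does not end in "com". A quick check, assuming a hypothetical .org row that is not in the question:

```python
import re

regex = r"@.+\.(\w+)(?<!com),"

rows = ("001,Francisca,Dr Jhonaci,jhonadr@abc.com,32yearsold,120.238.225.0\n"
        "002,Lavenda,Bocina,lavenboci@banck.ac.uk,50yearsold,121.186.221.182\n"
        # Hypothetical row: the suffix is not "com", so the lookbehind
        # lets it through even though it is three letters long.
        "007,Sam,Fox,sam@example.org,41yearsold,10.0.0.1")

print(re.findall(regex, rows))  # ['uk', 'org']
```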

Answer 3

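The comma seems to be the field delimiter in the string.

To avoid matching across a comma (which would match too much) and across a second @ character, you can use a negated character class starting with [^, here [^@,]*.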
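
To see why the negated class matters, here is a sketch with a hypothetical row (not from the question) whose later field home.fr also ends in ".xx,". The lookahead is left out so only the effect of [^@,]* is visible.

```python
import re

# Hypothetical row: "home.fr" is a separate field after the email.
row = "006,Max,Roe,max@abc.com,home.fr,30yearsold"

# .+ runs past the comma into the next field and captures the wrong suffix.
print(re.findall(r"@.+\.([A-Za-z]{2}),", row))      # ['fr']

# [^@,]* stops at the first comma, so "abc.com" is rejected and nothing matches.
print(re.findall(r"@[^@,]*\.([A-Za-z]{2}),", row))  # []
```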

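If the entry can also be at the end of the string, you can assert either a comma or the end of the string with (?=,|$). With re.M, $ also matches at the end of each line.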

@[^@,]*\.([A-Za-z]{2})(?=,|$)
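
A sketch of the (?=,|$) part, using two hypothetical rows (not from the question). With re.M, $ matches before each newline, so an email in the last column is found; without it, $ only matches at the very end of the string or just before a final newline.

```python
import re

regex = r"@[^@,]*\.([A-Za-z]{2})(?=,|$)"

# Hypothetical rows: the email is the last field on the first line
# and a middle field on the second line.
rows = "005,Ann,Lee,ann@example.de\n006,Bo,Kim,bo@shop.co.jp,33yearsold"

print(re.findall(regex, rows, re.M))  # ['de', 'jp']
print(re.findall(regex, rows))        # ['jp']
```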


import re

regex = r"@[^@,]*\.([A-Za-z]{2})(?=,|$)"

s = ("001,Francisca,Dr Jhonaci,jhonadr@abc.com,32yearsold,120.238.225.0\n"
    "002,Lavenda,Bocina,lavenboci@banck.ac.uk,50yearsold,121.186.221.182\n"
    "003,Laura,Eglington,elinton@python.co.jp,26yearsold,36.55.173.63\n"
    "004,Timo,Baum,timobaum@tennis.co.cn,22yearsold,121.121.110.10")

print(re.findall(regex, s, re.M))

Output

['uk', 'jp', 'cn']
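
The title asks for the domain, while the answers return only the two-letter suffix. If both are wanted, one possible variation (an assumption, not the answerer's pattern) wraps the domain in an outer group and adds \n to the negated class, since [^@,] otherwise also matches a line break.

```python
import re

# Variation on the pattern above (assumption, not from the answer):
# group 1 = whole domain after "@", group 2 = its two-letter suffix.
# "\n" is excluded so a match never spills onto the next line.
regex = r"@([^@,\n]*\.([A-Za-z]{2}))(?=,|$)"

s = ("001,Francisca,Dr Jhonaci,jhonadr@abc.com,32yearsold,120.238.225.0\n"
     "002,Lavenda,Bocina,lavenboci@banck.ac.uk,50yearsold,121.186.221.182\n"
     "003,Laura,Eglington,elinton@python.co.jp,26yearsold,36.55.173.63\n"
     "004,Timo,Baum,timobaum@tennis.co.cn,22yearsold,121.121.110.10")

for m in re.finditer(regex, s, re.M):
    print(m.group(1), m.group(2))
# banck.ac.uk uk
# python.co.jp jp
# tennis.co.cn cn
```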
