Python regular expression to exclude trailing characters

Question

I have a file that contains lines like these:

From david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008
Received: (from apache@localhost)
Return-Path: <postmaster@collab.sakaiproject.org>
for <source@collab.sakaiproject.org>;

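I am trying to read each line and use a regular expression to find the domain name, which is basically the part after the @ symbol. This is the code I wrote: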

if re.search('[@]\S+?', line) : org = re.findall('@(\S+)',line)[0]

But it returns the following:

uct.ac.za
localhost)
collab.sakaiproject.org>
collab.sakaiproject.org>;

Is there a smart way to keep only the domain, without the ")", ">" or ">;" that follows it?

Answer 1

Try this.

Use a negated character class to stop the match at the first ">", ")" or whitespace character: [^>)\s]+

if re.search(r'@([^>)\s]+)', line): org = re.findall(r'@([^>)\s]+)', line)[0]

Output

uct.ac.za
localhost
collab.sakaiproject.org
collab.sakaiproject.org
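
As a minimal sketch of the negated character class approach, the snippet below runs it over the four sample lines from the question. The names `DOMAIN_RE` and `extract_domain` are made up for illustration, and the pattern is compiled once and searched once per line instead of calling `re.search` and then `re.findall`.

```python
import re

# Sample lines copied from the question.
lines = [
    "From david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008",
    "Received: (from apache@localhost)",
    "Return-Path: <postmaster@collab.sakaiproject.org>",
    "for <source@collab.sakaiproject.org>;",
]

# Capture everything after '@' up to the first '>', ')' or whitespace.
DOMAIN_RE = re.compile(r"@([^>)\s]+)")


def extract_domain(line):
    """Return the text after the first '@', or None if the line has no '@'."""
    match = DOMAIN_RE.search(line)
    return match.group(1) if match else None


domains = [extract_domain(line) for line in lines]
print(domains)
# ['uct.ac.za', 'localhost', 'collab.sakaiproject.org', 'collab.sakaiproject.org']
```

Returning None for lines without an "@" avoids the IndexError that indexing an empty `re.findall` result with `[0]` would raise.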

Answer 2

A small correction: an FQDN can also contain digits...

So the regular expression needs to be adjusted slightly:

[@][a-zA-Z0-9.-]+

See https://en.wikipedia.org/wiki/Uniform_Resource_Locator for the complete domain rules.
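
A rough sketch of this allow-list variant follows. It assumes a capture group is added around the character class so that the "@" itself is not part of the result, and `HOST_RE` and `extract_hosts` are hypothetical names used only for illustration.

```python
import re

# Allow-list: letters, digits, dots and hyphens directly after '@'.
# The parentheses capture only the host part, without the '@'.
HOST_RE = re.compile(r"@([A-Za-z0-9.-]+)")


def extract_hosts(text):
    """Return every host-like run of characters that follows an '@' in text."""
    return HOST_RE.findall(text)


print(extract_hosts("for <source@collab.sakaiproject.org>;"))
# ['collab.sakaiproject.org']
print(extract_hosts("a@mail2.example.com b@host-1.example.org"))
# ['mail2.example.com', 'host-1.example.org']
```

Because "." is in the allowed set, a period that ends a sentence right after an address would also be captured, so this pattern is only a loose approximation of the full hostname rules.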

