Extract mail address from HTML-like string in Python

Using the Gmail API, the sender of an email can be retrieved like this:

email = service.users().messages().get(userId='me', id=msg['id']).execute()
payload = email['payload']
headers = payload['headers']
for hdr in headers:
    if hdr['name'] == 'From':
        sender = hdr['value']

This leaves the variable sender holding a string in the following format:

"Name Surname" <name.surname@gmail.com>

I want to recover the name and the email address separately from this string, so that I end up with two (str) variables:

sender_name = "Name Surname"
sender_email_address = "name.surname@gmail.com"

Is there a library that can help with this?

Here is one option:

mys = '"Name Surname" <name.surname@gmail.com>'
name_email = mys.split("<")
sender_name = name_email[0].strip().strip('"')
sender_email_address = name_email[1].strip(">")
print(f"sender_name: {sender_name}")
print(f"sender_email_address: {sender_email_address}")

which prints

sender_name: Name Surname
sender_email_address: name.surname@gmail.com
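
One caveat with the split approach: it assumes the string always contains a `<`. A bare address without a display name would make `name_email[1]` raise an IndexError. Below is a minimal sketch of a more defensive variant based on `str.partition`. The function name `split_sender` is made up for illustration, and treating a bracket-less string as a bare address is an assumption about what the input may look like.

```python
def split_sender(sender):
    """Split '"Name" <addr>' into (name, addr) using only string methods."""
    name, bracket, rest = sender.partition("<")
    if not bracket:
        # No angle bracket found: assume the whole string is a bare address.
        return "", sender.strip()
    # Drop surrounding whitespace and quotes from the name,
    # and the closing '>' from the address.
    return name.strip().strip('"'), rest.strip().rstrip(">").strip()


sender_name, sender_email_address = split_sender(
    '"Name Surname" <name.surname@gmail.com>'
)
```

This is still not a full parser: quoted names that themselves contain `<`, comments in parentheses, or encoded names are not handled.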

You can use string manipulation or regular expressions:

>>> import re
>>> s = '"Name Surname" <name.surname@gmail.com>'
>>> re.search('^.*<', s).group()[:-1].strip()[1:-1]
'Name Surname'
>>> re.search('<.*>', s).group()[1:-1]
'name.surname@gmail.com'
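
The two searches can also be folded into a single pattern with named groups, which avoids the manual slicing. This is only a sketch: the names `SENDER_RE` and `match_sender` are hypothetical, and the pattern assumes the simple `"Name" <address>` shape shown in the question (optional quotes, one pair of angle brackets), not the full address grammar.

```python
import re

# Optional quoted display name, followed by an address in angle brackets.
SENDER_RE = re.compile(r'^\s*"?(?P<name>[^"<]*?)"?\s*<(?P<email>[^<>]+)>\s*$')


def match_sender(sender):
    """Return (name, email) or None if the string does not fit the pattern."""
    m = SENDER_RE.match(sender)
    if m is None:
        return None
    return m.group("name").strip(), m.group("email").strip()


result = match_sender('"Name Surname" <name.surname@gmail.com>')
```

Returning None on a mismatch makes it explicit when the input does not have the expected shape, instead of failing with an AttributeError on `.group()`.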

The email.utils module in the standard library has a parseaddr function that does exactly this:

Parse address – which should be the value of some address-containing field such as To or Cc – into its constituent realname and email address parts. Returns a tuple of that information, unless the parse fails, in which case a 2-tuple of ('', '') is returned.

An example:

from email.utils import parseaddr

sender = '"Name Surname" <name.surname@gmail.com>'
sender_name, sender_email_address = parseaddr(sender)

sender_name
#=> 'Name Surname'

sender_email_address
#=> 'name.surname@gmail.com'
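
To tie this back to the question, here is a rough sketch that applies `parseaddr` directly to a header list shaped like `email['payload']['headers']`, and uses `getaddresses` from the same module for headers that may hold several addresses, such as To. The helper names and the sample `headers` data are invented for illustration; only `parseaddr` and `getaddresses` come from email.utils.

```python
from email.utils import getaddresses, parseaddr


def header_values(headers, name):
    """Return all values of the header called `name` (case-insensitive)."""
    return [h["value"] for h in headers if h["name"].lower() == name.lower()]


def sender_from_headers(headers):
    """Return (name, address) for the first From header, or ('', '')."""
    values = header_values(headers, "From")
    return parseaddr(values[0]) if values else ("", "")


def recipients_from_headers(headers):
    """Return a list of (name, address) pairs from all To headers."""
    return getaddresses(header_values(headers, "To"))


# Hypothetical sample data in the shape of a Gmail API header list.
headers = [
    {"name": "From", "value": '"Name Surname" <name.surname@gmail.com>'},
    {"name": "To", "value": "Alice <alice@example.com>, bob@example.com"},
]

sender_name, sender_email_address = sender_from_headers(headers)
recipients = recipients_from_headers(headers)
```

Comparing header names case-insensitively is a defensive choice here, since header field names are not case-sensitive.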