Getting the right encoding for e-mail from the Gmail API

I'm struggling to get the special characters in an e-mail to display correctly.

I retrieve the message with the Gmail API like this:

msg_id = '169a8fac44fd8115'
service = build('gmail', 'v1', credentials=creds)
message = service.users().messages().get(userId='me', id=msg_id).execute()
htmlpart = message['payload']['parts'][0]['parts'][1]['body']['data']

Then I tried the following:

file_data = quopri.decodestring(base64.urlsafe_b64decode(htmlpart)).decode('iso-8859-1')
file_data = base64.urlsafe_b64decode(htmlpart.encode('UTF-8')).decode('iso-8859-1')
file_data = base64.urlsafe_b64decode(htmlpart.encode('iso-8859-1')).decode('utf-8')
file_data = base64.urlsafe_b64decode(htmlpart.encode('UTF-8')).decode('utf-8')

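None of them gives me the correct output. Instead, I get garbled characters such as €2 in place of the intended character.

For reference, the headers of this message part are as follows: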

'headers': [{'name': 'Content-Type', 'value': 'text/html; charset="UTF-8"'},
  {'name': 'Content-Transfer-Encoding', 'value': 'quoted-printable'}]

Edit: added sample data below. I'm trying to get the HTML of the e-mail; below I've copied a portion of it that highlights the encoding problem (You'll get).

</tr><tr><td class="m_4364729876101169671Uber18_text_p1" align="left" style="color:rgb(0,0,0);font-family:&#39;Uber18-text-Regular&#39;,&#39;HelveticaNeue-Light&#39;,&#39;Helvetica Neue Light&#39;,Helvetica,Arial,sans-serif;font-size:16px;line-height:28px;direction:ltr;text-align:left"> Give friends free ride credit to try Uber. You&#39;ll get CN¥10 off each of your next 3 rides when they start riding. <span class="m_4364729876101169671Uber18_text_p1" style="color:#000000;font-family:&#39;Uber18-text-Regular&#39;,&#39;HelveticaNeue-Light&#39;,&#39;Helvetica Neue Light&#39;,Helvetica,Arial,sans-serif;font-size:16px;line-height:28px">Share code: 20ccv</span></td>

The headers

'headers': [{'name': 'Content-Type', 'value': 'text/html; charset="UTF-8"'},
  {'name': 'Content-Transfer-Encoding', 'value': 'quoted-printable'}]

tell you that the message contains text encoded as UTF-8, which was then quoted-printable encoded so that systems supporting only 7-bit characters can handle it.
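As a small illustration of these two layers, the sketch below encodes the CN¥10 text from the sample first as UTF-8 and then as quoted-printable. The variable names are illustrative only and are not part of the Gmail API.

```python
import quopri

original = "CN¥10"

# Layer 1: the text is encoded to bytes as UTF-8 (¥ becomes two bytes).
utf8_bytes = original.encode("utf-8")

# Layer 2: the bytes are quoted-printable encoded into 7-bit ASCII.
qp_bytes = quopri.encodestring(utf8_bytes)

print(utf8_bytes)  # b'CN\xc2\xa510'
print(qp_bytes)    # b'CN=C2=A510'

# Decoding the UTF-8 bytes with the wrong charset yields mojibake.
print(utf8_bytes.decode("iso-8859-1"))  # CNÂ¥10
```

The last line shows why decoding with iso-8859-1 produces stray characters: each byte of a multi-byte UTF-8 sequence is turned into a separate character.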

To decode, you first need to decode from quoted-printable, and then decode the resulting bytes from UTF-8.

Something like this should work:

import quopri

utf8 = quopri.decodestring(htmlpart)  # undo the quoted-printable layer
text = utf8.decode('utf-8')           # then decode the UTF-8 bytes to str
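Since the Gmail API returns body data as a base64url string, that layer has to be removed before the two steps above. The following is a minimal sketch, not a definitive implementation: decode_html_part is a hypothetical helper, the sample payload is simulated, and it assumes the decoded bytes still contain quoted-printable escapes.

```python
import base64
import quopri


def decode_html_part(data, charset="utf-8"):
    """Hypothetical helper: base64url -> quoted-printable -> text."""
    # Restore any '=' padding that may have been stripped from the string.
    padded = data + "=" * (-len(data) % 4)
    raw = base64.urlsafe_b64decode(padded)
    # Assumption: raw still carries quoted-printable escapes such as =C2=A5.
    unquoted = quopri.decodestring(raw)
    return unquoted.decode(charset)


# Simulated payload standing in for the part's body data.
sample = base64.urlsafe_b64encode(b"You'll get CN=C2=A510 off").decode("ascii")
print(decode_html_part(sample))  # You'll get CN¥10 off
```

If the transfer encoding has already been removed by the time the data reaches you, the quopri step leaves most text untouched, but it would alter any literal '=' followed by two hex digits, so it is worth checking the decoded bytes before keeping that step.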

An HTML e-mail body may contain character entities. These can be converted to individual characters using html.unescape (available in Python 3.4+).

>>> import html 
>>> h = """</tr><tr><td class="m_4364729876101169671Uber18_text_p1" align="left" style="color:rgb(0,0,0);font-family:&#39;Uber18-text-Regular&#39;,&#39;HelveticaNeue-Light&#39;,&#39;Helvetica Neue Light&#39;,Helvetica,Arial,sans-serif;font-size:16px;line-height:28px;direction:ltr;text-align:left"> Give friends free ride credit to try Uber. You&#39;ll get CN¥10 off each of your next 3 rides when they start riding. <span class="m_4364729876101169671Uber18_text_p1" style="color:#000000;font-family:&#39;Uber18-text-Regular&#39;,&#39;HelveticaNeue-Light&#39;,&#39;Helvetica Neue Light&#39;,Helvetica,Arial,sans-serif;font-size:16px;line-height:28px">Share code: 20ccv</span></td>"""


>>> print(html.unescape(h))
</tr><tr><td class="m_4364729876101169671Uber18_text_p1" align="left" style="color:rgb(0,0,0);font-family:'Uber18-text-Regular','HelveticaNeue-Light','Helvetica Neue Light',Helvetica,Arial,sans-serif;font-size:16px;line-height:28px;direction:ltr;text-align:left"> Give friends free ride credit to try Uber. You'll get CN¥10 off each of your next 3 rides when they start riding. <span class="m_4364729876101169671Uber18_text_p1" style="color:#000000;font-family:'Uber18-text-Regular','HelveticaNeue-Light','Helvetica Neue Light',Helvetica,Arial,sans-serif;font-size:16px;line-height:28px">Share code: 20ccv</span></td>
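Rather than hard-coding 'utf-8', the charset could be read from the part's Content-Type header before unescaping. This is a sketch under assumptions: charset_from_headers is a hypothetical helper, the headers list mirrors the one shown in the question, and decoded_bytes stands in for the body after the transfer encoding has been removed.

```python
import html
from email.message import Message


def charset_from_headers(headers, default="utf-8"):
    """Hypothetical helper: pull the charset out of a Content-Type header."""
    for header in headers:
        if header["name"].lower() == "content-type":
            msg = Message()
            msg["Content-Type"] = header["value"]
            # Returns the charset parameter lowercased, or default if absent.
            return msg.get_content_charset(default)
    return default


headers = [
    {"name": "Content-Type", "value": 'text/html; charset="UTF-8"'},
    {"name": "Content-Transfer-Encoding", "value": "quoted-printable"},
]
charset = charset_from_headers(headers)
print(charset)  # utf-8

# Bytes as they would look after the transfer encoding has been removed.
decoded_bytes = b"You&#39;ll get CN\xc2\xa510 off"
print(html.unescape(decoded_bytes.decode(charset)))  # You'll get CN¥10 off
```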