How to distinguish between inline images, signature images, and other blank images in IMAP email
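I am using MailKit to fetch email from a mailbox and save it to a database so that it can be displayed in my MVC application.
I store the HTML email as plain text in the database, and I can fetch the attachments and save them to the file system. However, when an email contains inline images, I run into a problem: signature images and other blank images are also saved to the file system as attachments.
Is there a way to distinguish inline attachments from signature images or other blank images?
Thanks in advance.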
Using https://mailsystem.codeplex.com/:
A class for reading email:
using System;
using System.Collections.Generic;
using System.Linq;
using ActiveUp.Net.Mail;

class readMail : IDisposable
{
    public Imap4Client client = new Imap4Client();

    public readMail(string mailServer, int port, bool ssl, string login, string password)
    {
        if (ssl)
            client.ConnectSsl(mailServer, port);
        else
            client.Connect(mailServer, port);
        client.Login(login, password);
    }

    public IEnumerable<Message> GetAllMails(string mailBox)
    {
        return GetMails(mailBox, "ALL").Cast<Message>();
    }

    protected Imap4Client Client
    {
        get { return client ?? (client = new Imap4Client()); }
    }

    private MessageCollection GetMails(string mailBox, string searchPhrase)
    {
        // Exceptions from the server are allowed to propagate to the caller.
        Mailbox mails = Client.SelectMailbox(mailBox);
        return mails.SearchParse(searchPhrase);
    }

    public void Dispose()
    {
        // Close the IMAP connection when the reader is disposed.
        client.Disconnect();
    }
}
Then:
using (readMail read = new readMail("host.name.information", port, true, username, password))
{
    var emailList = read.GetAllMails(this.folderEmail);
    int k = 0;
    Mailbox bbb = read.client.SelectMailbox(this.folderEmail);
    int[] unseen = bbb.Search("UNSEEN");
    foreach (Message email in emailList)
    {
        // Contains all parts for which no Content-Disposition header was found. Disposition is left to the final agent.
        MimePartCollection im1 = email.UnknownDispositionMimeParts;
        // Collection containing embedded MIME parts of the message (included text parts)
        EmbeddedObjectCollection im2 = email.EmbeddedObjects;
        // Collection containing attachments of the message.
        AttachmentCollection attach = email.Attachments;
    }
}
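In my case, all of the signature images ended up in UnknownDispositionMimeParts, but that may be specific to my situation (different email clients and so on). As far as I know, I have not found anything that distinguishes embedded images from contextual images and signature images.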
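No matter which IMAP library you use, none of them has a feature that will do what you want, because this is a non-trivial problem that takes some ingenuity to solve.
What you can do is start with the HtmlPreviewVisitor sample from the FAQ and modify it slightly so that it splits the attachments into 2 lists:
- The list of actual attachments
- The list of images that are actually referenced by the HTML (found by walking the HTML and tracking which images are referenced)
Code: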
using System;
using System.Collections.Generic;
using MimeKit;
using MimeKit.Text;
using MimeKit.Tnef;

/// <summary>
/// Visits a MimeMessage and splits attachments into those that are
/// referenced by the HTML body vs regular attachments.
/// </summary>
class AttachmentVisitor : MimeVisitor
{
    List<MultipartRelated> stack = new List<MultipartRelated> ();
    List<MimeEntity> attachments = new List<MimeEntity> ();
    List<MimePart> embedded = new List<MimePart> ();
    bool foundBody;

    /// <summary>
    /// Creates a new AttachmentVisitor.
    /// </summary>
    public AttachmentVisitor ()
    {
    }

    /// <summary>
    /// The list of attachments that were in the MimeMessage.
    /// </summary>
    public IList<MimeEntity> Attachments {
        get { return attachments; }
    }

    /// <summary>
    /// The list of embedded images that were in the MimeMessage.
    /// </summary>
    public IList<MimePart> EmbeddedImages {
        get { return embedded; }
    }

    protected override void VisitMultipartAlternative (MultipartAlternative alternative)
    {
        // walk the multipart/alternative children backwards from greatest level of faithfulness to the least faithful
        for (int i = alternative.Count - 1; i >= 0 && !foundBody; i--)
            alternative[i].Accept (this);
    }

    protected override void VisitMultipartRelated (MultipartRelated related)
    {
        var root = related.Root;

        // push this multipart/related onto our stack
        stack.Add (related);

        // visit the root document
        root.Accept (this);

        // pop this multipart/related off our stack
        stack.RemoveAt (stack.Count - 1);
    }

    // look up the image based on the img src url within our multipart/related stack
    bool TryGetImage (string url, out MimePart image)
    {
        UriKind kind;
        int index;
        Uri uri;

        if (Uri.IsWellFormedUriString (url, UriKind.Absolute))
            kind = UriKind.Absolute;
        else if (Uri.IsWellFormedUriString (url, UriKind.Relative))
            kind = UriKind.Relative;
        else
            kind = UriKind.RelativeOrAbsolute;

        try {
            uri = new Uri (url, kind);
        } catch {
            image = null;
            return false;
        }

        for (int i = stack.Count - 1; i >= 0; i--) {
            if ((index = stack[i].IndexOf (uri)) == -1)
                continue;

            image = stack[i][index] as MimePart;
            return image != null;
        }

        image = null;
        return false;
    }

    // called when an HTML tag is encountered
    void HtmlTagCallback (HtmlTagContext ctx, HtmlWriter htmlWriter)
    {
        if (ctx.TagId == HtmlTagId.Image && !ctx.IsEndTag && stack.Count > 0) {
            // search for the src= attribute
            foreach (var attribute in ctx.Attributes) {
                if (attribute.Id == HtmlAttributeId.Src) {
                    MimePart image;

                    if (!TryGetImage (attribute.Value, out image))
                        continue;

                    if (!embedded.Contains (image))
                        embedded.Add (image);
                }
            }
        }
    }

    protected override void VisitTextPart (TextPart entity)
    {
        TextConverter converter;

        if (foundBody) {
            // since we've already found the body, treat this as an
            // attachment
            attachments.Add (entity);
            return;
        }

        if (entity.IsHtml) {
            converter = new HtmlToHtml {
                HtmlTagCallback = HtmlTagCallback
            };

            // the converted output is discarded; only the tag callback matters here
            converter.Convert (entity.Text);
        }

        foundBody = true;
    }

    protected override void VisitTnefPart (TnefPart entity)
    {
        // extract any attachments in the MS-TNEF part
        attachments.AddRange (entity.ExtractAttachments ());
    }

    protected override void VisitMessagePart (MessagePart entity)
    {
        // treat message/rfc822 parts as attachments
        attachments.Add (entity);
    }

    protected override void VisitMimePart (MimePart entity)
    {
        // realistically, if we've gotten this far, then we can treat
        // this as an attachment even if the IsAttachment property is
        // false.
        attachments.Add (entity);
    }
}
Usage:
var visitor = new AttachmentVisitor ();
message.Accept (visitor);
// Now you can use visitor.Attachments and visitor.EmbeddedImages
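A simpler approach, although more error-prone (since it does not actually verify that the image is referenced by the HTML), is this:
// requires: using System; using System.Linq; using MimeKit;
var embeddedImages = message.BodyParts.OfType<MimePart> ().
    Where (x => x.ContentType.IsMimeType ("image", "*") &&
           x.ContentDisposition != null &&
           x.ContentDisposition.Disposition.Equals ("inline", StringComparison.OrdinalIgnoreCase));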
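Now that you have a list of embeddedImages, you will have to figure out a way to determine whether they are used only in the signature or elsewhere in the HTML.
Most likely you will have to analyze the HTML itself as well.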
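For the "blank image" part of the question, one possible heuristic (not something MimeKit provides) is to look at the pixel dimensions of each embedded image and treat tiny ones as spacers. The sketch below is only an illustration under that assumption: `ImageHeuristics`, `TryGetSize`, `LooksLikeSpacer` and the `maxSide` threshold are hypothetical names and values, and only PNG and GIF headers are read.

```csharp
using System;

static class ImageHeuristics
{
    // Reads the pixel dimensions from the header of a PNG or GIF image.
    // Returns false for any other format (JPEG, for example, is not handled).
    public static bool TryGetSize (byte[] data, out int width, out int height)
    {
        width = height = 0;

        // PNG: the file starts with 0x89 'P' 'N' 'G'; the IHDR chunk stores
        // width and height as big-endian 32-bit values at offsets 16 and 20.
        if (data.Length >= 24 && data[0] == 0x89 && data[1] == 0x50 &&
            data[2] == 0x4E && data[3] == 0x47) {
            width = (data[16] << 24) | (data[17] << 16) | (data[18] << 8) | data[19];
            height = (data[20] << 24) | (data[21] << 16) | (data[22] << 8) | data[23];
            return true;
        }

        // GIF: "GIF87a" or "GIF89a", followed by little-endian 16-bit
        // width and height at offsets 6 and 8.
        if (data.Length >= 10 && data[0] == 'G' && data[1] == 'I' && data[2] == 'F') {
            width = data[6] | (data[7] << 8);
            height = data[8] | (data[9] << 8);
            return true;
        }

        return false;
    }

    // Assumption: an image whose sides are both at most maxSide pixels is a
    // spacer or tracking pixel rather than real content.
    public static bool LooksLikeSpacer (byte[] data, int maxSide = 2)
    {
        return TryGetSize (data, out int w, out int h) &&
               w > 0 && h > 0 && w <= maxSide && h <= maxSide;
    }
}
```

The bytes for each entry in embeddedImages can be obtained by decoding the part's content into a MemoryStream (in recent MimeKit versions, image.Content.DecodeTo (stream)). This says nothing about whether an image belongs to a signature; it only filters out images that are too small to be meaningful.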
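It is also worth noting that some HTML mail references images located on the web that are not embedded in the MIME of the message. If you want those images as well, you will need to modify TryGetImage so that it falls back to the web when the code I provided cannot find the image in the message's MIME.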
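Before adding such a fallback, it helps to know what kind of reference an img src value is. The following is a minimal sketch, assuming you call it only after TryGetImage has returned false; `ImageSourceKind` and `ImageSource.Classify` are hypothetical names introduced for illustration and are not part of MimeKit.

```csharp
using System;

enum ImageSourceKind
{
    ContentId, // cid: URL, expected to resolve inside a multipart/related
    Remote,    // http: or https: URL
    DataUri,   // data: URL, the image bytes are inside the HTML itself
    Other
}

static class ImageSource
{
    // Classifies the value of an <img src="..."> attribute by its URL scheme.
    public static ImageSourceKind Classify (string src)
    {
        if (string.IsNullOrWhiteSpace (src))
            return ImageSourceKind.Other;

        src = src.Trim ();

        if (src.StartsWith ("cid:", StringComparison.OrdinalIgnoreCase))
            return ImageSourceKind.ContentId;

        if (src.StartsWith ("data:", StringComparison.OrdinalIgnoreCase))
            return ImageSourceKind.DataUri;

        // Uri normalizes the scheme to lower case, so a plain comparison is enough.
        if (Uri.TryCreate (src, UriKind.Absolute, out var uri) &&
            (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
            return ImageSourceKind.Remote;

        return ImageSourceKind.Other;
    }
}
```

Only the Remote case would need a download. Keep in mind that an http URL can still match a MIME part through its Content-Location header, which is why the lookup in the multipart/related stack should come first.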
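For text/plain messages (which cannot use images at all), the common convention for separating the signature from the rest of the message body is a line containing only 2 dashes and a space: "-- ".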
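The text/plain convention can be applied with plain string handling. This is a rough sketch, assuming the signature starts after the last line that is exactly "-- "; `PlainTextSignature.Split` is a hypothetical helper, not a MimeKit API.

```csharp
using System;

static class PlainTextSignature
{
    // Splits a text/plain body at the last line that is exactly "-- "
    // (two dashes followed by one space). The signature is empty when
    // no delimiter line is found.
    public static (string Body, string Signature) Split (string text)
    {
        var lines = text.Replace ("\r\n", "\n").Split ('\n');
        int index = Array.LastIndexOf (lines, "-- ");

        if (index == -1)
            return (text, string.Empty);

        var body = string.Join ("\n", lines, 0, index);
        var signature = string.Join ("\n", lines, index + 1, lines.Length - index - 1);

        return (body, signature);
    }
}
```

Quoted signatures in replies usually appear as "> -- " and are therefore not matched, and a line of two dashes without the trailing space is ignored as well.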
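In my limited experience with HTML messages that have signatures, they do not seem to follow a similar convention. Looking at a few of the HTML messages I receive from co-workers at Microsoft who use Outlook, the signatures appear to sit inside a <table> at the end of the message. However, this assumes that the message is not a reply. Once you start parsing replies, this <table> ends up somewhere in the middle of the message, because the original message being replied to is at the end.
Since everyone's signature is different as well, I am not sure whether this <table> similarity is an Outlook convention, or whether people build their signatures by hand and just happen to use tables by coincidence (I have also only looked at a few, and most people do not use signatures, so my sample size is very small).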