Processing Raw Email Data Using Java

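I have a database that stores raw email content. My requirement is to fetch a single mail from the database, process that data to get the basic details of that particular email (such as FROM, TO, SUBJECT, etc.), and save all of its attachments to the file system using core Java. Currently I am able to fetch the raw email data from the database as a String, but I cannot process that data.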

How can I process this raw email data (a String) using Java?

Edit: At the database level, the data is stored as an NCLOB. After fetching it from the database, it is stored in a Java String.

The sample email data is:

Return-Path: <support.bpm@mydomain>
Delivered-To: faxhealthuat@mydomain.com
Received: from naplmailer2.com (unknown [172.25.3.5])
    by mail3.mydomain.com (Postfix) with ESMTP id 46E6572049B
    for <faxhealthuat@mydomain.com>; Tue, 23 Feb 2016 15:16:43 +0530 (IST)
DKIM-Signature: v=1; a=rsa-sha256; d=mydomain; s=sms2; c=relaxed/simple;
    q=dns/txt; i=@mydomain; t=1456220806; x=1458812806;
    h=From:Sender:Reply-To:Subject:Date:Message-ID:To:Cc:MIME-Version:Content-Type:
    Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:Resent-From:
    Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Id:
    List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
    bh=K7Tc1XHEFN5ey8WU6/HXHF9XYDMLCiIsVdU7DloptqI=;
    b=CEnhtyGSQi+08wghYzKjW61JpO/IqOCgjopdCaesEfRgdeu86BWTQ9ZV0G7mCkDz
    XChXBhzNsj+uST6yiu7ivYsCBqKvBAnyaoUvLSUw5rWAuCNlg1gdP1ilEzFnZZBB
    6U25CK64N81I5cKCdltgmUe5B97XueIV8M8LjhyemxM=;
X-AuditID: 7370fb5c-f79a16d000001484-b0-56cc2a86383c
Received: from CHNMURROOTCAS2.murugappa.com ( [172.25.1.14])
    by naplmailer2.com (Symantec Messaging Gateway) with SMTP id 8B.42.05252.68A2CC65; Tue, 23 Feb 2016 15:16:46 +0530 (IST)
Received: from CHNMURROOTMBX2.murugappa.com ([fe80::a141:6b81:60c9:125c]) by
 CHNMURROOTCAS2.murugappa.com ([fe80::fc6b:b33c:6d4f:fadd%12]) with mapi id
 14.03.0210.002; Tue, 23 Feb 2016 15:16:40 +0530
From: Support-BPM-CholaMS <support.bpm@mydomain>
To: "faxhealthuat@mydomain.com" <faxhealthuat@mydomain.com>
Subject: Test From Mail
Thread-Topic: Test From Mail
Thread-Index: AdFuHx8uv6VR8hDtQvKILSCahVrrMg==
Date: Tue, 23 Feb 2016 09:46:39 +0000
Message-ID: <B8C5C607CDD50E4D84DACA129D4CFD64C7299C49@CHNMURROOTMBX2.murugappa.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.111.10.60]
Content-Type: multipart/alternative;
    boundary="_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_"
MIME-Version: 1.0
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFprMKsWRmVeSWpSXmKPExsWyRpKRT7dN60yYwe2HihYvDps7MHqs73jD
    GsAY1cBok5iXl1+SWJKqkJJanGyr5JJZnJyTmJmbWqSQll+k4JyRn5Oo4BuspJCZYqtkqqRQ
    kJOYnJqbmldiq5RYUJCal6Jkx6WAAWyAyjLzFFLzkvNTMvPSbZU8g/11LSxMLXUNlexcPIOd
    fRw9fV2DFPz8E7ayZjx+spe54LdqxeLPS9kbGBcodzFyckgImEicOvSNFcIWk7hwbz1bFyMX
    h5DAdkaJdcd3QjmnGSU+z17PCFLFJmArseJgM5gtIuAocezPNxYQW1hAXGLdxFesEHEZieWH
    l0DZehLnzl5lA7FZBFQljhzoZQaxeQWCJW7seAZWwwi0+fupNUwgNjPQnFtP5jNBXCQgsWTP
    eWYIW1Ti5eN/UJcqSLR+PwUU5wCqz5fY8cEYYqSgxMmZT1gmMArNQjJpFkLVLCRVECU6Egt2
    f2KDsLUlli18zQxjnznwmAlZfAEj+ypG/rzEgpzcxMyc1CIjveT83E2MwJgvLvgds4Px00+n
    Q4wCHIxKPLzLG06HCbEmlhVX5h5ilOBgVhLhdeA7EybEm5JYWZValB9fVJqTWnyI0QcYIhOZ
    pUST84HpKK8k3tDI3MzQzMTY0NDc2BKHsJI4b6v84TAhgXRgaspOTS1ILYIZx8TBKdXAWDgr
    40nv+6kRyxcq/0qx//f+zokw3qrXR/M3XLflqeaaHnpi6YXDN39mzZhiMLv6DceSuWerT1xS
    SrXbcnaX/LOcj/pu9XFreqSf3lJ9lfYpY/3x2BW/+wofCb7749Fzfv3j/emHsy6/eO+X4LGs
    /4fGYpbrB0733TjNmyKzQWnjBP93PfbzFnEqsRRnJBpqMRcVJwIArc+Y8CYDAAA=

--_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_
Content-Type: text/plain; charset="us-ascii"
content-transfer-encoding: quoted-printable

Testing for from mail fetch

--_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_
Content-Type: text/html; charset="us-ascii"
content-transfer-encoding: quoted-printable
--_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_--

Assuming the string you are fetching contains line breaks:

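import java.util.HashMap;
import java.util.Map;

String rawEmail = "YOUR EMAIL CONTENTS";
String[] lines = rawEmail.split("\r?\n");
Map<String, String> attributes = new HashMap<>();
for (String line : lines)
{
    if (line.isEmpty()) break; // the first empty line ends the header section
    // Skip continuation lines (they start with whitespace) and lines without a colon
    int colon = line.indexOf(':');
    if (colon > 0 && !Character.isWhitespace(line.charAt(0)))
    {
        // Split on the first colon only, so values such as "09:46:39" stay intact;
        // a repeated header such as Received overwrites the earlier value
        String value = line.substring(colon + 1).trim();
        attributes.put(line.substring(0, colon).trim(), value.isEmpty() ? null : value);
    }
}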

Nested attributes (for example the boundary parameter inside Content-Type) can be processed further in the same way.
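As a rough sketch of that further processing, the helper below splits a structured header value such as the sample's Content-Type into its main value and its parameters. It assumes the folded Content-Type lines from the sample have already been joined into one value (the loop above skips continuation lines, so it would only see "multipart/alternative;"). The class and method names are hypothetical, and quoted values containing ';' or '=', RFC 2231 parameter continuations and comments are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HeaderParams {

    // Main value of a structured header, e.g. "multipart/alternative"
    // from "multipart/alternative; boundary=\"abc\"".
    public static String mainValue(String headerValue) {
        int semi = headerValue.indexOf(';');
        return (semi < 0 ? headerValue : headerValue.substring(0, semi)).trim();
    }

    // Parameters after the main value, keyed by lowercase name, with surrounding
    // double quotes removed. Naive sketch: a ';' or '=' inside a quoted value,
    // RFC 2231 continuations and comments are NOT handled.
    public static Map<String, String> parameters(String headerValue) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] segments = headerValue.split(";");
        for (int i = 1; i < segments.length; i++) { // segment 0 is the main value
            String segment = segments[i].trim();
            int eq = segment.indexOf('=');
            if (eq <= 0) continue; // empty or malformed parameter
            String name = segment.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = segment.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // strip the quotes
            }
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        String contentType = "multipart/alternative;    boundary="
                + "\"_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_\"";
        System.out.println(mainValue(contentType)); // multipart/alternative
        // prints the boundary string without its quotes
        System.out.println(parameters(contentType).get("boundary"));
    }
}
```

Parameter names are lowercased because MIME parameter names are case-insensitive, so both charset="us-ascii" and CHARSET="us-ascii" end up under "charset".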

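Well, if you want to parse an email, you just need to know the email format. It was originally defined in RFC 822, which was obsoleted by RFC 2822, which in turn was obsoleted by RFC 5322. You should read those documents first and then decide which parts of them you want to be able to handle.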

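At the top level, a message consists of lines. Those lines should end with \r\n (CRLF), but you should not rely on that, because you are getting the message from a database without knowing whether any pre-processing happened. The message starts with the header section (made of header lines), followed by an optional body, which is separated from the header by an empty line.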

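Header lines have the form HEADER_NAME:HEADER_VALUE, where the header name must not start with whitespace. In the header section, any line that starts with whitespace (a space or a tab) is a continuation line and must be joined to the value of the previous line.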

For details, refer to RFC 5322.
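The rules above (CRLF or bare LF line endings, an empty line between header and body, continuation lines starting with whitespace) can be sketched in plain Java. This is a minimal illustration, not a complete RFC 5322 parser: it does not decode RFC 2047 encoded words or validate field names, and RawHeaderParser, parseHeaders and body are hypothetical names. Header names are lowercased because they are case-insensitive, and each name maps to a list because headers such as Received may repeat.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawHeaderParser {

    // Parses the header section of a raw message into lowercase header names,
    // each mapped to all of its unfolded values in order of appearance.
    public static Map<String, List<String>> parseHeaders(String rawEmail) {
        Map<String, List<String>> headers = new LinkedHashMap<>();
        String currentName = null;
        StringBuilder currentValue = new StringBuilder();
        // Accept CRLF or bare LF, since the stored text may have been pre-processed
        for (String line : rawEmail.split("\r?\n")) {
            if (line.isEmpty()) break; // the first empty line ends the header section
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                // Continuation line: unfold it by appending it to the previous value
                if (currentName != null) currentValue.append(line);
                continue;
            }
            addHeader(headers, currentName, currentValue);
            int colon = line.indexOf(':');
            // A line without "name:" is not a valid header line; ignore it
            currentName = colon > 0 ? line.substring(0, colon).trim().toLowerCase(Locale.ROOT) : null;
            currentValue.setLength(0);
            if (currentName != null) currentValue.append(line, colon + 1, line.length());
        }
        addHeader(headers, currentName, currentValue);
        return headers;
    }

    private static void addHeader(Map<String, List<String>> headers, String name, StringBuilder value) {
        if (name != null) {
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value.toString().trim());
        }
    }

    // Everything after the first empty line, or "" if the message has no body.
    public static String body(String rawEmail) {
        Matcher m = Pattern.compile("\r?\n\r?\n").matcher(rawEmail);
        return m.find() ? rawEmail.substring(m.end()) : "";
    }
}
```

For the sample message, parseHeaders(raw).get("subject") would give [Test From Mail], get("from") and get("to") give the FROM and TO values asked for in the question, and get("received") holds the three Received values in order.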

OK, after doing some research based on your answers and comments, I got what I needed. Thanks, everyone, for your effort.

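Sharing it here. The Java method below, which uses the JavaMail API (javax.mail), fetches the raw email data from the database, finds the parts marked as attachments, saves them to the file system, and finally returns either a success or a failure message.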

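import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.Part;
import javax.mail.Session;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;

public static String saveAttachments(String EMAIL_ID)
{
    try
    {
        // Backslashes must be escaped in a Java string literal
        String saveDirectory = "C:\\Email\\Attachements\\";

        //Get email record from DB (EMAIL is the poster's own DB-access class, not JavaMail)
        EMAIL newEMAILObj = EMAIL.getEmailDetailsForEmailId(EMAIL_ID);

        //Get email raw data into a String variable
        String emailRawData = newEMAILObj.getCONTENT();

        // Parse the raw text into a MimeMessage, using an explicit charset
        // instead of the platform default
        Session newSession = Session.getDefaultInstance(new Properties());
        InputStream inputStreamObj = new ByteArrayInputStream(emailRawData.getBytes(StandardCharsets.UTF_8));
        MimeMessage mimeMessageObj = new MimeMessage(newSession, inputStreamObj);

        if (mimeMessageObj.isMimeType("multipart/*")) //Content may contain attachments
        {
            Multipart multiPart = (Multipart) mimeMessageObj.getContent();
            int numberOfParts = multiPart.getCount();
            for (int partCount = 0; partCount < numberOfParts; partCount++)
            {
                MimeBodyPart part = (MimeBodyPart) multiPart.getBodyPart(partCount);
                String fileName = part.getFileName();
                //This part is an attachment with a file name
                if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition()) && fileName != null)
                {
                    // Drop any directory components from the file name before saving
                    File file = new File(saveDirectory, new File(fileName).getName());
                    part.saveFile(file);
                }
            }
        }
    }
    catch (MessagingException | IOException ex)
    {
        return "FAILED: " + ex.getLocalizedMessage();
    }
    return "SUCCESS";
}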
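The loop above only looks at the direct children of the top-level multipart. In the raw text those children are delimited by lines built from the boundary parameter of the Content-Type header, as defined in RFC 2046: "--" + boundary starts a part and "--" + boundary + "--" ends the whole body. To make that structure visible, here is a rough stdlib-only sketch of the splitting step. It is an illustration under simplifying assumptions (no transfer decoding, nested multiparts left unsplit), not a replacement for JavaMail, and MultipartSplitter/splitParts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class MultipartSplitter {

    // Splits a multipart body into its raw parts. A part starts after a line
    // "--" + boundary and the body ends at "--" + boundary + "--"; the preamble
    // before the first delimiter and the epilogue after the last are dropped.
    // Each returned part still contains its own headers, an empty line, and its
    // (still transfer-encoded) content.
    public static List<String> splitParts(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<String> parts = new ArrayList<>();
        List<String> current = null; // null while still in the preamble
        for (String line : body.split("\r?\n", -1)) {
            // Delimiter lines may carry trailing spaces or tabs (transport padding)
            String candidate = line.replaceAll("[ \t]+$", "");
            boolean closing = candidate.equals(delimiter + "--");
            if (closing || candidate.equals(delimiter)) {
                if (current != null) {
                    // The line break just before a delimiter belongs to the delimiter,
                    // so the part's lines are joined without a trailing CRLF
                    parts.add(String.join("\r\n", current));
                }
                if (closing) return parts;
                current = new ArrayList<>();
            } else if (current != null) {
                current.add(line);
            }
        }
        if (current != null) parts.add(String.join("\r\n", current)); // no closing delimiter
        return parts;
    }
}
```

Applied to the body of the sample message with its boundary, this would return two parts (text/plain and text/html). A part whose own Content-Type is multipart/* would have to be split again with its own boundary; the saveAttachments method above has the same one-level limitation, so attachments nested inside another multipart would not be saved by it.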