What is the best way to get text from an .eml file?
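I am trying to get the subject and message body from several .eml files on my local drive. I am currently using Apache Commons Email, but sometimes it loops without throwing an error. Here is my code, which should extract the text from the .eml file and save it to a .txt file: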
MimeMessage mimeMessage = MimeMessageUtils.createMimeMessage(null, file);
// Parse once and reuse the parser; every parse() call walks the whole message again
MimeMessageParser parser = new MimeMessageParser(mimeMessage).parse();
if (parser.hasPlainContent()) {
    // Prefer the plain-text body when there is one
    try (FileWriter writer = new FileWriter(txtName)) {
        writeHeaders(writer, parser);
        writer.write(parser.getPlainContent());
    } catch (IOException e) {
        e.printStackTrace();
    }
} else if (parser.hasHtmlContent()) {
    // Otherwise strip the markup from the HTML body with Jsoup
    try (FileWriter writer = new FileWriter(txtName)) {
        writeHeaders(writer, parser);
        String text = Jsoup.parse(parser.getHtmlContent()).text();
        writer.write(text);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Here is the writeHeaders method as well:
private void writeHeaders(FileWriter writer, MimeMessageParser parser) throws Exception {
    writer.write("From:" + parser.getFrom() + "\n");
    writer.write("To:" + parser.getTo() + "\n");
    writer.write("Subject:" + parser.getSubject() + "\n");
    writer.write("Message:" + "\n" + "\n");
}
And this is how I get the attachments:
// Assumes parser.parse() has already been called once when the body was read
if (parser.hasAttachments()) {
    // Get and save the attachments from the .eml file
    List<DataSource> attachments = parser.getAttachmentList();
    for (DataSource attachment : attachments) {
        if (attachment.getName() != null && !attachment.getName().isEmpty()) {
            File save = new File(saveDir + File.separator + attachment.getName());
            try {
                // try-with-resources closes both streams even if the copy fails
                try (InputStream is = attachment.getInputStream();
                     FileOutputStream fos = new FileOutputStream(save)) {
                    byte[] buf = new byte[4096];
                    int bytesRead;
                    while ((bytesRead = is.read(buf)) != -1) {
                        fos.write(buf, 0, bytesRead);
                    }
                }
                // Attached messages are parsed recursively once fully written
                if (save.getName().endsWith("eml")) {
                    parseEml(save, count);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
So, is there perhaps an easier way to get the text and the attachments?
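Yes, much easier. Simple Java Mail (GitHub) can read .eml files and makes the content easily accessible. If you run into a similar looping problem there (unlikely), I'd be happy to help you with it (I actively maintain Simple Java Mail):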
Email email = EmailConverter.emlToEmail(emlFile);
email.getFromRecipient();
email.getSubject();
email.getPlainText();
email.getHTMLText();
email.getAttachments();
email.getEmbeddedImages();
email.getHeaders();
// etc. etc.
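To connect this back to the goal of saving each message as a .txt file, here is a minimal sketch of the writing step using only the JDK. `EmlTextWriter` and `write` are hypothetical names, not part of Simple Java Mail or Commons Email. The strings passed in are assumed to come from getters such as `email.getSubject()` and `email.getPlainText()`; how the sender is turned into a string is left to the caller. It writes UTF-8 explicitly, because `FileWriter` without a charset argument falls back to the JVM default charset.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of any mail library.
class EmlTextWriter {

    /** Writes two header lines and the body to target as UTF-8 text. */
    static void write(Path target, String from, String subject, String body) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write("From: " + from + "\n");
            writer.write("Subject: " + subject + "\n");
            writer.write("Message:\n\n");
            // A message may have no plain-text part; write nothing rather than "null"
            writer.write(body == null ? "" : body);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("message", ".txt");
        // In real use these values would come from the parsed message
        write(out, "alice@example.com", "Hello", "Body text");
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

If the message only has an HTML part, the body argument could be the result of the `Jsoup.parse(html).text()` call already used in the question.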
S/MIME-encrypted emails are also supported (if you have the certificate needed to decrypt them).
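For the attachment half of the question, the manual read/write loop can be replaced by `java.nio.file.Files.copy`, whichever library supplies the input stream. The sketch below is an assumption-laden illustration, not the API of any mail library: `AttachmentSaver` and `save` are hypothetical names. It also keeps only the last path element of the attachment name, as a precaution against names that contain directory parts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of any mail library.
class AttachmentSaver {

    /** Copies in to dir/name, keeping only the last path element of name. */
    static Path save(InputStream in, Path dir, String name) throws IOException {
        Path base = dir.toAbsolutePath().normalize();
        // Drop any directory part so a name like "../x.eml" stays inside dir
        Path fileName = Paths.get(name).getFileName();
        if (fileName == null) {
            throw new IOException("Unusable attachment name: " + name);
        }
        Path target = base.resolve(fileName).normalize();
        if (!target.startsWith(base) || target.equals(base)) {
            throw new IOException("Unusable attachment name: " + name);
        }
        // Files.copy replaces the manual buffer loop; the caller still closes in
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("attachments");
        try (InputStream data = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8))) {
            System.out.println(save(data, dir, "../note.txt"));
        }
    }
}
```

With the `DataSource` objects from the question, a call might look like `AttachmentSaver.save(is, Paths.get(saveDir), attachment.getName())` inside the existing try-with-resources on `attachment.getInputStream()`, assuming `saveDir` is a string path as the question's code suggests.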