为什么 Python 的 MIMEMultipart 生成带有换行符的附件文件名？

Question

我正在发送一封带附件的电子邮件，该附件的文件名很长。为什么它会被换行符损坏，系统的哪一部分应该知道这些换行符应该被删除？

from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication
from email.utils import formatdate

msg = MIMEMultipart()
msg['Subject'] = 'subject'
msg['To'] = 'a@example.com'
msg['From'] = 'b@example.com'
msg['Date'] = formatdate(localtime=True)
msg.attach(MIMEText('abc'))

attachment_name = 'abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz.txt'
part = MIMEApplication("sometext", Name=attachment_name)
part['Content-Disposition'] = 'attachment; filename="%s"' % attachment_name
msg.attach(part)

print msg.as_string()

给我：

Content-Type: multipart/mixed; boundary="===============1448866158=="
MIME-Version: 1.0
Subject: subject
To: a@example.com
From: b@example.com
Date: Sat, 20 Jan 2018 13:11:42 -0500

--===============1448866158==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

abc
--===============1448866158==
Content-Type: application/octet-stream;
 Name="abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz
 abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz
 abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz.txt"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="abcdefghijklmnopqrstuvwxyz
 abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz
 abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz
 abcdefghijklmnopqrstuvwxyz.txt"

c29tZXRleHQ=
--===============1448866158==--

Answer 1

长 header 字段的处理在 section 2.2.3 of RFC 2822 "Internet Message Format". That section survives unchanged in an obsoleting RFC 5322 中定义。

2.2.3. Long Header Fields

Each header field is logically a single line of characters comprising the field name, the colon, and the field body. For convenience however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple line representation; this is called "folding". The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example, the header field:
Subject: This is a test
can be represented as:
Subject: This
 is a test
Note: Though structured field bodies are defined in such a way that folding can take place between many of the lexical tokens (and even within some of the lexical tokens), folding SHOULD be limited to placing the CRLF at higher-level syntactic breaks. For instance, if a field body is defined as comma-separated values, it is recommended that folding occur after the comma separating the structured items in preference to other places where the field could be folded, even if it is allowed elsewhere.

The process of moving from this folded multiple-line representation of a header field to its single line representation is called "unfolding". Unfolding is accomplished by simply removing any CRLF that is immediately followed by WSP. Each header field should be treated in its unfolded form for further syntactic and semantic evaluation.

Answer 2

正如 Leon 的回答所解释的那样，Python 正在实施 RFC 中定义的折叠算法。

在Python 2中，您可以使用email.generator.Generator实例来控制最大header长度；来自 the docs:

For more flexibility, instantiate a Generator instance and use its flatten() method directly. For example:

from cStringIO import StringIO
from email.generator import Generator
fp = StringIO()
g = Generator(fp, mangle_from_=False, maxheaderlen=60)
g.flatten(msg)
text = fp.getvalue()

（将 maxheaderlen 设置为零将在几乎所有情况下防止折叠长 header 行）。

在Python 3.5中，maxheaderlen参数暴露在email.message.Message.as_string的signature中，所以

print(msg.as_string(maxheaderlen=256))

是可以的。 maxheaderlen 默认为零，因此 header 行不会换行，除非提供值。

在 Python 3.6 中，maxheaderlen 暴露在 email.message.EmailMessage.as_string 的 signature 中（注意这是一个不同的 class）。 maxheaderlen 现在默认为 None：除非指定值，否则 header 行以 78 个字符换行。

为什么 Python 的 MIMEMultipart 生成带有换行符的附件文件名？

Why does Python's MIMEMultipart generate attachment filenames with newlines?

python

mime