Encoding problem while processing a multipart request on Indy HTTP server

Question

I have a web server based on TIdHTTPServer, built with Delphi Sydney. From a web page I receive the following multipart/form-data POST stream:

-----------------------------16857441221270830881532229640 
Content-Disposition: form-data; name="d"

83AAAFUaVVs4Q07z
-----------------------------16857441221270830881532229640 
Content-Disposition: form-data; name="dir"

Upload
-----------------------------16857441221270830881532229640 
Content-Disposition: form-data; name="file_name"; filename="ÄŤeskĂˇ teÄŤka.png"
Content-Type: image/png

PNG_DATA    
-----------------------------16857441221270830881532229640--

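The problem is that the text parts are not received properly. I read "Indy MIME decoding of Multipart/Form-Data Requests returns trailing CR/LF" and changed the transfer encoding to 8bit, which helped to receive the file properly, but the received text values are still wrong (dir should be Upload and the file name should be česká tečka.png). This is the output I get: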

d=83AAAFUaVVs4Q07z
dir=UploadW
??esk?? te??ka.png 75

To demonstrate the problem, I reduced my code to a console application (note that the MIME.txt file contains the same content as the POST stream above):

program MIMEMultiPartTest;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.Classes, System.SysUtils,
  IdGlobal, IdCoder, IdMessage, IdMessageCoder, IdGlobalProtocols, IdCoderMIME, IdMessageCoderMIME,
  IdCoderQuotedPrintable, IdCoderBinHex4;


procedure ProcessAttachmentPart(var Decoder: TIdMessageDecoder; var MsgEnd: Boolean);
var
  MS: TMemoryStream;
  Name: string;
  Value: string;
  NewDecoder: TIdMessageDecoder;
begin
  MS := TMemoryStream.Create;
  try
    // Force the 8bit transfer encoding so the decoder hands back the body bytes unmodified
    TIdMessageDecoderMIME(Decoder).Headers.Values['Content-Transfer-Encoding'] := '8bit';
    TIdMessageDecoderMIME(Decoder).BodyEncoded := False;
    NewDecoder := Decoder.ReadBody(MS, MsgEnd);
    MS.Position := 0; // necessary?
    if Decoder.Filename <> EmptyStr then // it is an attachment
    begin
      try
        Writeln(Decoder.Filename + ' ' + IntToStr(MS.Size));
      except
        FreeAndNil(NewDecoder);
        Writeln('Error processing MIME');
      end;
    end
    else // it is a form parameter
    begin
      Name := ExtractHeaderSubItem(Decoder.Headers.Text, 'name', QuoteHTTP);
      if Name <> EmptyStr then
      begin
        Value := string(PAnsiChar(MS.Memory));
        try
          Writeln(Name + '=' + Value);
        except
          FreeAndNil(NewDecoder);
          Writeln('Error processing MIME');
        end;
      end;
    end;
    Decoder.Free;
    Decoder := NewDecoder;
  finally
    MS.Free;
  end;
end;

function ProcessMultiPart(const ContentType: string; Stream: TStream): Boolean;
var
  Boundary: string;
  BoundaryStart: string;
  BoundaryEnd: string;
  Decoder: TIdMessageDecoder;
  Line: string;
  BoundaryFound: Boolean;
  IsStartBoundary: Boolean;
  MsgEnd: Boolean;
begin
  Result := False;
  Boundary := ExtractHeaderSubItem('multipart/form-data; boundary=---------------------------16857441221270830881532229640', 'boundary', QuoteHTTP);
  if Boundary <> EmptyStr then
  begin
    BoundaryStart := '--' + Boundary;
    BoundaryEnd := BoundaryStart + '--';
    Decoder := TIdMessageDecoderMIME.Create(nil);
    try
      TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
      Decoder.SourceStream := Stream;
      Decoder.FreeSourceStream := False;
      BoundaryFound := False;
      IsStartBoundary := False;
      repeat
        Line := ReadLnFromStream(Stream, -1, True);
        if Line = BoundaryStart then
        begin
          BoundaryFound := True;
          IsStartBoundary := True;
        end
        else
        begin
          if Line = BoundaryEnd then
            BoundaryFound := True;
        end;
      until BoundaryFound;
      if BoundaryFound and IsStartBoundary then
      begin
        MsgEnd := False;
        repeat
          TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
          Decoder.SourceStream := Stream;
          Decoder.FreeSourceStream := False;
          Decoder.ReadHeader;
          case Decoder.PartType of
            mcptText,
            mcptAttachment:
              begin
                ProcessAttachmentPart(Decoder, MsgEnd);
              end;
            mcptIgnore:
              begin
                Decoder.Free;
                Decoder := TIdMessageDecoderMIME.Create(nil);
              end;
            mcptEOF:
              begin
                Decoder.Free;
                MsgEnd := True;
              end;
          end;
        until (Decoder = nil) or MsgEnd;
        Result := True;
      end
    finally
      Decoder.Free;
    end;
  end;
end;

var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.LoadFromFile('MIME.txt');
    ProcessMultiPart('multipart/form-data; boundary=---------------------------16857441221270830881532229640', Stream);
  finally
    Stream.Free;
  end;
  Readln;
end.

Can anybody help me find what is wrong with my code? Thank you.

Answer 1

Your call to ExtractHeaderSubItem() in ProcessMultiPart() is wrong: it needs to pass in the ContentType string parameter, not a hard-coded string literal.

Your call to ExtractHeaderSubItem() in ProcessAttachmentPart() is also wrong: it needs to pass in only the content of the Content-Disposition header, not the entire Headers.Text. ExtractHeaderSubItem() is designed to operate on only 1 header at a time.
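
The two corrections above can be sketched as a small console program. This is only an illustration, not a drop-in fix: SampleContentType and SampleDisposition are hypothetical stand-ins for the ContentType parameter of ProcessMultiPart() and for the value of the part's Content-Disposition header.

```pascal
program HeaderSubItemSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  IdGlobalProtocols; // ExtractHeaderSubItem, QuoteHTTP

const
  // Hypothetical sample values standing in for the real request data
  SampleContentType =
    'multipart/form-data; boundary=---------------------------16857441221270830881532229640';
  SampleDisposition = 'form-data; name="dir"';

begin
  // ProcessMultiPart(): use the ContentType that was passed in,
  // not a string literal hard-coded at the call site
  Writeln(ExtractHeaderSubItem(SampleContentType, 'boundary', QuoteHTTP));

  // ProcessAttachmentPart(): pass the value of one header only. In the real
  // code this would be Decoder.Headers.Values['Content-Disposition']
  // rather than Decoder.Headers.Text
  Writeln(ExtractHeaderSubItem(SampleDisposition, 'name', QuoteHTTP));
end.
```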

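Regarding the dir MIME part, the reason the body data ends up as 'UploadW' instead of 'Upload' is that you are not taking MS.Size into account when assigning MS.Memory to your Value string. The TMemoryStream data is not null-terminated! So you need to use SetString() instead of the := operator, e.g.: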

var
  Value: AnsiString;
...
SetString(Value, PAnsiChar(MS.Memory), MS.Size);

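Regarding Decoder.FileName, that value is not affected by the Content-Transfer-Encoding header at all. MIME headers simply do not allow unencoded Unicode characters. Currently, Indy's MIME decoder supports RFC 2047-style encodings for Unicode characters in headers, per RFC 7578 Section 5.1.3, but your stream data is not using that format. It looks like your data is using raw UTF-8 octets ¹ (which 5.1.3 also mentions as a possible encoding, but the decoder does not currently look for it). So you will likely have to extract and decode the original filename yourself manually as needed.

If you know the filename will always be encoded as UTF-8, you could try setting Indy's global IdGlobal.GIdDefaultTextEncoding variable to encUTF8 (it defaults to encASCII), and then Decoder.FileName should be accurate. However, that is a global setting, so it may have unwanted side effects elsewhere in Indy, depending on context and data. So I would suggest setting GIdDefaultTextEncoding to enc8Bit instead, to keep unwanted side effects to a minimum; Decoder.FileName will then contain the original raw bytes as-is (just extended to 16-bit chars). That way, you can recover the original filename bytes by simply passing Decoder.FileName as-is to IndyTextEncoding_8Bit.GetBytes(), and then decode them as needed (such as with IndyTextEncoding_UTF8.GetString(), after validating that the bytes are valid UTF-8).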
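
A minimal sketch of that enc8Bit round trip, under some assumptions: DecodeRawFileName is a hypothetical helper name, the sample string merely simulates what Decoder.FileName would hold when each raw UTF-8 byte has been widened to one 16-bit char, and the UTF-8 validation step is left out for brevity.

```pascal
program FileNameRoundTripSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  IdGlobal; // TIdBytes, GIdDefaultTextEncoding, IndyTextEncoding_8Bit/_UTF8

// Hypothetical helper: RawName holds one raw byte per char (values 0..255)
function DecodeRawFileName(const RawName: string): string;
var
  Bytes: TIdBytes;
begin
  // Narrow each char back to the byte it came from ...
  Bytes := IndyTextEncoding_8Bit.GetBytes(RawName);
  // ... and decode those bytes as UTF-8 (assumes they are valid UTF-8)
  Result := IndyTextEncoding_UTF8.GetString(Bytes);
end;

const
  // Simulated Decoder.FileName: UTF-8 bytes C4 8D and C3 A1 widened to chars
  RawSample = #$00C4#$008D'esk'#$00C3#$00A1' te'#$00C4#$008D'ka.png';
  // 'česká tečka.png' written with character codes
  Expected  = #$010D'esk'#$00E1' te'#$010D'ka.png';

begin
  // Must be set before the decoder reads the part headers
  GIdDefaultTextEncoding := enc8Bit;
  Writeln(DecodeRawFileName(RawSample) = Expected);
end.
```

The character codes are written with four hex digits on purpose, so that the compiler treats them as Unicode code points rather than as ANSI characters in the current code page.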

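¹ However, ÄŤeskĂˇ teÄŤka.png is not the correct UTF-8 form of česká tečka.png. It looks like the data may have been double-encoded, i.e. česká tečka.png was UTF-8 encoded, and then the resulting bytes were UTF-8 encoded again.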
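
One way to test the hypothesis in the footnote is to reverse the suspected step and compare the result with the expected name. The sketch below assumes the UTF-8 bytes were misread as Windows-1250 somewhere along the way; that code page is a guess that happens to match the garbled characters shown in the question, not something the post states. UndoMisdecoding is a hypothetical helper name.

```pascal
program MojibakeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: re-encode the garbled text with the code page it was
// (presumably) misread as, then decode the resulting bytes as UTF-8
function UndoMisdecoding(const Garbled: string; CodePage: Integer): string;
var
  Legacy: TEncoding;
begin
  Legacy := TEncoding.GetEncoding(CodePage);
  try
    Result := TEncoding.UTF8.GetString(Legacy.GetBytes(Garbled));
  finally
    if not TEncoding.IsStandardEncoding(Legacy) then
      Legacy.Free;
  end;
end;

const
  // The garbled name from the question, written with character codes
  Garbled  = #$00C4#$0164'esk'#$0102#$02C7' te'#$00C4#$0164'ka.png';
  // 'česká tečka.png'
  Expected = #$010D'esk'#$00E1' te'#$010D'ka.png';

begin
  // Assumption: code page 1250 was the intermediate (wrong) interpretation
  Writeln(UndoMisdecoding(Garbled, 1250) = Expected);
end.
```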

Answer 2

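Nowadays the filename parameter should only be added as a fallback, while filename* should be added to state clearly which text encoding the file name uses. Otherwise each client only guesses and assumes, which may go wrong.

RFC 5987 §3.2 defines the format of the filename* parameter: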

charset ' [ language ] ' value-chars

...whereas:

charset can be UTF-8 or ISO-8859-1 or any MIME-charset

...and language is optional.

RFC 6266 §4.3 defines that filename* should be used and comes up with examples in §5:

Content-Disposition: attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates

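Do you see the asterisk *? Do you spot the text encoding utf-8? Do you spot the two apostrophes '', which leave the language unspecified (see RFC 5646 § 2.1)? After that come the octets as per the specified text encoding: either percent-encoded or (where allowed) plain ASCII.

Other examples: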
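
To make those three pieces concrete, here is a rough sketch of decoding such a value by hand. DecodeExtValue is a hypothetical helper, not an Indy or RTL function; it assumes the charset name is one that TEncoding.GetEncoding() recognises, ignores the language tag, and does little error checking.

```pascal
program ExtValueSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: decodes  charset ' [ language ] ' value-chars
function DecodeExtValue(const ExtValue: string): string;
var
  P1, P2, I, N: Integer;
  Charset, Src: string;
  Bytes: TBytes;
  Enc: TEncoding;
begin
  P1 := Pos('''', ExtValue);
  P2 := Pos('''', ExtValue, P1 + 1);
  if (P1 = 0) or (P2 = 0) then
    raise EArgumentException.Create('Not an ext-value: ' + ExtValue);
  Charset := Copy(ExtValue, 1, P1 - 1);
  Src := Copy(ExtValue, P2 + 1, MaxInt); // the language tag is skipped

  // Turn %XX escapes and plain ASCII characters into raw octets
  SetLength(Bytes, Length(Src));
  N := 0;
  I := 1;
  while I <= Length(Src) do
  begin
    if (Src[I] = '%') and (I + 2 <= Length(Src)) then
    begin
      Bytes[N] := Byte(StrToInt('$' + Copy(Src, I + 1, 2)));
      Inc(I, 3);
    end
    else
    begin
      Bytes[N] := Byte(Ord(Src[I]));
      Inc(I);
    end;
    Inc(N);
  end;
  SetLength(Bytes, N);

  // Interpret the octets in the declared charset
  Enc := TEncoding.GetEncoding(Charset);
  try
    Result := Enc.GetString(Bytes);
  finally
    if not TEncoding.IsStandardEncoding(Enc) then
      Enc.Free;
  end;
end;

begin
  // The RFC 6266 example; compared against the euro sign U+20AC plus ' rates'
  Writeln(DecodeExtValue('utf-8''''%e2%82%ac%20rates') = #$20AC' rates');
end.
```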

```
Content-Disposition: attachment; filename="green.jpg"; filename*=UTF-8''%e3%82%b0%e3%83%aa%e3%83%bc%e3%83%b3.jpg
```
will show "green.jpg" on legacy web browsers and "グリーン.jpg" on compliant web browsers.
```
Content-Disposition: attachment; filename="Gruesse.txt"; filename*=ISO-8859-1''Gr%fc%dfe.txt
```
will show "Gruesse.txt" on legacy web browsers and "Grüße.txt" on compliant web browsers.
```
Content-Disposition: attachment; filename="Hello.png"; filename*=Shift_JIS'en-US'Howdy.png; filename*=EUC-KR'de'Hallo.png
```
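will show "Hello.png" on legacy web browsers, "Howdy.png" on compliant web browsers whose preferred language is set to American English, and "Hallo.png" on compliant web browsers whose preferred language is set to German (Deutsch). Note that the different text encodings need no percent-encoding as long as the octets are within the allowed range (Latin letters as well as the dot).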

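In my experience nobody cares about this nice feature - everybody just shoves UTF-8 into filename, which still violates the standard - no matter how many clients silently support it. See also: PHP: RFC-2231 How to encode UTF-8 String as Content-Disposition filename.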
