Can Delphi 6 convert UTF-8 Portuguese to WideString?

Question

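I am using Delphi 6.

I want to decode a Portuguese UTF-8 encoded string into a WideString, but I find that it is not decoded correctly.

The original text is "ANÁLISE8". After calling UTF8Decode(), the result is "ANALISE8". The accent on top of the "A" disappears.

The code is as follows: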

var
  f : textfile;
  s : UTF8String;
  w, test : WideString;    
begin
  while not eof(f) do
  begin
    readln(f,s);
    w := UTF8Decode(s);

How can I correctly decode a Portuguese UTF-8 string into a WideString?

Answer 1

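Note that UTF8Decode() in Delphi 6 is not implemented completely. Specifically, it does not support encoded 4-byte sequences, which are needed to handle Unicode code points above U+FFFF. That means UTF8Decode() can only decode Unicode code points in the UCS-2 range, not the complete Unicode repertoire. That makes UTF8Decode() basically useless in Delphi 6 (and all the way up to Delphi 2007; it was finally fixed in Delphi 2009).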
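
To see the limitation described above, one possible check is to feed both decoders a single 4-byte sequence. This is only a sketch: the program name and the choice of U+1F600 (UTF-8 bytes F0 9F 98 80, UTF-16 surrogate pair D83D DE00) are mine, and it assumes a pre-Unicode Delphi such as Delphi 6, where the character literals below are stored as raw bytes.

```pascal
program FourByteCheck;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils;

var
  s: UTF8String;
  w: WideString;
  Len: Integer;
begin
  // U+1F600 encoded as UTF-8: F0 9F 98 80 (a 4-byte sequence).
  // Assumption: pre-Unicode Delphi, so these literals are kept as raw bytes.
  s := #$F0#$9F#$98#$80;

  // RTL decoder: report how many WideChars it produced for this input.
  w := UTF8Decode(s);
  Writeln('UTF8Decode length: ', Length(w));

  // Win32 decoder: with a nil buffer it only reports the required size.
  // A correctly decoded surrogate pair occupies 2 WideChars.
  Len := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(s), Length(s), nil, 0);
  Writeln('MultiByteToWideChar length: ', Len);
end.
```

If the two reported lengths differ, the RTL decoder is the one that could not handle the sequence.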

Try using the Win32 MultiByteToWideChar() function instead, for example:

uses
  ..., Windows;

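function MyUTF8Decode(const s: UTF8String): WideString;
var
  Len: Integer;
begin
  // First call: ask how many WideChars the decoded text needs.
  Len := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(s), Length(s), nil, 0);
  SetLength(Result, Len);
  // Second call: decode into the buffer (skipped when there is nothing to decode).
  if Len > 0 then
    MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(s), Length(s), PWideChar(Result), Len);
end;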

var
  f : textfile;
  s : UTF8String;
  w, test : WideString;
begin
  while not eof(f) do
  begin
    readln(f,s);
    w := MyUTF8Decode(s);

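That said, your ANÁLISE8 string falls within the UCS-2 range, so I tested UTF8Decode() in Delphi 6, and it decoded the UTF-8 encoded form of ANÁLISE8 just fine. I would conclude that one of the following is true: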

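Your UTF8String variable does not contain the UTF-8 encoded form of ANÁLISE8 to begin with (byte sequence 41 4E C3 81 4C 49 53 45 38), but instead contains the ASCII string ANALISE8 (byte sequence 41 4E 41 4C 49 53 45 38), which decodes as-is because ASCII is a subset of UTF-8. Double-check your file and the output of Readln().
Your WideString does contain ANÁLISE8 as expected, but the way you are outputting/debugging it (which you did not show) is converting it to ANSI, losing the Á during the conversion.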
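
One way to tell these two cases apart is to dump the raw values instead of displaying the strings, since a hex dump is plain ASCII and cannot lose the accent on output. The following is a minimal sketch, not a definitive diagnostic: BytesToHex and WideToHex are hypothetical helper names, and the hard-coded test bytes assume a pre-Unicode Delphi such as Delphi 6.

```pascal
program DumpCheck;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: hex dump of the raw bytes of a (UTF-8) string.
function BytesToHex(const s: AnsiString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    Result := Result + IntToHex(Ord(s[i]), 2) + ' ';
end;

// Hypothetical helper: hex dump of the UTF-16 code units of a WideString.
function WideToHex(const w: WideString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(w) do
    Result := Result + IntToHex(Ord(w[i]), 4) + ' ';
end;

var
  s: UTF8String;
  w: WideString;
begin
  // Stand-in for a line read from the file: the UTF-8 bytes of ANÁLISE8.
  // Assumption: pre-Unicode Delphi, so these literals are kept as raw bytes.
  s := #$41#$4E#$C3#$81#$4C#$49#$53#$45#$38;
  w := UTF8Decode(s);
  Writeln(BytesToHex(s));  // what actually went into the decoder
  Writeln(WideToHex(w));   // what came out, one code unit per group
end.
```

In the real program, call both helpers on s and w inside the read loop. If the byte dump of s shows 41 where C3 81 should be, the accent was already missing from the input. If the dump of w shows 00C1 (the code unit for Á) in the third position, the decode worked and the loss happens later, when the WideString is displayed.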
