Are these the null-terminator rules for MultiByteToWideChar() and WideCharToMultiByte()? I don't quite understand MSDN

Question

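I'm trying to make sure that my code for converting between UTF-8 and UTF-16 handles null terminators correctly.

In the case of MultiByteToWideChar(), I understand that if you pass an output buffer size of 0, you get back the number of characters including the terminating null. My question is: do you then pass that count, including the terminating null, as the new buffer size, and compare the return value against the count including the terminating null? In other words, is this correct?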

n = MultiByteToWideChar(..., NULL, 0);
if (MultiByteToWideChar(..., buf, n) != n) error();

I'm guessing from this blurb about the input buffer size

If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.

that with an input buffer size of -1 the answer is yes; is that right?

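For WideCharToMultiByte(), I'm not sure about the null terminator at all. If I pass 0 for the output buffer count, does the returned count include the null terminator? For the actual conversion, do I say that the output buffer's size includes the null terminator? And does the return value include the null terminator?

My current code answers these questions with no, no, and no (respectively). It seems to work, but I would rather not trust code that works by accident. My only hint is the following blurb: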

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.

So I think the answers are actually yes, yes, and yes, but I'm still not completely sure.

Thanks.

For good measure, here is my code:

// note: assume logLastError() calls DebugBreak() and that uiAlloc() aborts on failure

#define MBTWC(str, wstr, bufsiz) MultiByteToWideChar(CP_UTF8, 0, str, -1, wstr, bufsiz)

WCHAR *toUTF16(const char *str)
{
    WCHAR *wstr;
    int n;

    n = MBTWC(str, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF16()");
    wstr = (WCHAR *) uiAlloc(n * sizeof (WCHAR), "WCHAR[]");
    // TODO verify return includes null terminator
    if (MBTWC(str, wstr, n) != n)
        logLastError("error converting from UTF-8 to UTF-16 in toUTF16()");
    return wstr;
}

#define WCTMB(wstr, str, bufsiz) WideCharToMultiByte(CP_UTF8, 0, wstr, -1, str, bufsiz, NULL, FALSE)

char *toUTF8(const WCHAR *wstr)
{
    char *str;
    int n;

    n = WCTMB(wstr, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF8()");
    // TODO does n include the null terminator?
    str = (char *) uiAlloc((n + 1) * sizeof (char), "char[]");
    if (WCTMB(wstr, str, n + 1) != n)
        logLastError("error converting from UTF-16 to UTF-8 in toUTF8()");
    return str;
}

Answer 1

The documentation for the return value of MultiByteToWideChar says:

If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr.

So, to your questions.

If I pass 0 for the output buffer count, will the returned count include null terminators or not?

Yes, if you pass -1 for cbMultiByte. No, if you pass strlen(lpMultiByteStr).

For the actual conversion, do I say the output buffer's size includes the null terminator or not?

Yes if you want the buffer to be null-terminated, and no if you don't.

So, after doing:

n = MultiByteToWideChar(..., -1, NULL, 0);

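you can allocate a buffer of length n if you want it to be null-terminated, or of length n-1 if you don't. Obviously you need to pass n or n-1 as the cchWideChar parameter to match the actual length of the buffer. Note that in the n-1 case you must also pass the input length without the terminator (for example strlen(lpMultiByteStr)) as cbMultiByte; with -1 the function still tries to write the terminator and fails with ERROR_INSUFFICIENT_BUFFER.

Looking at your code, it is clear that you want to create null-terminated buffers. Your toUTF16 code is correct. Your toUTF8 code is not. You should use the same length handling as in toUTF16. In addition, the final argument to WideCharToMultiByte is a little imprecise: that parameter is a pointer to a boolean (LPBOOL), so pass NULL rather than FALSE. The code should be: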

#define WCTMB(wstr, str, bufsiz) WideCharToMultiByte(CP_UTF8, 0, wstr, -1, str, bufsiz, NULL, NULL)

char *toUTF8(const WCHAR *wstr)
{
    char *str;
    int n;

    n = WCTMB(wstr, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF8()");
    str = (char *) uiAlloc(n * sizeof (char), "char[]");
    if (WCTMB(wstr, str, n) != n)
        logLastError("error converting from UTF-16 to UTF-8 in toUTF8()");
    return str;
}

winapi

encoding