Are these the null-terminator rules for MultiByteToWideChar() and WideCharToMultiByte()? I don't quite understand MSDN

I'm trying to make sure my code for converting between UTF-8 and UTF-16 handles null terminators correctly.

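In the case of MultiByteToWideChar(), I understand that if you pass an output buffer size of 0, you get back the number of characters including the terminating null. My question is: do you then pass that count, including the terminating null, as the new buffer size, and compare the return value against the count including the terminating null? In other words, is this correct?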

n = MultiByteToWideChar(..., NULL, 0);
if (MultiByteToWideChar(..., buf, n) != n) error();

I'm guessing, from this blurb about the input buffer size,

If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.

that the answer is yes when the input buffer size is -1; is that right?

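For WideCharToMultiByte(), I'm not sure about the null terminator at all. If I pass 0 for the output buffer count, will the returned count include the null terminator or not? For the actual conversion, do I say the output buffer's size includes the null terminator or not? And does the return value include the null terminator or not?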

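My current code answers these questions with no, no, and no (respectively). This seems to work, but I'd rather not trust code that works by accident. My only hint is the following blurb: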

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.

So I would assume the answers are actually yes, yes, and yes, but I'm still not entirely sure.

Thanks.

For good measure, here's my code:

// note: assume logLastError() calls DebugBreak() and that uiAlloc() aborts on failure

#define MBTWC(str, wstr, bufsiz) MultiByteToWideChar(CP_UTF8, 0, str, -1, wstr, bufsiz)

WCHAR *toUTF16(const char *str)
{
    WCHAR *wstr;
    int n;

    n = MBTWC(str, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF16()");
    wstr = (WCHAR *) uiAlloc(n * sizeof (WCHAR), "WCHAR[]");
    // TODO verify return includes null terminator
    if (MBTWC(str, wstr, n) != n)
        logLastError("error converting from UTF-8 to UTF-16 in toUTF16()");
    return wstr;
}

#define WCTMB(wstr, str, bufsiz) WideCharToMultiByte(CP_UTF8, 0, wstr, -1, str, bufsiz, NULL, FALSE)

char *toUTF8(const WCHAR *wstr)
{
    char *str;
    int n;

    n = WCTMB(wstr, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF8()");
    // TODO does n include the null terminator?
    str = (char *) uiAlloc((n + 1) * sizeof (char), "char[]");
    if (WCTMB(wstr, str, n + 1) != n)
        logLastError("error converting from UTF-16 to UTF-8 in toUTF8()");
    return str;
}

The documentation for the return value of MultiByteToWideChar says:

If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr.

So, to your questions.

If I pass 0 for the output buffer count, will the returned count include null terminators or not?

Yes, if you pass -1 for cbMultiByte. No, if you pass strlen(lpMultiByteStr).

For the actual conversion, do I say the output buffer's size includes the null terminator or not?

Yes if you want the buffer to be null-terminated, no if you don't.

So, after doing:

n = MultiByteToWideChar(..., -1, NULL, 0);

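If you want the buffer to be null-terminated, allocate a buffer of length n; if you don't, allocate one of length n-1. Either way, pass n or n-1 as the cchWideChar parameter so that it matches the buffer's actual length. Note that for the unterminated case the input length has to exclude the terminator as well (pass strlen(lpMultiByteStr) rather than -1 for cbMultiByte); with -1 the function still needs room for the terminator, so a buffer of n-1 is too small and the call fails with ERROR_INSUFFICIENT_BUFFER.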


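Looking at your code, it is clear that you want to create null-terminated buffers. Your toUTF16 code is correct. Your toUTF8 code is not: you should use the same length handling as in toUTF16. In addition, the final argument you pass to WideCharToMultiByte is a little imprecise. That parameter is a pointer to a BOOL, so it should be NULL rather than FALSE. The code should be: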

#define WCTMB(wstr, str, bufsiz) WideCharToMultiByte(CP_UTF8, 0, wstr, -1, str, bufsiz, NULL, NULL)

char *toUTF8(const WCHAR *wstr)
{
    char *str;
    int n;

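    // The input length is -1, so the returned size includes the null terminator.
    n = WCTMB(wstr, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF8()");
    str = (char *) uiAlloc(n * sizeof (char), "char[]");
    // On success the second call returns the same n, terminator included.
    if (WCTMB(wstr, str, n) != n)
        logLastError("error converting from UTF-16 to UTF-8 in toUTF8()");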
    return str;
}