Are these the null-terminator rules for MultiByteToWideChar() and WideCharToMultiByte()? I don't quite understand MSDN
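I'm trying to make sure my code for converting between UTF-8 and UTF-16 handles null terminators correctly.
In the case of MultiByteToWideChar(), I understand that if you pass an output buffer size of 0, you get back the number of characters including the terminating null. My question is: do you then pass that count, including the terminating null, as the new buffer size, and compare the result against the count including the terminating null? In other words, is this correct?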
n = MultiByteToWideChar(..., NULL, 0);
if (MultiByteToWideChar(..., buf, n) != n) error();
I'm guessing, based on this blurb from the documentation of the input buffer size parameter:
If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.
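that the answer is yes when the input buffer size is -1; is that right?
For WideCharToMultiByte(), I'm completely unsure about the null terminator. If I pass 0 for the output buffer count, does the returned count include the null terminator? For the actual conversion, do I say the output buffer's size includes the null terminator? And does the return value include the null terminator?
My current code answers these questions with no, no, and no (respectively). This seems to work, but I'd rather not trust code that works by accident. My only hint is the following blurb: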
If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.
So I would think the answers are actually yes, yes, and yes, but I'm still not entirely sure.
Thanks.
For good measure, here's my code:
// note: assume logLastError() calls DebugBreak() and that uiAlloc() aborts on failure
#define MBTWC(str, wstr, bufsiz) MultiByteToWideChar(CP_UTF8, 0, str, -1, wstr, bufsiz)
WCHAR *toUTF16(const char *str)
{
    WCHAR *wstr;
    int n;

    n = MBTWC(str, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF16()");
    wstr = (WCHAR *) uiAlloc(n * sizeof (WCHAR), "WCHAR[]");
    // TODO verify return includes null terminator
    if (MBTWC(str, wstr, n) != n)
        logLastError("error converting from UTF-8 to UTF-16 in toUTF16()");
    return wstr;
}

#define WCTMB(wstr, str, bufsiz) WideCharToMultiByte(CP_UTF8, 0, wstr, -1, str, bufsiz, NULL, FALSE)
char *toUTF8(const WCHAR *wstr)
{
    char *str;
    int n;

    n = WCTMB(wstr, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF8()");
    // TODO does n include the null terminator?
    str = (char *) uiAlloc((n + 1) * sizeof (char), "char[]");
    if (WCTMB(wstr, str, n + 1) != n)
        logLastError("error converting from UTF-16 to UTF-8 in toUTF8()");
    return str;
}
The documentation for the return value of MultiByteToWideChar says:
If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr.
So, to your questions.
If I pass 0 for the output buffer count, will the returned count include null terminators or not?
Yes, if you pass -1 for cbMultiByte. No, if you pass strlen(lpMultiByteStr).
For the actual conversion, do I say the output buffer's size includes the null terminator or not?
Yes if you want the buffer to be null-terminated, no if you don't.
So, having done:
n = MultiByteToWideChar(..., -1, NULL, 0);
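you can allocate a buffer of length n if you want the result to be null-terminated, or a buffer of length n-1 if you do not. Obviously, you need to pass either n or n-1 as the cchWideChar parameter to match the actual length of the buffer. Note that in the n-1 case the converting call should also pass the explicit byte count, strlen(lpMultiByteStr), as cbMultiByte rather than -1: with -1 the function still converts the terminator, so a buffer of n-1 characters is too small and the call fails with ERROR_INSUFFICIENT_BUFFER.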
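To make the counting rule concrete, here is a minimal sketch of the two size queries. The helper names countWithTerminator() and countWithoutTerminator() are made up for this illustration. On non-Windows builds a tiny ASCII-only stand-in is compiled in place of the real function so that the sketch builds anywhere; it only mimics the documented counting rules and is not the real API.

```c
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Hypothetical ASCII-only stand-in so the sketch also builds off Windows.
   It mimics only the documented counting rules; it is NOT the real API. */
typedef unsigned short WCHAR;
#define CP_UTF8 65001
static int MultiByteToWideChar(unsigned cp, unsigned long flags,
                               const char *src, int cb, WCHAR *dst, int cch)
{
    /* -1 means "null-terminated input": the terminator is counted too */
    int n = (cb == -1) ? (int) strlen(src) + 1 : cb;
    int i;

    (void) cp;
    (void) flags;
    if (cch == 0)
        return n;               /* size query only, nothing is written */
    if (cch < n)
        return 0;               /* the real API sets ERROR_INSUFFICIENT_BUFFER */
    for (i = 0; i < n; i++)
        dst[i] = (unsigned char) src[i];
    return n;
}
#endif

/* Required UTF-16 length when the input is passed as null-terminated (-1).
   The result includes the terminating null. */
int countWithTerminator(const char *str)
{
    return MultiByteToWideChar(CP_UTF8, 0, str, -1, NULL, 0);
}

/* Required UTF-16 length when only the bytes before the terminator are passed.
   The result excludes the terminating null. Note that an empty string gives
   cbMultiByte == 0, which the function treats as a failure (it returns 0). */
int countWithoutTerminator(const char *str)
{
    return MultiByteToWideChar(CP_UTF8, 0, str, (int) strlen(str), NULL, 0);
}
```

For the input "abc", countWithTerminator() reports 4 and countWithoutTerminator() reports 3, which is the yes/no distinction described above.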
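Looking at your code, it's clear that you want to create null-terminated buffers. Your toUTF16 code is correct. Your toUTF8 code is not: because you pass -1 for the input length, n already includes the terminator, so the extra + 1 in the allocation and the buffer size is unnecessary. You should use the same length handling as in toUTF16. What's more, the final argument to WideCharToMultiByte is a little imprecise. It is a pointer to a boolean, so it should be NULL rather than FALSE; FALSE only happens to work because it is defined as 0. For CP_UTF8 the documentation also requires both of the last two parameters to be NULL. The code should be: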
#define WCTMB(wstr, str, bufsiz) WideCharToMultiByte(CP_UTF8, 0, wstr, -1, str, bufsiz, NULL, NULL)
char *toUTF8(const WCHAR *wstr)
{
    char *str;
    int n;

    // n includes the null terminator because the input length is -1
    n = WCTMB(wstr, NULL, 0);
    if (n == 0)
        logLastError("error figuring out number of characters to convert to in toUTF8()");
    str = (char *) uiAlloc(n * sizeof (char), "char[]");
    if (WCTMB(wstr, str, n) != n)
        logLastError("error converting from UTF-16 to UTF-8 in toUTF8()");
    return str;
}