Converting Unicode letters with accents in ASCII/UTF8
I'm looking for a way to convert a string (JSON) sent by the server that contains something like:
...."Test \u00e9\u00e9\u00e9".....
into something like: "Test ééé"
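The solution I found: boost::replace_all(listFolder, "\\u00e9", "é");
I'm using this Boost function for the other letters too (à, ù, è, ê, etc.)... it's painful!
Is there a function that does this conversion automatically?
One more thing: if I use the following function, the server correctly handles the strings I send it that contain accented letters: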
#include <locale>
#include <string>
#include <boost/locale.hpp>

std::string fromLocale(const std::string& localeStr)
{
    boost::locale::generator g;
    g.locale_cache_enabled(true);
    std::locale loc = g(boost::locale::util::get_system_locale());
    return boost::locale::conv::to_utf<char>(localeStr, loc);
}
Unfortunately, the inverse of this function does not work on the strings the server sends:
std::string toLocale(const std::string& utf8Str)
{
    boost::locale::generator g;
    g.locale_cache_enabled(true);
    std::locale loc = g(boost::locale::util::get_system_locale());
    return boost::locale::conv::from_utf<char>(utf8Str, loc);
}
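The JSON specification allows Unicode characters to be written as "\uXXXX" escape sequences (just like the other \X escape sequences). These escapes are plain ASCII text inside the JSON, not encoded characters, so a charset conversion such as boost::locale::conv::from_utf leaves them untouched. If you are not using an existing JSON parser that decodes such sequences for you, you have to decode them manually, for example: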
// JSON uses Unicode, but is commonly encoded as UTF-8. However, Unicode
// characters that are encoded in "\uXXXX" format are expressed as UTF-16
// codeunit values, using surrogate pairs for codepoint values U+10000 and
// higher. This example uses C++11's std::u16string to handle UTF-16 parsing.
// If you are not using C++11 or later, you can replace it with std::wstring
// on platforms where wchar_t is 16bit, for instance. If you want to handle
// the JSON using std::string/UTF-8 instead, you will have to tweak this
// parsing accordingly...
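#include <string>

std::u16string str = ...; // JSON quoted-string value, eg: "Test \u00e9\u00e9\u00e9"...
std::u16string::size_type idx = 0;
do
{
    idx = str.find(u'\\', idx);
    if (idx == std::u16string::npos) break;
    std::u16string replaceStr;
    std::u16string::size_type len = 2;
    char16_t ch = str.at(idx+1); // throws std::out_of_range on a trailing backslash
    switch (ch)
    {
        case u'\"':
        case u'\\':
        case u'/':
            replaceStr = ch;
            break;
        case u'b':
            replaceStr = u'\b';
            break;
        case u'f':
            replaceStr = u'\f';
            break;
        case u'n':
            replaceStr = u'\n';
            break;
        case u'r':
            replaceStr = u'\r';
            break;
        case u't':
            replaceStr = u'\t';
            break;
        case u'u':
        {
            // "\uXXXX" holds exactly 4 hex digits, i.e. one UTF-16 code unit.
            // std::basic_istringstream<char16_t> is not reliable for this: the
            // standard library is not required to provide ctype/num_get facets
            // for char16_t, so the digits are parsed by hand. Each half of a
            // surrogate pair simply becomes its own char16_t, as UTF-16 expects.
            if (str.size() - idx < 6)
            {
                // truncated sequence, do something
                break;
            }
            unsigned value = 0;
            bool valid = true;
            for (std::u16string::size_type i = idx + 2; i < idx + 6; ++i)
            {
                char16_t h = str[i];
                value <<= 4;
                if (h >= u'0' && h <= u'9') value |= h - u'0';
                else if (h >= u'a' && h <= u'f') value |= h - u'a' + 10;
                else if (h >= u'A' && h <= u'F') value |= h - u'A' + 10;
                else { valid = false; break; }
            }
            if (!valid)
            {
                // illegal value, do something
            }
            len += 4;
            replaceStr = static_cast<char16_t>(value);
            break;
        }
        default:
            // illegal sequence, do something
            break;
    }
    str.replace(idx, len, replaceStr);
    idx += replaceStr.size();
}
while (true);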
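The comments above note that handling the JSON as std::string/UTF-8 needs the parsing tweaked. Here is a minimal sketch of that variant (not part of the original answer), assuming the input is the already-UTF-8 text between the quotes. The names decodeJsonEscapes, parseHex4 and appendUtf8 are hypothetical, and malformed escapes or unpaired surrogates throw std::invalid_argument instead of the "do something" placeholders. Because the output is UTF-8 rather than UTF-16, a high surrogate has to be combined with the following low surrogate into one code point, cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00), before it is encoded.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parses exactly four hex digits starting at s[pos]; throws on bad input.
static unsigned parseHex4(const std::string& s, std::size_t pos)
{
    if (pos + 4 > s.size())
        throw std::invalid_argument("truncated \\u escape");
    unsigned value = 0;
    for (std::size_t i = pos; i < pos + 4; ++i)
    {
        const char h = s[i];
        value <<= 4;
        if (h >= '0' && h <= '9') value |= unsigned(h - '0');
        else if (h >= 'a' && h <= 'f') value |= unsigned(h - 'a' + 10);
        else if (h >= 'A' && h <= 'F') value |= unsigned(h - 'A' + 10);
        else throw std::invalid_argument("bad hex digit in \\u escape");
    }
    return value;
}

// Appends code point cp to out as 1 to 4 UTF-8 bytes.
static void appendUtf8(std::string& out, unsigned cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes the escape sequences of a JSON string body (the text between the
// quotes) into UTF-8. Bytes that are not part of an escape are copied as-is.
std::string decodeJsonEscapes(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] != '\\') { out += in[i]; continue; }
        if (++i == in.size())
            throw std::invalid_argument("dangling backslash");
        switch (in[i])
        {
        case '"': case '\\': case '/': out += in[i]; break;
        case 'b': out += '\b'; break;
        case 'f': out += '\f'; break;
        case 'n': out += '\n'; break;
        case 'r': out += '\r'; break;
        case 't': out += '\t'; break;
        case 'u':
        {
            unsigned cp = parseHex4(in, i + 1);
            i += 4; // i now indexes the last hex digit
            if (cp >= 0xD800 && cp <= 0xDBFF)
            {
                // High surrogate: must be followed by a \uDC00..\uDFFF escape.
                if (i + 2 >= in.size() || in[i + 1] != '\\' || in[i + 2] != 'u')
                    throw std::invalid_argument("unpaired high surrogate");
                const unsigned lo = parseHex4(in, i + 3);
                if (lo < 0xDC00 || lo > 0xDFFF)
                    throw std::invalid_argument("invalid low surrogate");
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 6; // skip "\uXXXX" of the low surrogate
            }
            else if (cp >= 0xDC00 && cp <= 0xDFFF)
                throw std::invalid_argument("unpaired low surrogate");
            appendUtf8(out, cp);
            break;
        }
        default:
            throw std::invalid_argument("illegal escape sequence");
        }
    }
    return out;
}
```

With the string from the question, decodeJsonEscapes("Test \\u00e9\\u00e9\\u00e9") yields the UTF-8 bytes of "Test ééé"; if a string in the system's narrow encoding is needed afterwards, that result could then go through a conversion such as the question's toLocale. Throwing on bad input is only one possible policy; a full JSON parser makes this whole step unnecessary.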
The solution I found was to use RapidJSON.