How to convert hex representation from URL (%) to std::string (Chinese text)?
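Introduction
I have some input that needs to be converted into the correct Chinese characters, but I think I am stuck on the final number-to-string conversion. I have used this hex to text converter online tool to check that e6b9af corresponds to the text 湯.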
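For reference, e6b9af is simply the UTF-8 encoding of 湯 (U+6E6F) written out as hex digits. Below is a minimal sketch for checking this locally. It assumes the source file is saved as UTF-8 and compiled with a UTF-8 execution character set (the default for GCC and Clang; MSVC needs /utf-8). The helper name to_hex is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: hex-dump the raw bytes of a string in lowercase,
// e.g. the UTF-8 bytes of "湯". Two hex digits are emitted per byte.
std::string to_hex(const std::string& s)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s)
    {
        out += digits[c >> 4];   // high nibble
        out += digits[c & 0x0F]; // low nibble
    }
    return out;
}
```

Under those assumptions, to_hex("湯") returns "e6b9af", which matches what the online converter reports.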
MWE
This is a minimal example I made to illustrate the problem. The input is "%e6%b9%af" (obtained from a URL elsewhere).
#include <iostream>
#include <string>

std::string attempt(std::string path)
{
    std::size_t i = path.find("%");
    while (i != std::string::npos)
    {
        // Take e.g. "%E6%B9%AF": assumes every character is exactly 3 escaped bytes
        std::string sub = path.substr(i, 9);
        // Strip the three '%' signs (positions are relative to sub) -> "E6B9AF"
        sub.erase(6, 1);
        sub.erase(3, 1);
        sub.erase(0, 1);
        // Parse all six hex digits as a single number: 0xE6B9AF
        std::size_t s = std::stoul(sub, nullptr, 16);
        // std::to_string writes that number back in decimal ("15120815")
        path.replace(i, 9, std::to_string(s));
        i = path.find("%");
    }
    return path;
}

int main()
{
    std::string input = "%E6%B9%AF";
    std::string goal = "湯";
    // convert input to goal
    input = attempt(input);
    std::cout << goal << " and " << input
              << (input == goal ? " are the same" : " are not the same") << std::endl;
    return 0;
}
Output
湯 and 15120815 are not the same
Expected output
湯 and 湯 are the same
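The output 15120815 is the decimal value of 0xE6B9AF: std::stoul reads all six hex digits as one number, and std::to_string then writes that number back in base 10. Each %XX escape actually stands for one byte, so the bytes themselves have to be appended. A minimal sketch of that byte-wise conversion, assuming the hex digits have already been extracted from the URL (the helper name hex_pairs_to_bytes is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert hex digit pairs such as "E6B9AF" into raw bytes.
// Each pair becomes one char; appending the char (rather than std::to_string
// of the parsed number) rebuilds the original UTF-8 text.
// Assumes an even number of hex digits; a trailing odd digit is ignored.
std::string hex_pairs_to_bytes(const std::string& hex)
{
    std::string bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
    {
        // Base-16 parse of one pair, e.g. "E6" -> 230
        unsigned long value = std::stoul(hex.substr(i, 2), nullptr, 16);
        bytes += static_cast<char>(value);
    }
    return bytes;
}
```

Here hex_pairs_to_bytes("E6B9AF") yields the three bytes E6 B9 AF, which is the UTF-8 string "湯".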
Additional question
Are all foreign-language characters 3 bytes, or only Chinese ones? My attempt assumes 3-byte chunks; is that a good assumption?
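On the additional question: 3 bytes is not a safe assumption. UTF-8 is a variable-length encoding that uses 1 to 4 bytes per code point. ASCII takes 1 byte. Most Latin letters with diacritics, Greek, Cyrillic, Hebrew and Arabic take 2. The rest of the Basic Multilingual Plane takes 3, including the common CJK Unified Ideographs (U+4E00 to U+9FFF, where 湯 is U+6E6F). Code points from U+10000 upward, such as emoji and CJK Extension B characters, take 4. The length can be read off the first byte of each sequence. The sketch below does this; the function name utf8_sequence_length is hypothetical, and the lead-byte ranges follow RFC 3629.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes in a UTF-8 sequence, judged from its
// first (lead) byte. Returns 0 for bytes that cannot start a valid sequence.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1; // 0x00..0x7F: ASCII, U+0000..U+007F
    if (lead < 0xC2) return 0; // 0x80..0xBF continuation byte, or overlong 0xC0/0xC1
    if (lead < 0xE0) return 2; // 0xC2..0xDF: U+0080..U+07FF
    if (lead < 0xF0) return 3; // 0xE0..0xEF: U+0800..U+FFFF (most CJK)
    if (lead < 0xF5) return 4; // 0xF0..0xF4: U+10000..U+10FFFF
    return 0;                  // 0xF5..0xFF never appear in UTF-8
}
```

For 湯 the lead byte 0xE6 gives 3, but a decoder that turns every %XX escape into one byte, in order, never needs to know the length at all. That makes per-byte decoding more robust than fixed 3-byte chunks.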
Based on your suggestions and an adapted version of the example in this other post, this is what I came up with.
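#include <cctype>
#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>

std::string decode_url(const std::string& path)
{
    std::stringstream decoded;
    for (std::size_t i = 0; i < path.size(); i++)
    {
        if (path[i] != '%')
        {
            // '+' stands for a space in form-encoded query strings
            if (path[i] == '+')
                decoded << ' ';
            else
                decoded << path[i];
        }
        else if (i + 2 < path.size() &&
                 std::isxdigit(static_cast<unsigned char>(path[i + 1])) &&
                 std::isxdigit(static_cast<unsigned char>(path[i + 2])))
        {
            // "%XY": the two hex digits XY encode exactly one raw byte
            unsigned int j = 0;
            std::sscanf(path.substr(i + 1, 2).c_str(), "%x", &j);
            decoded << static_cast<char>(j);
            i += 2; // skip the two hex digits
        }
        else
        {
            // Truncated or malformed escape: keep the '%' unchanged
            decoded << path[i];
        }
    }
    return decoded.str();
}

int main()
{
    std::string input = "%E6%B9%AF";
    std::string goal = "湯";
    // convert input to goal
    input = decode_url(input);
    std::cout << goal << " and " << input
              << (input == goal ? " are the same" : " are not the same") << std::endl;
    return 0;
}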
Output
湯 and 湯 are the same
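For comparison, here is a sketch of the same per-byte idea as decode_url, written without sscanf or std::stringstream. It assumes the same rules: '+' means a space, and malformed or truncated escapes are copied through unchanged. The names hex_value and percent_decode are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX escapes one byte at a time, so UTF-8 characters of any length
// (1 to 4 bytes) are reassembled without assuming a chunk size.
std::string percent_decode(const std::string& in)
{
    std::string out;
    out.reserve(in.size()); // decoded text is never longer than the input
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size())
        {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2; // skip the two hex digits
                continue;
            }
        }
        // '+' encodes a space in form data; everything else is copied as-is
        out += (in[i] == '+') ? ' ' : in[i];
    }
    return out;
}
```

With this version, percent_decode("%E6%B9%AF") produces the bytes of "湯", percent_decode("a+b%20c") produces "a b c", and a stray "100%" is left as "100%".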