RE2 not matching non-ASCII characters
I can't get RE2 to match non-ASCII bytes using their hex/octal representation.
The following code snippet illustrates the problem:
#include <string>
#include <re2/re2.h>

const char *test = "abc""\xe2""xyz";
std::string str(test); // "abc\342xyz"; \342 is octal for \xe2
// str.size() == 7
re2::StringPiece string_piece(str); // size is 7, as expected
std::string out;
// Extracts the letter 'z' into 'out'; \172 is the octal for 'z'.
bool match = re2::RE2::PartialMatch(string_piece, "(\\172)", &out); // match = true, out = "z"
// Should extract the byte \342, but it doesn't.
match = re2::RE2::PartialMatch(string_piece, "(\\342)", &out); // match = false
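Set the encoding to Latin-1. RE2 defaults to UTF-8, where \342 in the pattern denotes the code point U+00E2 (encoded as the two bytes 0xC3 0xA2) rather than the single byte 0xE2.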
match = re2::RE2::PartialMatch(string_piece,
                               re2::RE2("(\\342)", re2::RE2::Latin1),
                               &out);
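To see why the UTF-8 default misses the byte, it can help to look at how U+00E2 is encoded in UTF-8. The sketch below does not use RE2 and is not RE2's implementation. The helpers encode_utf8_small, contains_bytes, and question_subject are hypothetical names introduced only for this illustration, and the encoder only handles code points below U+0800.

```cpp
#include <string>

// Hypothetical helper (not part of RE2): encode one code point below U+0800
// as UTF-8. Larger code points are deliberately not handled in this sketch.
std::string encode_utf8_small(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // One byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else {
        // Two bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// True if `needle` occurs in `haystack` as a raw byte sequence.
bool contains_bytes(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}

// The subject string from the question: 'a' 'b' 'c' 0xE2 'x' 'y' 'z'.
std::string question_subject() {
    return std::string("abc" "\xe2" "xyz");
}
```

With these helpers, encode_utf8_small(0xE2) yields the two bytes 0xC3 0xA2, which do not occur anywhere in the subject, while the single byte 0xE2 does. That is consistent with the match failing under the UTF-8 default and with Latin-1, where each byte is treated as one character, being the suggested fix.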