Process a string using regular expressions with C++14
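I need to extract 3 variables from a string in C++14. The string format is:

a single uppercase char + `->` + a single uppercase char + `,` + a number

For example: `A->B,100`, `C->D,20000`, `E->F,22`. I want to extract the single uppercase characters and the number, e.g. `A`, `B` and `100`. So far I can separate them with a tokenizing function by calling `tokenize()` several times, like this: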
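```cpp
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// Split s into pieces at every occurrence of the delimiter c.
vector<string> tokenize(const string& s, const char c)
{
    vector<string> splitted;
    stringstream ss(s);
    string tmp;
    while (getline(ss, tmp, c)) { splitted.push_back(tmp); }
    return splitted;
}
// ...
auto t = tokenize(s, ','); // e.g. split on the comma
```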
But I wonder whether there is a simpler way to extract these 3 variables with a regular expression. I tried `(A-Z.*?->\A-Z)\,` but clearly I wrote the regex wrong.
The pattern you seem to want is:

`[A-Z]->[A-Z],[0-9]+`
Explanation:

- `[A-Z]`: a single uppercase character
- `->`: the literal `->`
- `[A-Z]`: a single uppercase character
- `,`: a literal comma
- `[0-9]+`: one or more digits (a whole number)
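Note that `[A-Z]` and `[0-9]` are character classes: each matches exactly one character from the given range, i.e. any uppercase letter or any digit, respectively.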
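One thing to know about regular expressions in C++: `std::regex_search` reports only the first match it finds, so to collect multiple matches you have to iterate yourself, starting each new search where the previous match ended. Here is an example: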
```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

//-----------------------------------------------------------------------------
// if you're going to tokenize you might as well return tokens
struct token_t
{
    token_t(const std::string& f, const std::string& t, const unsigned long v) :
        from{ f },
        to{ t },
        value{ v }
    {
    }

    std::string from;
    std::string to;
    unsigned long value;
};

std::ostream& operator<<(std::ostream& os, const token_t& token)
{
    os << "token : from = " << token.from << ", to = " << token.to << ", value = " << token.value << std::endl;
    return os;
}

std::ostream& operator<<(std::ostream& os, const std::vector<token_t>& tokens)
{
    // write to the stream passed in, not unconditionally to std::cout
    os << std::endl << "------------------ tokens ------------------" << std::endl;
    for (const auto& token : tokens)
    {
        os << token;
    }
    os << "--------------------------------------------" << std::endl;
    return os;
}

//-----------------------------------------------------------------------------
// Find every "X->Y,N" occurrence in s; each capture group becomes one field.
auto tokenize(const std::string& s)
{
    static const std::regex rx{ "([A-Z])->([A-Z]),([0-9]+)" };
    std::smatch match;
    std::vector<token_t> tokens;

    auto from = s.cbegin();
    // regex_search finds only the first match in [from, end);
    // continue right after it (match.suffix().first) to find the next one
    while (std::regex_search(from, s.cend(), match, rx))
    {
        tokens.push_back({ match[1], match[2], std::stoul(match[3]) });
        from = match.suffix().first;
    }

    return tokens;
}

//-----------------------------------------------------------------------------
int main()
{
    auto v1 = tokenize("A->B,100");
    auto v2 = tokenize("A->B,100, C->D,2000, E->F,22");
    std::cout << v1;
    std::cout << v2;
    return 0;
}
```