Parsing an IPv4 address with Boost Spirit X3
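X3 newbie here. Two questions:
- Why does the result contain repeated "1, 1, 1"s, like so:
<attributes>[[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1], [8, 0]]</attributes>
when I expect something like this: <attributes>[[1, 9, 2, ., 1, 6, 8, ., 1, ., 1], [8, 0]]</attributes>
- What would be a less awkward way, in the dec_octet rule, to define a single char that is expanded to (treated as?) a sequence? I used x3::repeat(1)[x3::digit], but this seems wrong and possibly causes the bug from the first question. (x3::repeat(1)[x3::digit] is used because I cannot seem to use just x3::digit, as that leads to a rule collapsing failure?)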
#include <iostream>
#include <string>
#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
namespace x3 = boost::spirit::x3;
namespace ast
{
    struct ip_port
    {
        std::string host;
        boost::optional<std::string> port;
    };
}
BOOST_FUSION_ADAPT_STRUCT(ast::ip_port, host, port)
namespace parser
{
    template <typename T> auto as = [](auto name, auto p) { return x3::rule<struct _, T> {name} = p; };
    const auto dec_octet = as<std::string>("dec_octet",
        (
              x3::char_('2') >> x3::char_('5') >> x3::char_('0', '5')
            | x3::char_('2') >> x3::char_('0', '4') >> x3::digit
            | x3::char_('1') >> x3::digit >> x3::digit
            | x3::char_('1', '9') >> x3::digit
            | x3::repeat(1)[x3::digit] // awkward way to force sequence from single char, but can't use x3::digit
        )
    );
    const auto ipv4address = as<std::string>("ipv4address",
        dec_octet >> x3::char_('.') >> dec_octet >> x3::char_('.') >> dec_octet >> x3::char_('.') >> dec_octet
    );
    const auto ip = as<std::string>("host", ipv4address);
    const auto port = as<std::string>("port", +x3::digit);
    const auto ip_port = as<ast::ip_port>("ip_port", ip >> -((':') >> port));
}
template <typename T, typename Parser>
bool parse(const std::string& in, const Parser& p)
{
    T parsed;
    auto iter = in.begin();
    auto end_iter = in.end();
    bool res = x3::parse(iter, end_iter, p, parsed);
    return res && (iter == end_iter);
}
int main()
{
    std::cerr << std::boolalpha << parse<ast::ip_port>(std::string{"192.168.1.1:80"}, parser::ip_port) << '\n';
    return EXIT_SUCCESS;
}
Debug output:
<ip_port>
<try>192.168.1.1:80</try>
<host>
<try>192.168.1.1:80</try>
<ipv4address>
<try>192.168.1.1:80</try>
<dec_octet>
<try>192.168.1.1:80</try>
<success>.168.1.1:80</success>
<attributes>[1, 9, 2]</attributes>
</dec_octet>
<dec_octet>
<try>168.1.1:80</try>
<success>.1.1:80</success>
<attributes>[1, 6, 8]</attributes>
</dec_octet>
<dec_octet>
<try>1.1:80</try>
<success>.1:80</success>
<attributes>[1, 1, 1]</attributes>
</dec_octet>
<dec_octet>
<try>1:80</try>
<success>:80</success>
<attributes>[1, 1, 1]</attributes>
</dec_octet>
<success>:80</success>
<attributes>[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1]</attributes>
</ipv4address>
<success>:80</success>
<attributes>[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1]</attributes>
</host>
<port>
<try>80</try>
<success></success>
<attributes>[8, 0]</attributes>
</port>
<success></success>
<attributes>[[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1], [8, 0]]</attributes>
</ip_port>
true
Thanks.
Q. 1. Why the result contains repeated "1,1,1"s, like so: [[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1], [8, 0]], when I expect something like this [[1, 9, 2, ., 1, 6, 8, ., 1, ., 1], [8, 0]]
It has been 7 days since the last time people ran into this pitfall:
It's the age-old "container attributes aren't atomic" pitfall:
- boost::spirit::qi duplicate parsing on the output
- Understanding Boost.spirit's string parser
- Parsing with Boost::Spirit (V2.4) into container
You can paper over it using qi::hold. Or you can revise your strategy.
In that case, I'd suggest using raw to get at the underlying source sequence.
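To see why the last two octets come out as 1, 1, 1, it may help to mimic the mechanism outside of Spirit. What follows is a plain C++ sketch, not X3 code: Matcher, Sequence, octet_non_atomic and octet_raw are made-up names, and the model assumes (as the debug output suggests) that each matched character is appended to the shared string right away, while a failed alternative only rewinds the input position.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// One character test, e.g. "is it '2'?" or "is it a digit?".
using Matcher = std::function<bool(char)>;
// One alternative: a sequence of character tests.
using Sequence = std::vector<Matcher>;

inline Matcher range(char lo, char hi) {
    return [lo, hi](char c) { return lo <= c && c <= hi; };
}

// The same five alternatives as the dec_octet rule in the question.
inline std::vector<Sequence> octet_alternatives() {
    const Matcher digit = range('0', '9');
    return {
        {range('2', '2'), range('5', '5'), range('0', '5')},
        {range('2', '2'), range('0', '4'), digit},
        {range('1', '1'), digit, digit},
        {range('1', '9'), digit},
        {digit},
    };
}

// Models the pitfall: characters are appended to attr as they match,
// and a failed alternative rewinds the position but not attr.
inline bool octet_non_atomic(std::string_view in, std::size_t& pos, std::string& attr) {
    for (const Sequence& seq : octet_alternatives()) {
        std::size_t p = pos;
        bool ok = true;
        for (const Matcher& m : seq) {
            if (p < in.size() && m(in[p])) {
                attr.push_back(in[p++]); // this side effect survives backtracking
            } else {
                ok = false;
                break;
            }
        }
        if (ok) {
            pos = p;
            return true;
        }
    }
    return false;
}

// Models the raw approach: match first, ignoring intermediate attributes,
// then copy the matched slice of the source exactly once.
inline bool octet_raw(std::string_view in, std::size_t& pos, std::string& attr) {
    std::string scratch; // intermediate attribute, thrown away
    std::size_t p = pos;
    if (!octet_non_atomic(in, p, scratch)) return false;
    attr.append(in.substr(pos, p - pos));
    pos = p;
    return true;
}
```

With the input 1.1 the first function leaves 111 in the attribute (one 1 from each of the last three alternatives), while the second yields just 1, because only the matched source range is copied.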
Q. 2. [...] not so awkward [...]
An intermediate step would be
const auto dec_octet = x3::raw [ x3::uint_parser<uint8_t>{} ];
Boom. Use the fact that X3 is a high-level parser generator. Don't do the nitty-gritty, error-prone work yourself. In fact, you can simply write
const x3::uint_parser<std::uint8_t> dec_octet{};
and postpone the "stringification" to where it is needed:
const x3::uint_parser<std::uint8_t> dec_octet{};
const x3::uint_parser<std::uint16_t> port{};
const auto ipv4address = x3::raw [
dec_octet >> '.' >> dec_octet >> '.' >> dec_octet >> '.' >> dec_octet ];
const auto ip_port = as<ast::ip_port>("ip_port", ipv4address >> -(':' >> port));
After Liposuction
Note the use of uint16_t for the port, x3::eoi to require a full parse, and the removal of the explicit rules/conversions:
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <string_view>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/optional/optional_io.hpp>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace ast {
    struct ip_port {
        std::string host;
        boost::optional<std::uint16_t> port;
    };
    using boost::fusion::operator<<; // lets the adapted struct be streamed
}
BOOST_FUSION_ADAPT_STRUCT(ast::ip_port, host, port)
namespace parser {
    const x3::uint_parser<std::uint8_t> dec_octet {};
    const x3::uint_parser<std::uint16_t> port {};
    // raw[] exposes the matched source range instead of the four integers
    const auto ipv4address = x3::raw[dec_octet >> '.' >> dec_octet >> '.'
                                  >> dec_octet >> '.' >> dec_octet];
    const auto ip_port = ipv4address >> -(':' >> port) >> x3::eoi;
}
template <typename Parser, typename Attr>
static inline bool parse(std::string_view in, Parser const& p, Attr& result)
{
    return x3::parse(in.begin(), in.end(), p, result);
}
auto parse_ipport(std::string_view in)
{
    ast::ip_port result;
    if (!parse(in, parser::ip_port, result))
        throw std::invalid_argument("ipv4address");
    return result;
}
int main()
{
    // the last input is invalid on purpose; the exception is not caught
    for (auto input : { "192.168.1.1:80", "1.1.1.1", ":" }) {
        std::cerr << parse_ipport(input) << std::endl;
    }
}
Prints
(192.168.1.1 80)
(1.1.1.1 --)
terminate called after throwing an instance of 'std::invalid_argument'
what(): ipv4address
Aborted (core dumped)
Simplifying the code further by removing the optional:
(192.168.1.1 80)
(1.1.1.1 0)
terminate called after throwing an instance of 'std::invalid_argument'
what(): ipv4address
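For readers who want to see the behaviour of the optional-free variant spelled out, here is a Boost-free sketch of the same shape: four octets bounded by 255, an optional port bounded by 65535, nothing allowed after it, and a plain integer port that stays 0 when absent. HostPort, eat_number, eat_char and parse_host_port are hypothetical names, and this is a hand-written stand-in, not what X3 generates; edge cases such as leading zeros are only what std::from_chars happens to do.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

struct HostPort {            // hypothetical stand-in for ast::ip_port
    std::string host;        // raw source text of the address
    std::uint16_t port = 0;  // stays 0 when no port is given
};

// Consumes a decimal number from the front of `in` if it is <= max.
inline std::optional<unsigned long> eat_number(std::string_view& in, unsigned long max) {
    unsigned long value = 0;
    const char* first = in.data();
    const char* last = first + in.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || value > max) return std::nullopt;
    in.remove_prefix(static_cast<std::size_t>(ptr - first));
    return value;
}

// Consumes the single character c from the front of `in`, if present.
inline bool eat_char(std::string_view& in, char c) {
    if (in.empty() || in.front() != c) return false;
    in.remove_prefix(1);
    return true;
}

inline std::optional<HostPort> parse_host_port(std::string_view in) {
    const std::string_view whole = in;
    for (int i = 0; i < 4; ++i) {
        if (i > 0 && !eat_char(in, '.')) return std::nullopt;
        if (!eat_number(in, 255)) return std::nullopt;   // octet range check
    }
    HostPort result;
    // Keep the matched source text, in the spirit of x3::raw.
    result.host = std::string(whole.substr(0, whole.size() - in.size()));
    if (eat_char(in, ':')) {
        const auto port = eat_number(in, 65535);         // port range check
        if (!port) return std::nullopt;
        result.port = static_cast<std::uint16_t>(*port);
    }
    if (!in.empty()) return std::nullopt;                // no trailing input
    return result;
}
```

Called with 192.168.1.1:80 this gives host 192.168.1.1 and port 80, with 1.1.1.1 it gives port 0, and with a lone : it returns an empty optional, which lines up with the output shown above.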
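Out Of The Box
Note: your grammar doesn't match all RFC-compliant IPv4 addresses. For example, 127.1 is valid for 127.0.0.1. So are 0177.1 and 0x7f.1.
Either really fix it, or don't reinvent the wheel and use boost::asio::ip::address_v4::from_string, or even boost::asio::ip::address::from_string, and get IPv6 support for free.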
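To make the shorthand forms above concrete, here is a small sketch of how such addresses could be decoded by hand. It assumes the classic inet_aton conventions (1 to 4 dot-separated parts, the last part filling all remaining bytes, a 0x prefix meaning hex and a leading 0 meaning octal). parse_part and parse_ipv4_loose are made-up names, and this is neither what the X3 grammar does nor a claim about what the Asio functions accept.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Parses one part with C-style base detection:
// "0x"/"0X" prefix means hex, a leading "0" means octal, otherwise decimal.
inline std::optional<std::uint32_t> parse_part(std::string_view s) {
    int base = 10;
    if (s.size() > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        s.remove_prefix(2);
    } else if (s.size() > 1 && s[0] == '0') {
        base = 8;
        s.remove_prefix(1);
    }
    if (s.empty()) return std::nullopt;
    std::uint32_t value = 0;
    const char* last = s.data() + s.size();
    const auto [ptr, ec] = std::from_chars(s.data(), last, value, base);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}

// Returns the address as a 32-bit number in host byte order,
// or an empty optional if the text is not accepted.
inline std::optional<std::uint32_t> parse_ipv4_loose(std::string_view in) {
    std::vector<std::uint32_t> parts;
    while (true) {
        const std::size_t dot = in.find('.');
        const auto part = parse_part(in.substr(0, dot));
        if (!part || parts.size() == 4) return std::nullopt;
        parts.push_back(*part);
        if (dot == std::string_view::npos) break;
        in.remove_prefix(dot + 1);
    }
    std::uint32_t address = 0;
    for (std::size_t i = 0; i + 1 < parts.size(); ++i) {
        if (parts[i] > 0xFF) return std::nullopt;  // leading parts are single bytes
        address |= parts[i] << (24 - 8 * i);
    }
    // The last part covers whatever bytes the leading parts left over.
    const unsigned remaining_bits = 8 * (5 - static_cast<unsigned>(parts.size()));
    const std::uint64_t limit = std::uint64_t{1} << remaining_bits;
    if (parts.back() >= limit) return std::nullopt;
    return address | parts.back();
}
```

Under these assumptions 127.1, 0177.1, 0x7f.1 and 127.0.0.1 all decode to the same value, 0x7F000001.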