Slow performance using Boost Xpressive
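Lately I have been using Boost Xpressive to parse files. The files are 10 MB each, and there will be several hundred of them to parse.
Xpressive works nicely and the syntax is clear, but the problem is performance. It crawls unbelievably in debug builds, and in a release build it spends more than a whole second per file. I have tested it against plain old get_line(), find() and sscanf() code, and that beats Xpressive easily.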
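A plain baseline of that kind might look roughly like the sketch below. It is not the actual production code: Record, parse_line and parse_all are hypothetical names chosen for illustration, the field layout is taken from the sample lines in this post, and lines that do not fit are simply skipped.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type mirroring the fields of one SEQUENCE line.
struct Record {
    int ms, driver, sequence;
    double time, vel, km;
    std::string date, road;
};

// Parses a single line. Returns true only for well-formed SEQUENCE lines.
bool parse_line(const std::string& line, Record& out) {
    // Cheap rejection first: most lines do not carry the keyword at all.
    if (line.find(" - SEQUENCE: ") == std::string::npos) return false;

    char date[32], road[32];
    // %20[^.] reads the date and time up to (not including) the '.'.
    int n = std::sscanf(line.c_str(),
        "[%20[^.].%d] - %lf s => Driver: %d - Speed: %lf - Road: %31s - Km: %lf - SEQUENCE: %d",
        date, &out.ms, &out.time, &out.driver, &out.vel, road, &out.km, &out.sequence);
    assert(n <= 8); // sscanf cannot convert more fields than were requested
    if (n != 8) return false;

    out.date = date;
    out.road = road;
    return true;
}

// Splits the text into lines and keeps the records that parse.
std::vector<Record> parse_all(const std::string& text) {
    std::vector<Record> records;
    std::istringstream in(text);
    std::string line;
    Record r;
    while (std::getline(in, line))
        if (parse_line(line, r)) records.push_back(r);
    return records;
}
```

Calling parse_all on the contents of one file yields one Record per SEQUENCE line; everything else is dropped by the find() check before sscanf() runs.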
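I understand that type checking, backtracking and so on come at a cost, but this seems excessive to me. Am I doing something wrong? Is there any way to optimize this so that it runs at a decent pace? Would it be worth the effort to migrate the code to boost::spirit?
I have prepared a stripped-down version of the code with a few lines of a real file embedded, in case someone wants to test it and help.
Note: as a requirement, VS 2010 must be used (which, unfortunately, is not fully C++11 conformant).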
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
#include <iterator> // std::begin, std::end, std::distance
#include <string>
#include <vector>
const char input[] = "[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: BTN-1002 - Km: 90.0 - SWITCH_ON: 1\n\
[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n\
[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n\
[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n\
[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n\
[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n\
[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n\
[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n\
[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n\
[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n\
[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n\
[2018-Mar-13 13:14:04.510025] - 5.130 s => Driver: 0 - Speed: 4.9 - Road: B-302 - Km: 91.1 - GEAR: 3";
const auto len = std::distance(std::begin(input), std::end(input)) - 1; // exclude the terminating NUL
struct Sequence
{
int ms;
int driver;
int sequence;
double time;
double vel;
double km;
std::string date;
std::string road;
};
namespace xp = boost::xpressive;
int main()
{
Sequence data;
std::vector<Sequence> sequences;
using namespace xp;
cregex real = (+_d >> '.' >> +_d);
cregex keyword = " - SEQUENCE: " >> (+_d)[xp::ref(data.sequence) = as<int>(_)];
cregex date = repeat<4>(_d) >> '-' >> repeat<3>(alpha) >> '-' >> repeat<2>(_d) >> _s >> repeat<2>(_d) >> ':' >> repeat<2>(_d) >> ':' >> repeat<2>(_d);
cregex header = '[' >> date[xp::ref(data.date) = _] >> '.' >> (+_d)[xp::ref(data.ms) = as<int>(_)] >> "] - "
>> real[xp::ref(data.time) = as<double>(_)]
>> " s => Driver: " >> (+_d)[xp::ref(data.driver) = as<int>(_)]
>> " - Speed: " >> real[xp::ref(data.vel) = as<double>(_)]
>> " - Road: " >> (+set[alnum | '-'])[xp::ref(data.road) = _]
>> " - Km: " >> real[xp::ref(data.km) = as<double>(_)];
xp::cregex parser = (header >> keyword >> _ln);
xp::cregex_iterator cur(input, input + len, parser);
xp::cregex_iterator end;
for (; cur != end; ++cur)
sequences.emplace_back(data);
return 0;
}
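Please note the VS 2010 restriction.
I see roughly two areas for improvement:
- You basically parse all lines, including the ones you are not interested in
- You allocate a lot of strings
I'd suggest using string views to fix the allocations. Next, you could try to avoid parsing lines that don't match the SEQUENCE pattern. In principle there is no reason this couldn't be done with Boost Xpressive, but my weapon of choice happens to be Boost Spirit, so I'll include that too.
Being Selective
You can detect the interesting lines before spending more effort on them, like so: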
cregex signature = -*~_n >> " - SEQUENCE: " >> (+_d) >> before(_ln|eos);
for (xp::cregex_iterator cur(b, e, signature), end; cur != end; ++cur) {
std::cout << "'" << cur->str() << "'\n";
}
This prints
'[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1'
'[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4'
'[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8'
'[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15'
'[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21'
'[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29'
'[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34'
'[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45'
Nothing is allocated. This should be pretty quick.
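The same pre-filtering idea can be sketched without any regex library. The following is only an illustration under simplifying assumptions: find_sequence_lines and LineRange are hypothetical names, and the check is looser than the Xpressive signature above, since it only looks for the keyword inside each line and does not verify the digits or the end of line after it.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

typedef const char* It;
typedef std::pair<It, It> LineRange; // [begin, end) of one line, no copy of the text

// Returns the ranges of all lines that contain " - SEQUENCE: ".
// No text is copied; only pairs of pointers into the input are stored.
std::vector<LineRange> find_sequence_lines(It b, It e) {
    assert(b <= e);
    static const char key[] = " - SEQUENCE: ";
    It const key_end = key + sizeof(key) - 1; // skip the terminating NUL

    std::vector<LineRange> hits;
    while (b != e) {
        It eol = std::find(b, e, '\n');
        // std::search returns eol when the key does not occur in [b, eol)
        if (std::search(b, eol, key, key_end) != eol)
            hits.push_back(LineRange(b, eol));
        b = (eol == e) ? e : eol + 1; // step over the newline, if any
    }
    return hits;
}
```

Each returned range can then be handed to the expensive parser, which is the "selective" strategy measured further down.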
Reducing Allocations
For this I'll switch to Spirit because it will make things easier.
Note: The real reason I switched here is because, in contrast to Boost Spirit, Xpressive does not appear to have extensible attribute propagation traits. This could be my lack of experience with it.
The alternative approach would almost certainly replace the actions with manual propagation code, which in turn would inform named capture groups in order to keep things legible. I'm not sure about the performance overhead of these, so let's not use them at this point.
You can use boost::string_view and "teach" Qi to assign text to it using a trait:
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
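To see why a view avoids allocations, here is a library-independent sketch of the idea. TextView, equals and road_field are hypothetical names used only for this illustration; boost::string_view offers the same pointer-plus-length concept with a much richer interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A non-owning view: just a pointer and a length into an existing buffer.
struct TextView {
    const char* data;
    std::size_t size;
    TextView() : data(0), size(0) {}
    TextView(const char* f, const char* l) : data(f), size(std::size_t(l - f)) { assert(f <= l); }
    // Makes an owning copy; this is the only place that allocates.
    std::string str() const { return std::string(data, size); }
};

// Compares a view against a NUL-terminated string without copying.
bool equals(TextView v, const char* s) {
    return v.size == std::strlen(s) && (v.size == 0 || std::memcmp(v.data, s, v.size) == 0);
}

// Returns a view of the text following " - Road: " up to the next space,
// or an empty view when the key is absent.
TextView road_field(const char* b, const char* e) {
    static const char key[] = " - Road: ";
    const char* const key_end = key + sizeof(key) - 1;
    const char* p = std::search(b, e, key, key_end);
    if (p == e) return TextView();
    p += key_end - key;                    // skip the key itself
    const char* q = std::find(p, e, ' ');  // the field ends at the next space
    return TextView(p, q);
}
```

The view stays valid only as long as the underlying buffer does, which is the price for not copying the road and date text for every line.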
With that, the Qi grammar could look like this:
template <typename It> struct QiParser : qi::grammar<It, Sequence()> {
QiParser() : QiParser::base_type(line) {
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
line = '[' >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - SEQUENCE: " >> int_
>> (eol|eoi);
}
private:
qi::rule<It, Sequence()> line;
};
Using it is pretty simple, especially when not being "selective".
This happens to be the "winning" configuration. A standalone, simplified version of that algorithm, with all benchmark-related generics and options removed, is listed under Summary/Conclusions.
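Benchmark Results: Surprise
Using the selective parsing approach only made the Xpressive approach slower: in the sample text report below, parse_xpressive_selective averages about 314 μs against about 156 μs for parse_xpressive_linear.
With Spirit, I also initially started with the selective approach (fully expecting it to be faster). The result is not so encouraging: about 273 μs with std::string.
Oops. The original Xpressive approach is still superior!
Adjusting Assumptions
OK, so clearly doing a shallow scan first and then a "full parse" hurts performance. Theoretically, this may come down to cache/prefetch effects. Also, the linear approach may win because it is easier to spot that a line doesn't start with a '[' character than to see whether it ends with the SEQUENCE pattern.
So I decided to adapt the Spirit approach to the linear mode as well, and see whether the gain from reduced allocations is still worthwhile.
Now we're getting results. Looking at the difference between the std::string and boost::string_view approaches in detail, the linear Spirit parser averages about 21.3 μs with std::string and about 14.7 μs with boost::string_view in the sample text report below.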
Summary/Conclusions
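Reducing allocations is good for about 30% more efficiency. In total, that is roughly a 10x improvement over the original approach.
Note that the benchmark code goes to great lengths to remove unfair differences between the implementations (e.g. by precompiling everything for both Spirit and Xpressive). See the full benchmark code below.
The winning implementation in isolation: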
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <cstring> // strlen
#include <iostream>
#include <vector>
using It = char const*;
struct Sequence {
int driver;
int sequence;
double time;
double vel;
double km;
boost::string_view date;
boost::string_view road;
};
BOOST_FUSION_ADAPT_STRUCT(::Sequence, date, time, driver, vel, road, km, sequence)
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
std::vector<Sequence> parse_spirit(It b, It e) {
qi::rule<It, Sequence()> static const line = []{
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
qi::rule<It, Sequence()> r = '[' >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - SEQUENCE: " >> int_
>> (eol|eoi);
return r;
}();
std::vector<Sequence> sequences;
parse(b, e, *boost::spirit::repository::qi::seek[line], sequences);
return sequences;
}
static char input[] = /*... see question ...*/;
static const size_t len = strlen(input);
int main() {
auto sequences = parse_spirit(input, input+len);
std::cout << "Parsed: " << sequences.size() << " sequence lines\n";
}
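Full Benchmark Code
The benchmark uses Nonius for measurement and statistical analysis.
- The full interactive charts are here: http://stackoverflow-sehe.s3.amazonaws.com/9f88e055-4b5f-4026-8f2f-54e2bcad430d/stats.html
- Compile with -DUSE_NONIUS if you have Nonius available
- Compile with -DVERIFY_OUTPUT for "correctness" mode: in that case nothing is timed, but the parse results are echoed for verification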
#include <cstring> // strlen
static char input[] =
"[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n\
[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n\
[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n\
[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n\
[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n\
[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n\
[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n\
[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n\
[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n\
[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n\
[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n\
[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n\
[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n\
[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n\
[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n\
[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n\
[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n\
[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n\
[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n\
[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n\
[2018-Mar-13 13:14:04.510025] - 5.130 s => Driver: 0 - Speed: 4.9 - Road: B-302 - Km: 91.1 - GEAR: 3";
static const size_t len = strlen(input);
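#include <functional> // std::function (used by the mock Nonius runner)
#include <string>
#include <utility> // std::forward, std::move
#include <vector>
#include <boost/range/iterator_range.hpp> // boost::make_iterator_range
#include <boost/utility/string_view.hpp>
#include <boost/fusion/adapted/struct.hpp>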
template <typename String> struct Sequence {
int driver;
int sequence;
double time;
double vel;
double km;
String date;
String road;
};
BOOST_FUSION_ADAPT_TPL_STRUCT((T),(Sequence)(T), date, time, driver, vel, road, km, sequence)
// Declare implementations under test:
using It = char const*;
template <typename S> std::vector<S> parse_xpressive_linear(It b, It e);
template <typename S> std::vector<S> parse_xpressive_selective(It b, It e);
template <typename S> std::vector<S> parse_spirit_linear(It b, It e);
template <typename S> std::vector<S> parse_spirit_selective(It b, It e);
#ifdef VERIFY_OUTPUT
#include <boost/fusion/include/io.hpp>
using boost::fusion::operator<<;
#include <iostream>
#define VERIFY() \
do { \
std::cout << "L:" << __LINE__ << " Parsed: " << sequences.size() << "\n"; \
for (auto r : sequences) { \
std::cout << r << "\n"; \
} \
} while (0)
#else
#define VERIFY() do { } while (0)
#endif
#ifdef USE_NONIUS
#include <nonius/benchmark.h++>
#define NONIUS_RUNNER
#include <nonius/main.h++>
#else
// mock nonius
namespace nonius {
struct chronometer{
template <typename F> static inline void measure(F&& f) { std::forward<F>(f)(); }
};
static std::vector<std::function<void(chronometer)>> s_benchmarks;
#define TOKENPASTE(x, y) x ## y
#define TOKENPASTE2(x, y) TOKENPASTE(x, y)
#define NONIUS_BENCHMARK(name, f) static auto TOKENPASTE2(s_reg_, __LINE__) = []{ ::nonius::s_benchmarks.push_back(f); return 42; }();
void run() { for (auto& b : s_benchmarks) b({}); }
}
int main() {
nonius::run();
}
#endif
template <typename R>
void do_test_kernel(nonius::chronometer& cm, std::vector<R> (*f)(It, It)) {
std::vector<R> sequences;
cm.measure([&sequences,f]{ sequences = f(input, input + len); });
VERIFY();
}
#define TEST_CASE(name, string) NONIUS_BENCHMARK(#name"-"#string, [](nonius::chronometer cm) { do_test_kernel(cm, &name<Sequence<string> >); })
// Xpressive doesn't support string_view
TEST_CASE(parse_xpressive_linear, std::string)
TEST_CASE(parse_xpressive_selective, std::string)
TEST_CASE(parse_spirit_linear, std::string)
TEST_CASE(parse_spirit_linear, boost::string_view)
TEST_CASE(parse_spirit_selective, std::string)
TEST_CASE(parse_spirit_selective, boost::string_view)
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
namespace xp = boost::xpressive;
namespace XpressiveDetail {
using namespace xp;
struct Scanner {
cregex scan {-*~xp::_n >> " - SEQUENCE: " >> (+xp::_d) >> xp::_ln};
};
template <typename Seq> struct Parser : Scanner {
mutable Seq seq; // non-thread-safe, but fairer to compare to Spirit
cregex real = (+_d >> '.' >> +_d);
cregex keyword = " - SEQUENCE: " >> (+_d)[xp::ref(seq.sequence) = as<int>(_)];
cregex date = repeat<4>(_d) >> '-'
>> repeat<3>(alpha) >> '-'
>> repeat<2>(_d)
>> _s
>> repeat<2>(_d) >> ':'
>> repeat<2>(_d) >> ':'
>> repeat<2>(_d)
>> '.' >> (+_d);
cregex header = '[' >> date[xp::ref(seq.date) = _] >> "] - "
>> real[xp::ref(seq.time) = as<double>(_)]
>> " s => Driver: " >> (+_d) [ xp ::ref(seq.driver) = as<int>(_) ]
>> " - Speed: " >> real [ xp ::ref(seq.vel) = as<double>(_) ]
>> " - Road: " >> (+set[alnum|'-']) [ xp ::ref(seq.road) = _ ]
>> " - Km: " >> real [ xp ::ref(seq.km) = as<double>(_) ];
cregex parser = (header >> keyword >> _ln);
};
}
template <typename Seq>
std::vector<Seq> parse_xpressive_linear(It b, It e) {
std::vector<Seq> sequences;
using namespace xp;
static const XpressiveDetail::Parser<Seq> precompiled{};
for (xp::cregex_iterator cur(b, e, precompiled.parser), end; cur != end; ++cur)
sequences.push_back(std::move(precompiled.seq));
return sequences;
}
template <typename Seq>
std::vector<Seq> parse_xpressive_selective(It b, It e) {
std::vector<Seq> sequences;
using namespace xp;
static const XpressiveDetail::Parser<Seq> precompiled{};
xp::match_results<It> m;
for (auto& match : boost::make_iterator_range(xp::cregex_iterator{b, e, precompiled.scan}, {})) {
if (xp::regex_match(match[0].first, match[0].second, m, precompiled.parser))
sequences.push_back(std::move(precompiled.seq));
}
return sequences;
}
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
template <typename It, typename Attribute> struct QiParser : qi::grammar<It, Attribute()> {
QiParser() : QiParser::base_type(line) {
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
line = '[' >> eps(clear(_val)) >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - SEQUENCE: " >> int_
>> (eol|eoi);
BOOST_SPIRIT_DEBUG_NODES((line))
}
private:
struct clear_f {
// only required for linear approach to std::string-based
bool operator()(Sequence<std::string>& v) const { v = {}; return true; }
bool operator()(Sequence<boost::string_view>&) const { /*no_op();*/ return true; }
};
boost::phoenix::function<clear_f> clear;
qi::rule<It, Attribute()> line;
};
template <typename Seq = Sequence<std::string> >
std::vector<Seq> parse_spirit_selective(It b, It e) {
static QiParser<It, Seq> const qi_parser{};
static XpressiveDetail::Scanner const precompiled{};
std::vector<Seq> sequences;
for (auto& match : boost::make_iterator_range(xp::cregex_iterator{b, e, precompiled.scan}, {})) {
Seq r;
if (parse(match[0].first, match[0].second, qi_parser, r))
sequences.push_back(r);
}
return sequences;
}
#include <boost/spirit/repository/include/qi_seek.hpp>
template <typename Seq = Sequence<std::string> >
std::vector<Seq> parse_spirit_linear(It b, It e) {
using boost::spirit::repository::qi::seek;
static QiParser<It, Seq> const qi_parser{};
std::vector<Seq> sequences;
parse(b, e, *seek[qi_parser], sequences);
return sequences;
}
Sample text report:
clock resolution: mean is 17.7534 ns (40960002 iterations)
benchmarking parse_xpressive_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 15.7252 ms
mean: 156.418 μs, lb 155.863 μs, ub 158.24 μs, ci 0.95
std dev: 4.62848 μs, lb 1637.89 ns, ub 10.4043 μs, ci 0.95
found 4 outliers among 100 samples (4%)
variance is moderately inflated by outliers
benchmarking parse_xpressive_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 31.5459 ms
mean: 313.992 μs, lb 313.39 μs, ub 315.599 μs, ci 0.95
std dev: 4.5415 μs, lb 1105.98 ns, ub 9.07809 μs, ci 0.95
found 11 outliers among 100 samples (11%)
variance is slightly inflated by outliers
benchmarking parse_spirit_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 2.1556 ms
mean: 21.2533 μs, lb 21.1623 μs, ub 21.6854 μs, ci 0.95
std dev: 870.481 ns, lb 53.2809 ns, ub 2.0738 μs, ci 0.95
found 7 outliers among 100 samples (7%)
variance is moderately inflated by outliers
benchmarking parse_spirit_linear-boost::string_view
collecting 100 samples, 2 iterations each, in estimated 2.944 ms
mean: 14.6677 μs, lb 14.6342 μs, ub 14.8279 μs, ci 0.95
std dev: 318.252 ns, lb 22.5097 ns, ub 757.555 ns, ci 0.95
found 5 outliers among 100 samples (5%)
variance is moderately inflated by outliers
benchmarking parse_spirit_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 27.5512 ms
mean: 273.052 μs, lb 272.77 μs, ub 273.952 μs, ci 0.95
std dev: 2.31473 μs, lb 835.184 ns, ub 5.1322 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers
benchmarking parse_spirit_selective-boost::string_view
collecting 100 samples, 1 iterations each, in estimated 27.0766 ms
mean: 269.446 μs, lb 269.208 μs, ub 270.268 μs, ci 0.95
std dev: 2.01634 μs, lb 627.834 ns, ub 4.56949 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers
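The ratios quoted in the summary can be checked against the means in this report. The helper names speedup and reduction_percent below are made up for this illustration; the inputs are simply the mean values printed above.

```cpp
#include <cassert>

// How many times faster the candidate is than the baseline.
double speedup(double baseline_us, double candidate_us) {
    assert(candidate_us > 0.0);
    return baseline_us / candidate_us;
}

// Relative reduction in run time, as a percentage of the "before" value.
double reduction_percent(double before_us, double after_us) {
    assert(before_us > 0.0);
    return 100.0 * (before_us - after_us) / before_us;
}
```

With the means above, speedup(156.418, 14.6677) comes out slightly above 10.6 (original linear Xpressive versus linear Spirit with boost::string_view), and reduction_percent(21.2533, 14.6677) is roughly 31 (std::string versus boost::string_view in the linear Spirit parser).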
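You can use Fusion together with Spirit traits (see, e.g., "parsing into several vector members"), but I would consider using semantic actions.
Here is the design conundrum:
Vectors With Traits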
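#include <boost/fusion/adapted/struct.hpp>
#include <boost/fusion/include/io.hpp> // boost::fusion::operator<<
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <boost/variant.hpp>
#include <cassert>
#include <cstring> // strlen
#include <iostream>
#include <vector>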
using It = char const*;
struct BaseEvent {
int driver;
int sequence;
double time;
double vel;
double km;
boost::string_view date;
boost::string_view road;
};
struct Sequence : BaseEvent{};
struct Clutch : BaseEvent{};
struct Gear : BaseEvent{};
BOOST_FUSION_ADAPT_STRUCT(::Sequence, date, time, driver, vel, road, km, sequence)
BOOST_FUSION_ADAPT_STRUCT(::Clutch, date, time, driver, vel, road, km, sequence)
BOOST_FUSION_ADAPT_STRUCT(::Gear, date, time, driver, vel, road, km, sequence)
struct LogEvents {
std::vector<Sequence> sequence;
std::vector<Clutch> clutch;
std::vector<Gear> gear;
void add(Sequence const& s) { sequence.push_back(s); }
void add(Clutch const& c) { clutch.push_back(c); }
void add(Gear const& g) { gear.push_back(g); }
};
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
template <> struct is_container<LogEvents> : std::true_type {};
template <> struct container_value<LogEvents> {
using type = boost::variant<::Sequence, ::Clutch, ::Gear>;
};
template <typename T> struct push_back_container<LogEvents, T> {
struct Visitor {
LogEvents& _log;
template <typename U> void operator()(U const& ev) const { _log.add(ev); }
using result_type = void;
};
template <typename... U>
static bool call(LogEvents& log, boost::variant<U...> const& attribute) {
boost::apply_visitor(Visitor{log}, attribute);
return true;
}
};
} } }
namespace QiParsers {
template <typename It, typename Attribute>
struct BaseEventParser : qi::grammar<It, Attribute()> {
BaseEventParser(std::string const& event_type) : BaseEventParser::base_type(start) {
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
start
= '[' >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - " >> lit(event_type) >> ": " >> int_
>> (eol|eoi);
}
private:
qi::rule<It, Attribute()> start;
};
}
LogEvents parse_spirit(It b, It e) {
QiParsers::BaseEventParser<It, ::Sequence> sequence("SEQUENCE");
QiParsers::BaseEventParser<It, ::Clutch> clutch("CLUTCH");
QiParsers::BaseEventParser<It, ::Gear> gear("GEAR");
LogEvents events;
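bool ok = parse(b, e, *boost::spirit::repository::qi::seek[sequence|clutch|gear], events);
assert(ok); // keep the parse call outside assert so it still runs when NDEBUG is defined
(void)ok;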
return events;
}
static char input[] = /* see question */;
static const size_t len = strlen(input);
int main() {
auto events = parse_spirit(input, input+len);
std::cout << "Events: "
<< events.sequence.size() << " sequence, "
<< events.clutch.size() << " clutch, "
<< events.gear.size() << " gear events\n";
using boost::fusion::operator<<;
for (auto& s : events.sequence) { std::cout << "SEQUENCE: " << s << "\n"; }
for (auto& c : events.clutch) { std::cout << "CLUTCH: " << c << "\n"; }
for (auto& g : events.gear) { std::cout << "GEAR: " << g << "\n"; }
}
Flipping It: One vector<variant<>>
Wouldn't it make more sense to use a vector of variants?
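#include <boost/fusion/adapted/struct.hpp>
#include <boost/fusion/include/io.hpp> // boost::fusion::operator<<
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <boost/variant.hpp>
#include <cstring> // strlen
#include <iostream>
#include <string>
#include <vector>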
using It = char const*;
namespace MyEvents {
struct BaseEvent {
int driver;
int sequence;
double time;
double vel;
double km;
boost::string_view date;
boost::string_view road;
};
struct Sequence : BaseEvent{};
struct Clutch : BaseEvent{};
struct Gear : BaseEvent{};
using LogEvent = boost::variant<Sequence, Clutch, Gear>;
using LogEvents = std::vector<LogEvent>;
}
BOOST_FUSION_ADAPT_STRUCT(MyEvents::Sequence, date, time, driver, vel, road, km, sequence)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::Clutch, date, time, driver, vel, road, km, sequence)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::Gear, date, time, driver, vel, road, km, sequence)
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
namespace QiParsers {
template <typename It, typename Attribute>
struct BaseEventParser : qi::grammar<It, Attribute()> {
BaseEventParser(std::string const& event_type) : BaseEventParser::base_type(start) {
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
start
= '[' >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - " >> lit(event_type) >> ": " >> int_
>> (eol|eoi);
}
private:
qi::rule<It, Attribute()> start;
};
template <typename It>
struct LogParser : qi::grammar<It, MyEvents::LogEvents()> {
LogParser() : LogParser::base_type(start) {
using namespace qi;
using boost::spirit::repository::qi::seek;
event = sequence | clutch | gear ; // TODO add types
start = *seek[event];
}
private:
qi::rule<It, MyEvents::LogEvents()> start;
qi::rule<It, MyEvents::LogEvent()> event;
BaseEventParser<It, MyEvents::Sequence> sequence{"SEQUENCE"};
BaseEventParser<It, MyEvents::Clutch> clutch{"CLUTCH"};
BaseEventParser<It, MyEvents::Gear> gear{"GEAR"};
};
}
MyEvents::LogEvents parse_spirit(It b, It e) {
static QiParsers::LogParser<It> const parser {};
MyEvents::LogEvents events;
parse(b, e, parser, events);
return events;
}
static char input[] = /* see question */;
static const size_t len = strlen(input);
namespace MyEvents { // for debug/demo
using boost::fusion::operator<<;
static inline char const* kind(Sequence const&) { return "SEQUENCE"; }
static inline char const* kind(Clutch const&) { return "CLUTCH"; }
static inline char const* kind(Gear const&) { return "GEAR"; }
struct KindVisitor : boost::static_visitor<char const*> {
template <typename T> char const* operator()(T const& ev) const { return kind(ev); }
};
static inline char const* kind(LogEvent const& ev) {
return boost::apply_visitor(KindVisitor{}, ev);
}
}
int main() {
auto events = parse_spirit(input, input+len);
std::cout << "Parsed: " << events.size() << " events\n";
for (auto& e : events)
std::cout << kind(e) << ": " << e << "\n";
}
Generalizing: Common Fields and Other Events
This holds especially if you continue generalizing:
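#include <boost/fusion/adapted/struct.hpp>
#include <boost/fusion/include/io.hpp> // boost::fusion::operator<<
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <boost/variant.hpp>
#include <cstring> // strlen
#include <iostream>
#include <string>
#include <vector>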
using It = char const*;
namespace MyEvents {
enum class Kind { Sequence, Clutch, Gear, Slope, Other };
struct CommonFields {
boost::string_view date;
double duration;
};
struct BaseEvent {
CommonFields common;
int driver;
int event_id;
double vel;
double km;
boost::string_view road;
Kind kind;
};
struct OtherEvent {
CommonFields common;
std::string message;
};
using LogEvent = boost::variant<BaseEvent, OtherEvent>;
using LogEvents = std::vector<LogEvent>;
}
BOOST_FUSION_ADAPT_STRUCT(MyEvents::CommonFields, date, duration)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::BaseEvent, common, driver, vel, road, km, kind, event_id)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::OtherEvent, common, message)
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
namespace QiParsers {
template <typename It>
struct LogParser : qi::grammar<It, MyEvents::LogEvents()> {
using Kind = MyEvents::Kind;
LogParser() : LogParser::base_type(start) {
using namespace qi;
kind.add
("SEQUENCE", Kind::Sequence)
("CLUTCH", Kind::Clutch)
("GEAR", Kind::Gear)
("SLOPE", Kind::Slope)
;
common_fields
= '[' >> raw[
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit
] >> "]"
>> " - " >> double_ >> " s";
base_event
= common_fields
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - " >> kind >> ": " >> int_;
other_event
= common_fields
>> " => " >> *~char_("\r\n");
event
= (base_event | other_event)
>> (eol|eoi);
start = *boost::spirit::repository::qi::seek[event];
}
private:
qi::rule<It, MyEvents::LogEvents()> start;
qi::rule<It, MyEvents::LogEvent()> event;
qi::rule<It, MyEvents::CommonFields()> common_fields;
qi::rule<It, MyEvents::BaseEvent()> base_event;
qi::rule<It, MyEvents::OtherEvent()> other_event;
qi::symbols<char, MyEvents::Kind> kind;
};
}
MyEvents::LogEvents parse_spirit(It b, It e) {
static QiParsers::LogParser<It> const parser {};
MyEvents::LogEvents events;
parse(b, e, parser, events);
return events;
}
static char input[] = /* see question */;
static const size_t len = strlen(input);
namespace MyEvents { // for debug/demo
using boost::fusion::operator<<;
static inline Kind getKind(BaseEvent const& be) { return be.kind; }
static inline Kind getKind(OtherEvent const&) { return Kind::Other; }
struct KindVisitor : boost::static_visitor<Kind> {
template <typename T> Kind operator()(T const& ev) const { return getKind(ev); }
};
static inline Kind getKind(LogEvent const& ev) {
return boost::apply_visitor(KindVisitor{}, ev);
}
static inline std::ostream& operator<<(std::ostream& os, Kind k) {
switch(k) {
case Kind::Sequence: return os << "SEQUENCE";
case Kind::Clutch: return os << "CLUTCH";
case Kind::Gear: return os << "GEAR";
case Kind::Slope: return os << "SLOPE";
case Kind::Other: return os << "(Other)";
}
return os;
}
}
int main() {
auto events = parse_spirit(input, input+len);
std::cout << "Parsed: " << events.size() << " events\n";
for (auto& e : events)
std::cout << getKind(e) << ": " << e << "\n";
}
This prints, for example:
Parsed: 37 events
SLOPE: ((2018-Mar-13 13:13:59.580482 0.2) 0 0 A-11 90 SLOPE 0)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0 A-11 90 GEAR 0)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0.1 A-11 90 GEAR 1)
SEQUENCE: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.1 A-11 90 SEQUENCE 1)
CLUTCH: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.2 A-11 90 CLUTCH 1)
(Other): ((2018-Mar-13 13:14:01.819966 2.54) Backup to regestry)
[...]
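Bonus: Multi-Index
If you use multi-index containers, you can have your cake and eat it too.
Here is a sample definition that lets you index the vector by a few fairly arbitrarily chosen characteristics: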
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/global_fun.hpp>
namespace Indexing {
namespace bmi = boost::multi_index;
using MyEvents::LogEvent;
double getDuration(LogEvent const& ev) { return getCommon(ev).duration; }
using Table = bmi::multi_index_container<
std::reference_wrapper<LogEvent const>, //LogEvent,
bmi::indexed_by<
bmi::ordered_non_unique<
bmi::tag<struct primary>,
bmi::composite_key<
LogEvent,
bmi::global_fun<LogEvent const&, MyEvents::Kind, MyEvents::getKind>,
bmi::global_fun<LogEvent const&, int, MyEvents::getEventId>
>
>,
bmi::ordered_non_unique<
bmi::tag<struct duration>,
bmi::global_fun<LogEvent const&, double, getDuration>
>
>
>;
}
Now you can do fun things like:
Indexing::Table idx(events.begin(), events.end());
/*
 * // To print all events, grouped by kind and event id:
* for (MyEvents::LogEvent const& e : idx)
* std::cout << getKind(e) << ": " << e << "\n";
*
* // Ordered by duration:
* for (MyEvents::LogEvent const& e : idx.get<Indexing::duration>())
* std::cout << getKind(e) << ": " << e << "\n";
*/
std::cout << "\nAll GEAR events ordered by event id:\n";
for (MyEvents::LogEvent const& e : make_iterator_range(idx.equal_range(make_tuple(Kind::Gear))))
std::cout << getKind(e) << ": " << e << "\n";
std::cout << "\nOnly the SLOPE events with id 10:\n";
for (MyEvents::LogEvent const& e : make_iterator_range(idx.equal_range(make_tuple(Kind::Slope, 10))))
std::cout << getKind(e) << ": " << e << "\n";
std::cout << "\nEvents with durations in [2s..3s):\n";
auto& by_dur = idx.get<Indexing::duration>();
for (MyEvents::LogEvent const& e : make_iterator_range(by_dur.lower_bound(2), by_dur.lower_bound(3))) // lower_bound(3) excludes 3s itself, giving the half-open range [2s..3s)
std::cout << getKind(e) << ": " << e << "\n";
Complete listing:
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <cstring> // strlen
using It = char const*;
namespace MyEvents {
enum class Kind { Sequence, Clutch, Gear, Slope, Other };
struct CommonFields {
boost::string_view date;
double duration;
};
struct BaseEvent {
CommonFields common;
int driver;
int event_id;
double vel;
double km;
boost::string_view road;
Kind kind;
};
struct OtherEvent {
CommonFields common;
std::string message;
};
using LogEvent = boost::variant<BaseEvent, OtherEvent>;
using LogEvents = std::vector<LogEvent>;
}
BOOST_FUSION_ADAPT_STRUCT(MyEvents::CommonFields, date, duration)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::BaseEvent, common, driver, vel, road, km, kind, event_id)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::OtherEvent, common, message)
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
namespace QiParsers {
template <typename It>
struct LogParser : qi::grammar<It, MyEvents::LogEvents()> {
using Kind = MyEvents::Kind;
LogParser() : LogParser::base_type(start) {
using namespace qi;
kind.add
("SEQUENCE", Kind::Sequence)
("CLUTCH", Kind::Clutch)
("GEAR", Kind::Gear)
("SLOPE", Kind::Slope)
;
common_fields
= '[' >> raw[
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit
] >> "]"
>> " - " >> double_ >> " s";
base_event
= common_fields
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - " >> kind >> ": " >> int_;
other_event
= common_fields
>> " => " >> *~char_("\r\n");
event
= (base_event | other_event)
>> (eol|eoi);
start = *boost::spirit::repository::qi::seek[event];
}
private:
qi::rule<It, MyEvents::LogEvents()> start;
qi::rule<It, MyEvents::LogEvent()> event;
qi::rule<It, MyEvents::CommonFields()> common_fields;
qi::rule<It, MyEvents::BaseEvent()> base_event;
qi::rule<It, MyEvents::OtherEvent()> other_event;
qi::symbols<char, MyEvents::Kind> kind;
};
}
MyEvents::LogEvents parse_spirit(It b, It e) {
static QiParsers::LogParser<It> const parser {};
MyEvents::LogEvents events;
parse(b, e, parser, events);
return events;
}
static char input[] = /* see question */;
static const size_t len = strlen(input);
namespace MyEvents { // for debug/demo
using boost::fusion::operator<<;
static inline CommonFields const& getCommon(BaseEvent const& be) { return be.common; }
static inline CommonFields const& getCommon(OtherEvent const& oe) { return oe.common; }
static inline Kind getKind(BaseEvent const& be) { return be.kind; }
static inline Kind getKind(OtherEvent const&) { return Kind::Other; }
static inline int getEventId(BaseEvent const& be) { return be.event_id; }
static inline int getEventId(OtherEvent const&) { return 0; }
#define IMPL_DISPATCH(name, T) \
struct name##Visitor : boost::static_visitor<T> { \
template <typename E> T operator()(E const &ev) const { return name(ev); } \
}; \
static inline T name(LogEvent const &ev) { return boost::apply_visitor(name##Visitor{}, ev); }
IMPL_DISPATCH(getCommon, CommonFields const&)
IMPL_DISPATCH(getKind, Kind)
IMPL_DISPATCH(getEventId, int)
static inline std::ostream& operator<<(std::ostream& os, Kind k) {
switch(k) {
case Kind::Sequence: return os << "SEQUENCE";
case Kind::Clutch: return os << "CLUTCH";
case Kind::Gear: return os << "GEAR";
case Kind::Slope: return os << "SLOPE";
case Kind::Other: return os << "(Other)";
}
return os;
}
}
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/global_fun.hpp>
namespace Indexing {
namespace bmi = boost::multi_index;
using MyEvents::LogEvent;
double getDuration(LogEvent const& ev) { return getCommon(ev).duration; }
using Table = bmi::multi_index_container<
std::reference_wrapper<LogEvent const>, //LogEvent,
bmi::indexed_by<
bmi::ordered_non_unique<
bmi::tag<struct primary>,
bmi::composite_key<
LogEvent,
bmi::global_fun<LogEvent const&, MyEvents::Kind, MyEvents::getKind>,
bmi::global_fun<LogEvent const&, int, MyEvents::getEventId>
>
>,
bmi::ordered_non_unique<
bmi::tag<struct duration>,
bmi::global_fun<LogEvent const&, double, getDuration>
>
>
>;
}
using boost::make_iterator_range;
using boost::make_tuple;
int main() {
using MyEvents::LogEvent;
using MyEvents::Kind;
auto events = parse_spirit(input, input+len);
std::cout << "Parsed: " << events.size() << " events\n";
Indexing::Table idx(events.begin(), events.end());
/*
 * // To print all events, grouped by kind and event id:
* for (MyEvents::LogEvent const& e : idx)
* std::cout << getKind(e) << ": " << e << "\n";
*
* // Ordered by duration:
* for (MyEvents::LogEvent const& e : idx.get<Indexing::duration>())
* std::cout << getKind(e) << ": " << e << "\n";
*/
std::cout << "\nAll GEAR events ordered by event id:\n";
for (MyEvents::LogEvent const& e : make_iterator_range(idx.equal_range(make_tuple(Kind::Gear))))
std::cout << getKind(e) << ": " << e << "\n";
std::cout << "\nOnly the SLOPE events with id 10:\n";
for (MyEvents::LogEvent const& e : make_iterator_range(idx.equal_range(make_tuple(Kind::Slope, 10))))
std::cout << getKind(e) << ": " << e << "\n";
std::cout << "\nEvents with durations in [2s..3s):\n";
auto& by_dur = idx.get<Indexing::duration>();
for (MyEvents::LogEvent const& e : make_iterator_range(by_dur.lower_bound(2), by_dur.lower_bound(3))) // lower_bound(3) excludes 3s itself, giving the half-open range [2s..3s)
std::cout << getKind(e) << ": " << e << "\n";
}
Prints:
Parsed: 37 events
All GEAR events ordered by event id:
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0 A-11 90 GEAR 0)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0 A-11 90 GEAR 0)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0.1 A-11 90 GEAR 1)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0.1 A-11 90 GEAR 1)
GEAR: ((2018-Mar-13 13:14:03.250451 3.87) 0 1.2 B-302 90.2 GEAR 2)
GEAR: ((2018-Mar-13 13:14:03.250451 3.87) 0 1.2 B-302 90.2 GEAR 2)
GEAR: ((2018-Mar-13 13:14:04.510025 5.13) 0 4.9 B-302 91.1 GEAR 3)
Only the SLOPE events with id 10:
SLOPE: ((2018-Mar-13 13:14:04.300160 4.92) 0 4.2 B-302 90.9 SLOPE 10)
SLOPE: ((2018-Mar-13 13:14:04.300160 4.92) 0 4.2 B-302 90.9 SLOPE 10)
Events with durations in [2s..3s):
SEQUENCE: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.1 A-11 90 SEQUENCE 1)
CLUTCH: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.2 A-11 90 CLUTCH 1)
SEQUENCE: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.1 A-11 90 SEQUENCE 1)
CLUTCH: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.2 A-11 90 CLUTCH 1)
(Other): ((2018-Mar-13 13:14:01.819966 2.54) Backup to regestry)
(Other): ((2018-Mar-13 13:14:01.819966 2.54) Backup to regestry)
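For readers who cannot use Boost.MultiIndex, the idea behind the duration index can be approximated with the standard library alone. This is only a minimal sketch under stated assumptions: `Event`, `index_by_duration` and `durations_in` are hypothetical names, and `Event` is a cut-down stand-in for `LogEvent`. Like the `reference_wrapper`-based table, the index stores pointers, so the events must outlive it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for LogEvent: only the fields needed for the demo.
struct Event {
    double duration;
    int id;
};

// Secondary index: pointers into `events`, ordered by duration.
// Nothing is copied; `events` must stay alive and unmodified.
std::vector<Event const*> index_by_duration(std::vector<Event> const& events) {
    std::vector<Event const*> idx;
    idx.reserve(events.size());
    for (Event const& e : events) idx.push_back(&e);
    std::stable_sort(idx.begin(), idx.end(),
        [](Event const* a, Event const* b) { return a->duration < b->duration; });
    return idx;
}

// Half-open range query [lo, hi): two binary searches on the sorted index.
std::vector<Event const*> durations_in(std::vector<Event const*> const& idx,
                                       double lo, double hi) {
    assert(lo <= hi);
    auto less = [](Event const* e, double d) { return e->duration < d; };
    auto first = std::lower_bound(idx.begin(), idx.end(), lo, less);
    auto last  = std::lower_bound(first, idx.end(), hi, less);
    return std::vector<Event const*>(first, last);
}
```

Using `lower_bound` for both ends is what makes the upper limit exclusive. Unlike a multi-index container, this sketch does not keep the index up to date when events are added later.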
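Do note the VS 2010 constraint.
I see roughly two areas for improvement:
- you basically parse every line, including the ones you are not interested in
- you allocate a lot of strings
I'd suggest using string views to fix the allocations. Next, you can try to avoid parsing lines that do not match the SEQUENCE pattern. There is no reason in principle why this could not be done with Boost Xpressive, but my weapon of choice happens to be Boost Spirit, so I'll include that as well.
Being selective
You can detect the interesting lines before spending more effort on them, like this: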
cregex signature = -*~_n >> " - SEQUENCE: " >> (+_d) >> before(_ln|eos);
for (xp::cregex_iterator cur(b, e, signature), end; cur != end; ++cur) {
std::cout << "'" << cur->str() << "'\n";
}
This prints
'[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1'
'[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4'
'[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8'
'[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15'
'[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21'
'[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29'
'[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34'
'[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45'
Nothing is allocated. This should be fast.
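The same pre-filtering idea can be sketched without a regex library. This is a rough illustration, not the code that was measured. It assumes C++17 `std::string_view` (so it would not build on VS 2010), and `find_sequence_lines` is a hypothetical helper name. It is also cruder than the Xpressive signature: it only checks that the marker occurs somewhere in the line, not that digits and a line end follow it.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns a view of every line containing the marker
// " - SEQUENCE: ". Line text is never copied; the views point into `text`.
std::vector<std::string_view> find_sequence_lines(std::string_view text) {
    static constexpr std::string_view marker = " - SEQUENCE: ";
    std::vector<std::string_view> lines;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t eol = text.find('\n', pos);
        if (eol == std::string_view::npos) eol = text.size(); // last line, no newline
        assert(eol >= pos); // find() never reports a position before `pos`
        std::string_view line = text.substr(pos, eol - pos);
        if (line.find(marker) != std::string_view::npos)
            lines.push_back(line);
        pos = eol + 1; // continue after the newline
    }
    return lines;
}
```

Each returned view can then be handed to the full parser. Whether such a two-pass scheme is actually faster than a single linear pass is something only measurement can tell.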
Reducing allocations
For this I'll switch to Spirit, because it makes things easier.
Note: The real reason I switched here is because, in contrast to Boost Spirit, Xpressive does not appear to have extensible attribute propagation traits. This could be my lack of experience with it.
The alternative approach would almost certainly replace the actions with manual propagation code, which in turn would inform named capture groups in order to keep things legible. I'm not sure about the performance overhead of these, so let's not use them at this point.
You can use boost::string_view and "teach" Qi how to assign text to it with a trait:
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
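To make the effect of that trait concrete, here is a minimal sketch, outside of Spirit, of what building a view from an iterator pair amounts to. It assumes C++17 `std::string_view` in place of `boost::string_view`, and `view_of`, `RoadField` and `extract_road` are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Build a non-owning view from an iterator pair: a pointer and a length,
// with no allocation and no copy of the characters.
inline std::string_view view_of(char const* first, char const* last) {
    assert(first <= last);
    return std::string_view(first, static_cast<std::size_t>(last - first));
}

// Hypothetical record holding one field as a view into the input buffer.
struct RoadField {
    std::string_view road;
};

// Extract the text between " - Road: " and the next space.
// Returns false when the key is missing or the field is empty.
inline bool extract_road(std::string_view line, RoadField& out) {
    static constexpr std::string_view key = " - Road: ";
    std::size_t start = line.find(key);
    if (start == std::string_view::npos) return false;
    start += key.size();
    std::size_t stop = line.find(' ', start);
    if (stop == std::string_view::npos) stop = line.size();
    if (stop == start) return false;
    out.road = view_of(line.data() + start, line.data() + stop);
    return true;
}
```

The price of skipping the copy is a lifetime constraint: every view is only valid while the input buffer it points into is alive and unchanged.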
With that, the Qi grammar could look like this:
template <typename It> struct QiParser : qi::grammar<It, Sequence()> {
QiParser() : QiParser::base_type(line) {
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
line = '[' >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - SEQUENCE: " >> int_
>> (eol|eoi);
}
private:
qi::rule<It, Sequence()> line;
};
Using it is pretty simple, especially when not being "selective".
This happens to be the "winning" configuration. Here's the standalone, simplified version of that algorithm after removing all benchmark-related generics and options: Live on Coliru
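Benchmark results: surprise
Using the selective parsing approach only makes the Xpressive approach slower: (interactive chart)
For Spirit, I also started out with the selective approach at first (fully expecting it to be faster). Here are the not-so-encouraging results: (interactive chart)
Oops. The original Xpressive approach is still superior!
Adjusting the assumptions
OK, so clearly doing a shallow scan first and a "full parse" afterwards hurts performance. In theory this could come down to cache/prefetch effects. Also, the linear approach may win because it is easier to spot that a line does not start with a '[' character than to check whether it ends with the SEQUENCE pattern.
So I decided to adapt the Spirit approach to the linear mode as well, to see whether the win from reduced allocations is still worth it: (interactive chart)
Now we are getting results. Let's look in detail at the difference between the std::string and boost::string_view approaches: (interactive chart)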
Summary/Conclusions
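The reduced allocations are good for roughly a 30% efficiency gain. Overall, that is a 10x improvement over the original approach.
Note that the benchmark code goes to great lengths to eliminate unfair differences between the implementations (for example, by precompiling everything for both Spirit and Xpressive). See the full benchmark code further down.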
The winning implementation in isolation: Live on Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <cstring> // strlen
using It = char const*;
struct Sequence {
int driver;
int sequence;
double time;
double vel;
double km;
boost::string_view date;
boost::string_view road;
};
BOOST_FUSION_ADAPT_STRUCT(::Sequence, date, time, driver, vel, road, km, sequence)
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
std::vector<Sequence> parse_spirit(It b, It e) {
qi::rule<It, Sequence()> static const line = []{
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
qi::rule<It, Sequence()> r = '[' >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - SEQUENCE: " >> int_
>> (eol|eoi);
return r;
}();
std::vector<Sequence> sequences;
parse(b, e, *boost::spirit::repository::qi::seek[line], sequences);
return sequences;
}
static char input[] = /*... see question ...*/;
static const size_t len = strlen(input);
int main() {
auto sequences = parse_spirit(input, input+len);
std::cout << "Parsed: " << sequences.size() << " sequence lines\n";
}
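Full benchmark code
The benchmark uses Nonius for measurement and statistical analysis.
- The full interactive charts are here: http://Whosebug-sehe.s3.amazonaws.com/9f88e055-4b5f-4026-8f2f-54e2bcad430d/stats.html
- Compile with -DUSE_NONIUS if you have Nonius available
- Compile with -DVERIFY_OUTPUT for a "correctness" mode: in that case no timing is done, but the parse results are echoed for verification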
#include <cstring> // strlen
static char input[] =
"[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n\
[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n\
[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n\
[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n\
[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n\
[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n\
[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n\
[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n\
[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n\
[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n\
[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n\
[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n\
[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n\
[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n\
[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n\
[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n\
[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n\
[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n\
[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n\
[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n\
[2018-Mar-13 13:14:04.510025] - 5.130 s => Driver: 0 - Speed: 4.9 - Road: B-302 - Km: 91.1 - GEAR: 3";
static const size_t len = strlen(input);
#include <boost/utility/string_view.hpp>
#include <boost/fusion/adapted/struct.hpp>
template <typename String> struct Sequence {
int driver;
int sequence;
double time;
double vel;
double km;
String date;
String road;
};
BOOST_FUSION_ADAPT_TPL_STRUCT((T),(Sequence)(T), date, time, driver, vel, road, km, sequence)
// Declare implementations under test:
using It = char const*;
template <typename S> std::vector<S> parse_xpressive_linear(It b, It e);
template <typename S> std::vector<S> parse_xpressive_selective(It b, It e);
template <typename S> std::vector<S> parse_spirit_linear(It b, It e);
template <typename S> std::vector<S> parse_spirit_selective(It b, It e);
#ifdef VERIFY_OUTPUT
#include <boost/fusion/include/io.hpp>
using boost::fusion::operator<<;
#include <iostream>
#define VERIFY() \
do { \
std::cout << "L:" << __LINE__ << " Parsed: " << sequences.size() << "\n"; \
for (auto r : sequences) { \
std::cout << r << "\n"; \
} \
} while (0)
#else
#define VERIFY() do { } while (0)
#endif
#ifdef USE_NONIUS
#include <nonius/benchmark.h++>
#define NONIUS_RUNNER
#include <nonius/main.h++>
#else
// mock nonius
namespace nonius {
struct chronometer{
template <typename F> static inline void measure(F&& f) { std::forward<F>(f)(); }
};
static std::vector<std::function<void(chronometer)>> s_benchmarks;
#define TOKENPASTE(x, y) x ## y
#define TOKENPASTE2(x, y) TOKENPASTE(x, y)
#define NONIUS_BENCHMARK(name, f) static auto TOKENPASTE2(s_reg_, __LINE__) = []{ ::nonius::s_benchmarks.push_back(f); return 42; }();
void run() { for (auto& b : s_benchmarks) b({}); }
}
int main() {
nonius::run();
}
#endif
template <typename R>
void do_test_kernel(nonius::chronometer& cm, std::vector<R> (*f)(It, It)) {
std::vector<R> sequences;
cm.measure([&sequences,f]{ sequences = f(input, input + len); });
VERIFY();
}
#define TEST_CASE(name, string) NONIUS_BENCHMARK(#name"-"#string, [](nonius::chronometer cm) { do_test_kernel(cm, &name<Sequence<string> >); })
// Xpressive doesn't support string_view
TEST_CASE(parse_xpressive_linear, std::string)
TEST_CASE(parse_xpressive_selective, std::string)
TEST_CASE(parse_spirit_linear, std::string)
TEST_CASE(parse_spirit_linear, boost::string_view)
TEST_CASE(parse_spirit_selective, std::string)
TEST_CASE(parse_spirit_selective, boost::string_view)
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
namespace xp = boost::xpressive;
namespace XpressiveDetail {
using namespace xp;
struct Scanner {
cregex scan {-*~xp::_n >> " - SEQUENCE: " >> (+xp::_d) >> xp::_ln};
};
template <typename Seq> struct Parser : Scanner {
mutable Seq seq; // non-thread-safe, but fairer to compare to Spirit
cregex real = (+_d >> '.' >> +_d);
cregex keyword = " - SEQUENCE: " >> (+_d)[xp::ref(seq.sequence) = as<int>(_)];
cregex date = repeat<4>(_d) >> '-'
>> repeat<3>(alpha) >> '-'
>> repeat<2>(_d)
>> _s
>> repeat<2>(_d) >> ':'
>> repeat<2>(_d) >> ':'
>> repeat<2>(_d)
>> '.' >> (+_d);
cregex header = '[' >> date[xp::ref(seq.date) = _] >> "] - "
>> real[xp::ref(seq.time) = as<double>(_)]
>> " s => Driver: " >> (+_d) [ xp ::ref(seq.driver) = as<int>(_) ]
>> " - Speed: " >> real [ xp ::ref(seq.vel) = as<double>(_) ]
>> " - Road: " >> (+set[alnum|'-']) [ xp ::ref(seq.road) = _ ]
>> " - Km: " >> real [ xp ::ref(seq.km) = as<double>(_) ];
cregex parser = (header >> keyword >> _ln);
};
}
template <typename Seq>
std::vector<Seq> parse_xpressive_linear(It b, It e) {
std::vector<Seq> sequences;
using namespace xp;
static const XpressiveDetail::Parser<Seq> precompiled{};
for (xp::cregex_iterator cur(b, e, precompiled.parser), end; cur != end; ++cur)
sequences.push_back(std::move(precompiled.seq));
return sequences;
}
template <typename Seq>
std::vector<Seq> parse_xpressive_selective(It b, It e) {
std::vector<Seq> sequences;
using namespace xp;
static const XpressiveDetail::Parser<Seq> precompiled{};
xp::match_results<It> m;
for (auto& match : boost::make_iterator_range(xp::cregex_iterator{b, e, precompiled.scan}, {})) {
if (xp::regex_match(match[0].first, match[0].second, m, precompiled.parser))
sequences.push_back(std::move(precompiled.seq));
}
return sequences;
}
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
template <typename It, typename Attribute> struct QiParser : qi::grammar<It, Attribute()> {
QiParser() : QiParser::base_type(line) {
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
line = '[' >> eps(clear(_val)) >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - SEQUENCE: " >> int_
>> (eol|eoi);
BOOST_SPIRIT_DEBUG_NODES((line))
}
private:
struct clear_f {
// only required for linear approach to std::string-based
bool operator()(Sequence<std::string>& v) const { v = {}; return true; }
bool operator()(Sequence<boost::string_view>&) const { /*no_op();*/ return true; }
};
boost::phoenix::function<clear_f> clear;
qi::rule<It, Attribute()> line;
};
template <typename Seq = Sequence<std::string> >
std::vector<Seq> parse_spirit_selective(It b, It e) {
static QiParser<It, Seq> const qi_parser{};
static XpressiveDetail::Scanner const precompiled{};
std::vector<Seq> sequences;
for (auto& match : boost::make_iterator_range(xp::cregex_iterator{b, e, precompiled.scan}, {})) {
Seq r;
if (parse(match[0].first, match[0].second, qi_parser, r))
sequences.push_back(r);
}
return sequences;
}
#include <boost/spirit/repository/include/qi_seek.hpp>
template <typename Seq = Sequence<std::string> >
std::vector<Seq> parse_spirit_linear(It b, It e) {
using boost::spirit::repository::qi::seek;
static QiParser<It, Seq> const qi_parser{};
std::vector<Seq> sequences;
parse(b, e, *seek[qi_parser], sequences);
return sequences;
}
Sample text report:
clock resolution: mean is 17.7534 ns (40960002 iterations)
benchmarking parse_xpressive_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 15.7252 ms
mean: 156.418 μs, lb 155.863 μs, ub 158.24 μs, ci 0.95
std dev: 4.62848 μs, lb 1637.89 ns, ub 10.4043 μs, ci 0.95
found 4 outliers among 100 samples (4%)
variance is moderately inflated by outliers
benchmarking parse_xpressive_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 31.5459 ms
mean: 313.992 μs, lb 313.39 μs, ub 315.599 μs, ci 0.95
std dev: 4.5415 μs, lb 1105.98 ns, ub 9.07809 μs, ci 0.95
found 11 outliers among 100 samples (11%)
variance is slightly inflated by outliers
benchmarking parse_spirit_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 2.1556 ms
mean: 21.2533 μs, lb 21.1623 μs, ub 21.6854 μs, ci 0.95
std dev: 870.481 ns, lb 53.2809 ns, ub 2.0738 μs, ci 0.95
found 7 outliers among 100 samples (7%)
variance is moderately inflated by outliers
benchmarking parse_spirit_linear-boost::string_view
collecting 100 samples, 2 iterations each, in estimated 2.944 ms
mean: 14.6677 μs, lb 14.6342 μs, ub 14.8279 μs, ci 0.95
std dev: 318.252 ns, lb 22.5097 ns, ub 757.555 ns, ci 0.95
found 5 outliers among 100 samples (5%)
variance is moderately inflated by outliers
benchmarking parse_spirit_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 27.5512 ms
mean: 273.052 μs, lb 272.77 μs, ub 273.952 μs, ci 0.95
std dev: 2.31473 μs, lb 835.184 ns, ub 5.1322 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers
benchmarking parse_spirit_selective-boost::string_view
collecting 100 samples, 1 iterations each, in estimated 27.0766 ms
mean: 269.446 μs, lb 269.208 μs, ub 270.268 μs, ci 0.95
std dev: 2.01634 μs, lb 627.834 ns, ub 4.56949 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers
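As a worked check, the headline ratios can be read straight off the means in the report above. These are only the sample numbers from this one run, so other machines and inputs will give different figures.

```latex
\begin{align*}
\text{Xpressive, selective vs. linear:}\quad
  & \frac{313.992\,\mu\text{s}}{156.418\,\mu\text{s}} \approx 2.0 \\
\text{Spirit linear, string\_view vs. std::string:}\quad
  & 1 - \frac{14.6677\,\mu\text{s}}{21.2533\,\mu\text{s}} \approx 0.31 \\
\text{Xpressive linear vs. Spirit linear with string\_view:}\quad
  & \frac{156.418\,\mu\text{s}}{14.6677\,\mu\text{s}} \approx 10.7
\end{align*}
```

So in this run the selective scan roughly doubles the Xpressive time, dropping the string allocations saves about 31%, and the best Spirit variant is a little over 10 times faster than the original linear Xpressive parser.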
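You can do this with Fusion adaptation plus Spirit's customization traits (see for example "parsing into several vector members"), but I would consider using semantic actions.
Here is the design conundrum:
Vectors, with traits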
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <cstring> // strlen
using It = char const*;
struct BaseEvent {
int driver;
int sequence;
double time;
double vel;
double km;
boost::string_view date;
boost::string_view road;
};
struct Sequence : BaseEvent{};
struct Clutch : BaseEvent{};
struct Gear : BaseEvent{};
BOOST_FUSION_ADAPT_STRUCT(::Sequence, date, time, driver, vel, road, km, sequence)
BOOST_FUSION_ADAPT_STRUCT(::Clutch, date, time, driver, vel, road, km, sequence)
BOOST_FUSION_ADAPT_STRUCT(::Gear, date, time, driver, vel, road, km, sequence)
struct LogEvents {
std::vector<Sequence> sequence;
std::vector<Clutch> clutch;
std::vector<Gear> gear;
void add(Sequence const& s) { sequence.push_back(s); }
void add(Clutch const& c) { clutch.push_back(c); }
void add(Gear const& g) { gear.push_back(g); }
};
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
template <> struct is_container<LogEvents> : std::true_type {};
template <> struct container_value<LogEvents> {
using type = boost::variant<::Sequence, ::Clutch, ::Gear>;
};
template <typename T> struct push_back_container<LogEvents, T> {
struct Visitor {
LogEvents& _log;
template <typename U> void operator()(U const& ev) const { _log.add(ev); }
using result_type = void;
};
template <typename... U>
static bool call(LogEvents& log, boost::variant<U...> const& attribute) {
boost::apply_visitor(Visitor{log}, attribute);
return true;
}
};
} } }
namespace QiParsers {
template <typename It, typename Attribute>
struct BaseEventParser : qi::grammar<It, Attribute()> {
BaseEventParser(std::string const& event_type) : BaseEventParser::base_type(start) {
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
start
= '[' >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - " >> lit(event_type) >> ": " >> int_
>> (eol|eoi);
}
private:
qi::rule<It, Attribute()> start;
};
}
LogEvents parse_spirit(It b, It e) {
QiParsers::BaseEventParser<It, ::Sequence> sequence("SEQUENCE");
QiParsers::BaseEventParser<It, ::Clutch> clutch("CLUTCH");
QiParsers::BaseEventParser<It, ::Gear> gear("GEAR");
LogEvents events;
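// Keep the parse call outside assert(): with NDEBUG defined the assert expression is not evaluated at all
bool ok = parse(b, e, *boost::spirit::repository::qi::seek[sequence|clutch|gear], events);
assert(ok); (void)ok;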
return events;
}
static char input[] = /* see question */;
static const size_t len = strlen(input);
int main() {
auto events = parse_spirit(input, input+len);
std::cout << "Events: "
<< events.sequence.size() << " sequence, "
<< events.clutch.size() << " clutch, "
<< events.gear.size() << " gear events\n";
using boost::fusion::operator<<;
for (auto& s : events.sequence) { std::cout << "SEQUENCE: " << s << "\n"; }
for (auto& c : events.clutch) { std::cout << "CLUTCH: " << c << "\n"; }
for (auto& g : events.gear) { std::cout << "GEAR: " << g << "\n"; }
}
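The routing that the `push_back_container` specialization performs above can be illustrated without Spirit. The following is a minimal sketch, assuming C++17 and using `std::variant`/`std::visit` in place of `boost::variant`/`boost::apply_visitor`. The pared-down structs, `AnyEvent`, `route` and `demo_route` are illustrative names, not part of the listing above.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Pared-down stand-ins for the event types above (illustrative only).
struct Sequence { int id; };
struct Clutch   { int id; };
struct Gear     { int id; };

struct LogEvents {
    std::vector<Sequence> sequence;
    std::vector<Clutch>   clutch;
    std::vector<Gear>     gear;

    // Overload resolution picks the vector that matches the event type.
    void add(Sequence const& s) { sequence.push_back(s); }
    void add(Clutch const& c)   { clutch.push_back(c); }
    void add(Gear const& g)     { gear.push_back(g); }
};

using AnyEvent = std::variant<Sequence, Clutch, Gear>;

// Hypothetical helper: forwards whichever alternative is active to LogEvents::add.
inline void route(LogEvents& log, AnyEvent const& ev) {
    std::visit([&log](auto const& e) { log.add(e); }, ev);
}

// Routes a small mixed stream of events and returns the filled buckets.
inline LogEvents demo_route() {
    std::vector<AnyEvent> stream;
    stream.push_back(Gear{1});
    stream.push_back(Sequence{4});
    stream.push_back(Gear{2});

    LogEvents log;
    for (AnyEvent const& ev : stream)
        route(log, ev);

    // Every event ends up in exactly one bucket.
    assert(log.sequence.size() + log.clutch.size() + log.gear.size() == stream.size());
    return log;
}
```

The per-type `add` overloads keep the dispatch in one place, which is the same role the nested `Visitor` plays in the trait.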
Flipping it: one vector<variant<>>
Wouldn't it make more sense to use a vector of variants?
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
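#include <boost/fusion/include/io.hpp> // operator<< for adapted structs
#include <boost/variant.hpp>
#include <cstring> // strlen
#include <iostream>
#include <vector>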
using It = char const*;
namespace MyEvents {
struct BaseEvent {
int driver;
int sequence;
double time;
double vel;
double km;
boost::string_view date;
boost::string_view road;
};
struct Sequence : BaseEvent{};
struct Clutch : BaseEvent{};
struct Gear : BaseEvent{};
using LogEvent = boost::variant<Sequence, Clutch, Gear>;
using LogEvents = std::vector<LogEvent>;
}
BOOST_FUSION_ADAPT_STRUCT(MyEvents::Sequence, date, time, driver, vel, road, km, sequence)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::Clutch, date, time, driver, vel, road, km, sequence)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::Gear, date, time, driver, vel, road, km, sequence)
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
namespace QiParsers {
template <typename It, typename Attribute>
struct BaseEventParser : qi::grammar<It, Attribute()> {
BaseEventParser(std::string const& event_type) : BaseEventParser::base_type(start) {
using namespace qi;
auto date_time = copy(
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);
start
= '[' >> raw[date_time] >> "] - "
>> double_ >> " s"
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - " >> lit(event_type) >> ": " >> int_
>> (eol|eoi);
}
private:
qi::rule<It, Attribute()> start;
};
template <typename It>
struct LogParser : qi::grammar<It, MyEvents::LogEvents()> {
LogParser() : LogParser::base_type(start) {
using namespace qi;
using boost::spirit::repository::qi::seek;
event = sequence | clutch | gear ; // TODO add types
start = *seek[event];
}
private:
qi::rule<It, MyEvents::LogEvents()> start;
qi::rule<It, MyEvents::LogEvent()> event;
BaseEventParser<It, MyEvents::Sequence> sequence{"SEQUENCE"};
BaseEventParser<It, MyEvents::Clutch> clutch{"CLUTCH"};
BaseEventParser<It, MyEvents::Gear> gear{"GEAR"};
};
}
MyEvents::LogEvents parse_spirit(It b, It e) {
static QiParsers::LogParser<It> const parser {};
MyEvents::LogEvents events;
parse(b, e, parser, events);
return events;
}
static char input[] = /* see question */;
static const size_t len = strlen(input);
namespace MyEvents { // for debug/demo
using boost::fusion::operator<<;
static inline char const* kind(Sequence const&) { return "SEQUENCE"; }
static inline char const* kind(Clutch const&) { return "CLUTCH"; }
static inline char const* kind(Gear const&) { return "GEAR"; }
struct KindVisitor : boost::static_visitor<char const*> {
template <typename T> char const* operator()(T const& ev) const { return kind(ev); }
};
static inline char const* kind(LogEvent const& ev) {
return boost::apply_visitor(KindVisitor{}, ev);
}
}
int main() {
auto events = parse_spirit(input, input+len);
std::cout << "Parsed: " << events.size() << " events\n";
for (auto& e : events)
std::cout << kind(e) << ": " << e << "\n";
}
Generalizing: common fields and other events
Especially if you keep generalizing:
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
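#include <boost/fusion/include/io.hpp> // operator<< for adapted structs
#include <boost/variant.hpp>
#include <cstring> // strlen
#include <iostream>
#include <string>
#include <vector>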
using It = char const*;
namespace MyEvents {
enum class Kind { Sequence, Clutch, Gear, Slope, Other };
struct CommonFields {
boost::string_view date;
double duration;
};
struct BaseEvent {
CommonFields common;
int driver;
int event_id;
double vel;
double km;
boost::string_view road;
Kind kind;
};
struct OtherEvent {
CommonFields common;
std::string message;
};
using LogEvent = boost::variant<BaseEvent, OtherEvent>;
using LogEvents = std::vector<LogEvent>;
}
BOOST_FUSION_ADAPT_STRUCT(MyEvents::CommonFields, date, duration)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::BaseEvent, common, driver, vel, road, km, kind, event_id)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::OtherEvent, common, message)
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
namespace QiParsers {
template <typename It>
struct LogParser : qi::grammar<It, MyEvents::LogEvents()> {
using Kind = MyEvents::Kind;
LogParser() : LogParser::base_type(start) {
using namespace qi;
kind.add
("SEQUENCE", Kind::Sequence)
("CLUTCH", Kind::Clutch)
("GEAR", Kind::Gear)
("SLOPE", Kind::Slope)
;
common_fields
= '[' >> raw[
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit
] >> "]"
>> " - " >> double_ >> " s";
base_event
= common_fields
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - " >> kind >> ": " >> int_;
other_event
= common_fields
>> " => " >> *~char_("\r\n");
event
= (base_event | other_event)
>> (eol|eoi);
start = *boost::spirit::repository::qi::seek[event];
}
private:
qi::rule<It, MyEvents::LogEvents()> start;
qi::rule<It, MyEvents::LogEvent()> event;
qi::rule<It, MyEvents::CommonFields()> common_fields;
qi::rule<It, MyEvents::BaseEvent()> base_event;
qi::rule<It, MyEvents::OtherEvent()> other_event;
qi::symbols<char, MyEvents::Kind> kind;
};
}
MyEvents::LogEvents parse_spirit(It b, It e) {
static QiParsers::LogParser<It> const parser {};
MyEvents::LogEvents events;
parse(b, e, parser, events);
return events;
}
static char input[] = /* see question */;
static const size_t len = strlen(input);
namespace MyEvents { // for debug/demo
using boost::fusion::operator<<;
static inline Kind getKind(BaseEvent const& be) { return be.kind; }
static inline Kind getKind(OtherEvent const&) { return Kind::Other; }
struct KindVisitor : boost::static_visitor<Kind> {
template <typename T> Kind operator()(T const& ev) const { return getKind(ev); }
};
static inline Kind getKind(LogEvent const& ev) {
return boost::apply_visitor(KindVisitor{}, ev);
}
static inline std::ostream& operator<<(std::ostream& os, Kind k) {
switch(k) {
case Kind::Sequence: return os << "SEQUENCE";
case Kind::Clutch: return os << "CLUTCH";
case Kind::Gear: return os << "GEAR";
case Kind::Slope: return os << "SLOPE";
case Kind::Other: return os << "(Other)";
}
return os;
}
}
int main() {
auto events = parse_spirit(input, input+len);
std::cout << "Parsed: " << events.size() << " events\n";
for (auto& e : events)
std::cout << getKind(e) << ": " << e << "\n";
}
This prints, for example:
Parsed: 37 events
SLOPE: ((2018-Mar-13 13:13:59.580482 0.2) 0 0 A-11 90 SLOPE 0)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0 A-11 90 GEAR 0)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0.1 A-11 90 GEAR 1)
SEQUENCE: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.1 A-11 90 SEQUENCE 1)
CLUTCH: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.2 A-11 90 CLUTCH 1)
(Other): ((2018-Mar-13 13:14:01.819966 2.54) Backup to regestry)
[...]
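One consequence of the `assign_to_attribute_from_iterators` specialization above is worth spelling out: the `date` and `road` fields are non-owning views into the input buffer, so that buffer has to outlive the parsed events. A minimal sketch of that behaviour, assuming C++17 and using `std::string_view` as a stand-in for `boost::string_view`; `view_between`, `aliases_buffer` and `owned_copy` are hypothetical helpers, with `view_between` mirroring the trait's `call`.

```cpp
#include <cstddef>
#include <cstring>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper mirroring the trait: builds a view over [f, l) without copying.
inline std::string_view view_between(char const* f, char const* l) {
    return std::string_view(f, static_cast<std::size_t>(std::distance(f, l)));
}

// Returns true when the extracted date aliases the original buffer (no copy was made).
inline bool aliases_buffer() {
    static char const line[] = "[2018-Mar-13 13:13:59.580482] - 0.200 s";
    char const* first = line + 1;                 // skip '['
    char const* last  = std::strchr(line, ']');   // one past the end of the date
    std::string_view date = view_between(first, last);
    return date.data() == first && date == "2018-Mar-13 13:13:59.580482";
}

// If an event must outlive the input buffer, take an owning copy of the view.
inline std::string owned_copy(std::string_view v) {
    return std::string(v);
}
```

Avoiding the copy is what keeps the per-field cost low. The `message` field of `OtherEvent` is a `std::string` and does own its data.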
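Bonus: Multi-Index
If you use a multi-index container, you can have your cake and eat it too.
Here is a sample definition that lets you index the vector by a few rather arbitrarily chosen features: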
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
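#include <boost/multi_index/global_fun.hpp>
#include <boost/range/iterator_range.hpp> // make_iterator_range
#include <functional> // std::reference_wrapper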
namespace Indexing {
namespace bmi = boost::multi_index;
using MyEvents::LogEvent;
double getDuration(LogEvent const& ev) { return getCommon(ev).duration; }
using Table = bmi::multi_index_container<
std::reference_wrapper<LogEvent const>, //LogEvent,
bmi::indexed_by<
bmi::ordered_non_unique<
bmi::tag<struct primary>,
bmi::composite_key<
LogEvent,
bmi::global_fun<LogEvent const&, MyEvents::Kind, MyEvents::getKind>,
bmi::global_fun<LogEvent const&, int, MyEvents::getEventId>
>
>,
bmi::ordered_non_unique<
bmi::tag<struct duration>,
bmi::global_fun<LogEvent const&, double, getDuration>
>
>
>;
}
Now you can do fun things like:
Indexing::Table idx(events.begin(), events.end());
/*
* // To print all events, grouped by kind and event id:
* for (MyEvents::LogEvent const& e : idx)
* std::cout << getKind(e) << ": " << e << "\n";
*
* // Ordered by duration:
* for (MyEvents::LogEvent const& e : idx.get<Indexing::duration>())
* std::cout << getKind(e) << ": " << e << "\n";
*/
std::cout << "\nAll GEAR events ordered by event id:\n";
for (MyEvents::LogEvent const& e : make_iterator_range(idx.equal_range(make_tuple(Kind::Gear))))
std::cout << getKind(e) << ": " << e << "\n";
std::cout << "\nOnly the SLOPE events with id 10:\n";
for (MyEvents::LogEvent const& e : make_iterator_range(idx.equal_range(make_tuple(Kind::Slope, 10))))
std::cout << getKind(e) << ": " << e << "\n";
std::cout << "\nEvents with durations in [2s..3s):\n";
auto& by_dur = idx.get<Indexing::duration>();
for (MyEvents::LogEvent const& e : make_iterator_range(by_dur.lower_bound(2), by_dur.lower_bound(3))) // half-open: excludes exactly 3s
std::cout << getKind(e) << ": " << e << "\n";
Full listing:
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
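#include <boost/fusion/include/io.hpp> // operator<< for adapted structs
#include <boost/variant.hpp>
#include <cstring> // strlen
#include <iostream>
#include <string>
#include <vector>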
using It = char const*;
namespace MyEvents {
enum class Kind { Sequence, Clutch, Gear, Slope, Other };
struct CommonFields {
boost::string_view date;
double duration;
};
struct BaseEvent {
CommonFields common;
int driver;
int event_id;
double vel;
double km;
boost::string_view road;
Kind kind;
};
struct OtherEvent {
CommonFields common;
std::string message;
};
using LogEvent = boost::variant<BaseEvent, OtherEvent>;
using LogEvents = std::vector<LogEvent>;
}
BOOST_FUSION_ADAPT_STRUCT(MyEvents::CommonFields, date, duration)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::BaseEvent, common, driver, vel, road, km, kind, event_id)
BOOST_FUSION_ADAPT_STRUCT(MyEvents::OtherEvent, common, message)
namespace qi = boost::spirit::qi;
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
};
} } }
namespace QiParsers {
template <typename It>
struct LogParser : qi::grammar<It, MyEvents::LogEvents()> {
using Kind = MyEvents::Kind;
LogParser() : LogParser::base_type(start) {
using namespace qi;
kind.add
("SEQUENCE", Kind::Sequence)
("CLUTCH", Kind::Clutch)
("GEAR", Kind::Gear)
("SLOPE", Kind::Slope)
;
common_fields
= '[' >> raw[
repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >>
repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit
] >> "]"
>> " - " >> double_ >> " s";
base_event
= common_fields
>> " => Driver: " >> int_
>> " - Speed: " >> double_
>> " - Road: " >> raw[+graph]
>> " - Km: " >> double_
>> " - " >> kind >> ": " >> int_;
other_event
= common_fields
>> " => " >> *~char_("\r\n");
event
= (base_event | other_event)
>> (eol|eoi);
start = *boost::spirit::repository::qi::seek[event];
}
private:
qi::rule<It, MyEvents::LogEvents()> start;
qi::rule<It, MyEvents::LogEvent()> event;
qi::rule<It, MyEvents::CommonFields()> common_fields;
qi::rule<It, MyEvents::BaseEvent()> base_event;
qi::rule<It, MyEvents::OtherEvent()> other_event;
qi::symbols<char, MyEvents::Kind> kind;
};
}
MyEvents::LogEvents parse_spirit(It b, It e) {
static QiParsers::LogParser<It> const parser {};
MyEvents::LogEvents events;
parse(b, e, parser, events);
return events;
}
static char input[] = /* see question */;
static const size_t len = strlen(input);
namespace MyEvents { // for debug/demo
using boost::fusion::operator<<;
static inline CommonFields const& getCommon(BaseEvent const& be) { return be.common; }
static inline CommonFields const& getCommon(OtherEvent const& oe) { return oe.common; }
static inline Kind getKind(BaseEvent const& be) { return be.kind; }
static inline Kind getKind(OtherEvent const&) { return Kind::Other; }
static inline int getEventId(BaseEvent const& be) { return be.event_id; }
static inline int getEventId(OtherEvent const&) { return 0; }
#define IMPL_DISPATCH(name, T) \
struct name##Visitor : boost::static_visitor<T> { \
template <typename E> T operator()(E const &ev) const { return name(ev); } \
}; \
static inline T name(LogEvent const &ev) { return boost::apply_visitor(name##Visitor{}, ev); }
IMPL_DISPATCH(getCommon, CommonFields const&)
IMPL_DISPATCH(getKind, Kind)
IMPL_DISPATCH(getEventId, int)
static inline std::ostream& operator<<(std::ostream& os, Kind k) {
switch(k) {
case Kind::Sequence: return os << "SEQUENCE";
case Kind::Clutch: return os << "CLUTCH";
case Kind::Gear: return os << "GEAR";
case Kind::Slope: return os << "SLOPE";
case Kind::Other: return os << "(Other)";
}
return os;
}
}
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
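#include <boost/multi_index/global_fun.hpp>
#include <boost/range/iterator_range.hpp> // make_iterator_range
#include <functional> // std::reference_wrapper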
namespace Indexing {
namespace bmi = boost::multi_index;
using MyEvents::LogEvent;
double getDuration(LogEvent const& ev) { return getCommon(ev).duration; }
using Table = bmi::multi_index_container<
std::reference_wrapper<LogEvent const>, //LogEvent,
bmi::indexed_by<
bmi::ordered_non_unique<
bmi::tag<struct primary>,
bmi::composite_key<
LogEvent,
bmi::global_fun<LogEvent const&, MyEvents::Kind, MyEvents::getKind>,
bmi::global_fun<LogEvent const&, int, MyEvents::getEventId>
>
>,
bmi::ordered_non_unique<
bmi::tag<struct duration>,
bmi::global_fun<LogEvent const&, double, getDuration>
>
>
>;
}
using boost::make_iterator_range;
using boost::make_tuple;
int main() {
using MyEvents::LogEvent;
using MyEvents::Kind;
auto events = parse_spirit(input, input+len);
std::cout << "Parsed: " << events.size() << " events\n";
Indexing::Table idx(events.begin(), events.end());
/*
* // To print all events, grouped by kind and event id:
* for (MyEvents::LogEvent const& e : idx)
* std::cout << getKind(e) << ": " << e << "\n";
*
* // Ordered by duration:
* for (MyEvents::LogEvent const& e : idx.get<Indexing::duration>())
* std::cout << getKind(e) << ": " << e << "\n";
*/
std::cout << "\nAll GEAR events ordered by event id:\n";
for (MyEvents::LogEvent const& e : make_iterator_range(idx.equal_range(make_tuple(Kind::Gear))))
std::cout << getKind(e) << ": " << e << "\n";
std::cout << "\nOnly the SLOPE events with id 10:\n";
for (MyEvents::LogEvent const& e : make_iterator_range(idx.equal_range(make_tuple(Kind::Slope, 10))))
std::cout << getKind(e) << ": " << e << "\n";
std::cout << "\nEvents with durations in [2s..3s):\n";
auto& by_dur = idx.get<Indexing::duration>();
for (MyEvents::LogEvent const& e : make_iterator_range(by_dur.lower_bound(2), by_dur.lower_bound(3))) // half-open: excludes exactly 3s
std::cout << getKind(e) << ": " << e << "\n";
}
Prints:
Parsed: 37 events
All GEAR events ordered by event id:
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0 A-11 90 GEAR 0)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0 A-11 90 GEAR 0)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0.1 A-11 90 GEAR 1)
GEAR: ((2018-Mar-13 13:14:01.170203 1.79) 0 0.1 A-11 90 GEAR 1)
GEAR: ((2018-Mar-13 13:14:03.250451 3.87) 0 1.2 B-302 90.2 GEAR 2)
GEAR: ((2018-Mar-13 13:14:03.250451 3.87) 0 1.2 B-302 90.2 GEAR 2)
GEAR: ((2018-Mar-13 13:14:04.510025 5.13) 0 4.9 B-302 91.1 GEAR 3)
Only the SLOPE events with id 10:
SLOPE: ((2018-Mar-13 13:14:04.300160 4.92) 0 4.2 B-302 90.9 SLOPE 10)
SLOPE: ((2018-Mar-13 13:14:04.300160 4.92) 0 4.2 B-302 90.9 SLOPE 10)
Events with durations in [2s..3s):
SEQUENCE: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.1 A-11 90 SEQUENCE 1)
CLUTCH: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.2 A-11 90 CLUTCH 1)
SEQUENCE: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.1 A-11 90 SEQUENCE 1)
CLUTCH: ((2018-Mar-13 13:14:01.819966 2.44) 0 0.2 A-11 90 CLUTCH 1)
(Other): ((2018-Mar-13 13:14:01.819966 2.54) Backup to regestry)
(Other): ((2018-Mar-13 13:14:01.819966 2.54) Backup to regestry)
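If Boost.MultiIndex is not an option, the "index the vector without copying it" idea can be approximated with the standard library alone. This is a rough sketch, assuming a simplified `Event` struct and a single ordering by duration. `Event`, `Index`, `index_by_duration` and `ids_in_range` are hypothetical names, and unlike `multi_index_container` this keeps only one index and does not update itself when the underlying vector changes.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Simplified stand-in for a parsed log event (illustrative only).
struct Event { int id; double duration; };

using Ref   = std::reference_wrapper<Event const>;
using Index = std::vector<Ref>;

// Builds a view of `events` sorted by duration. Nothing is copied, so
// `events` must outlive the returned index and must not be resized.
inline Index index_by_duration(std::vector<Event> const& events) {
    Index idx(events.begin(), events.end());
    std::stable_sort(idx.begin(), idx.end(),
        [](Event const& a, Event const& b) { return a.duration < b.duration; });
    return idx;
}

// Collects the ids of all events whose duration lies in the half-open range [lo, hi).
inline std::vector<int> ids_in_range(Index const& idx, double lo, double hi) {
    auto less_than = [](Event const& e, double v) { return e.duration < v; };
    auto first = std::lower_bound(idx.begin(), idx.end(), lo, less_than);
    auto last  = std::lower_bound(first, idx.end(), hi, less_than);

    std::vector<int> ids;
    for (auto it = first; it != last; ++it)
        ids.push_back(it->get().id);
    return ids;
}
```

Using `lower_bound` for both ends is what makes the range half-open: an event with a duration of exactly `hi` is excluded.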