c ++ boost base64解码器在存在换行符时失败
c++ boost base64 decoder fails when newlines are present
当给定 base64 'text' 解码时,包含新行,以下将引发异常 - 存在非 base64 字符。引用新行。
在抛出 'boost::archive::iterators::dataflow_exception' 的实例后调用终止
what(): 尝试解码不在 base64 字符集中的值
有谁知道如何让 boost 优雅地处理换行符?我知道我可以在解码之前自己将它们从字符串中删除,但我希望并猜测有一种更简化的方法。
typedef transform_width< binary_from_base64<remove_whitespace<char*>>, 8, 6 > base64_dec;
unsigned int size = s.size(); //where 's' is the string holding the base64 characters to include newlines at every 76th character
std::string decoded_token(base64_dec(s.c_str()), base64_dec(s.c_str() + size));
好的,问题在于换行符不是问题。 filter_iterator
从根本上被打破了。
一旦输入序列以不满足过滤谓词的字符(在本例中为空白字符)结束,就会导致Undefined Behaviour:
Crashing Live On Compiler Explorer
#include <boost/archive/iterators/remove_whitespace.hpp>
#include <iomanip>
#include <iostream>
namespace bai = boost::archive::iterators;
int main() {
using It = bai::remove_whitespace<const char*>;
std::string const s = "oops "; // ends in whitespace, causes UB
std::string filtered(It(s.c_str()), It(s.c_str() + s.length()));
std::cout << std::quoted(filtered) << std::flush;
}
打印,启用 ASan(没有它只是段错误):
=================================================================
==1==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffecf132b0 at pc 0x00000040390d bp 0x7fffecf12fd0 sp 0x7fffecf12fc8
READ of size 1 at 0x7fffecf132b0 thread T0
#0 0x40390c in dereference_impl /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:105
#1 0x40390c in dereference /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:113
#2 0x40390c in dereference<boost::archive::iterators::filter_iterator<(anonymous namespace)::remove_whitespace_predicate<char>, char const*> > /opt/compiler-explorer/libs/boost_1_78_0/boost/iterator/iterator_facade.hpp:550
#3 0x40390c in operator* /opt/compiler-explorer/libs/boost_1_78_0/boost/iterator/iterator_facade.hpp:656
#4 0x40390c in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<boost::archive::iterators::remove_whitespace<char const*> >(boost::archive::iterators::remove_whitespace<char const*>, boost::archive::iterators::remove_whitespace<char const*>, std::input_iterator_tag) /opt/compiler-explorer/gcc-trunk-20220419/include/c++/12.0.1/bits/basic_string.tcc:204
#5 0x40390c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<boost::archive::iterators::remove_whitespace<char const*>, void>(boost::archive::iterators::remove_whitespace<char const*>, boost::archive::iterators::remove_whitespace<char const*>, std::allocator<char> const&) /opt/compiler-explorer/gcc-trunk-20220419/include/c++/12.0.1/bits/basic_string.h:756
#6 0x40390c in main /app/example.cpp:11
#7 0x7fb625b560b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
#8 0x4041ed in _start (/app/output.s+0x4041ed)
Address 0x7fffecf132b0 is located in stack of thread T0 at offset 576 in frame
#0 0x40245f in main /app/example.cpp:7
This frame has 21 object(s):
[32, 33) '<unknown>'
[48, 49) '<unknown>'
[64, 65) '<unknown>'
[80, 81) '<unknown>'
[96, 97) '<unknown>'
[112, 113) '<unknown>'
[128, 136) 'start'
[160, 168) 'start'
[192, 200) '__guard'
[224, 232) 'start'
[256, 264) 'start'
[288, 296) '__capacity'
[320, 328) '__guard'
[352, 368) '<unknown>'
[384, 400) '<unknown>'
[416, 432) '<unknown>'
[448, 464) '<unknown>'
[480, 496) '<unknown>'
[512, 528) '<unknown>'
[544, 576) 's' (line 9) <== Memory access at offset 576 overflows this variable
[608, 640) 'filtered' (line 11)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:105 in dereference_impl
Shadow bytes around the buggy address:
0x10007d9da600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
0x10007d9da610: f1 f1 f8 f2 01 f2 f8 f2 f8 f2 f8 f2 01 f2 00 f2
0x10007d9da620: f2 f2 00 f2 f2 f2 f8 f2 f2 f2 00 f2 f2 f2 00 f2
0x10007d9da630: f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 00 f2 f2 00 00
0x10007d9da640: f2 f2 00 00 f2 f2 00 00 f2 f2 00 00 f2 f2 00 00
=>0x10007d9da650: f2 f2 00 00 00 00[f2]f2 f2 f2 00 00 00 00 f3 f3
0x10007d9da660: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1==ABORTING
您可以认为自己很幸运,因为症状而注意到您,而不是它吃掉您的小狗或在生产代码中发射核武器。
这应该被报告。没有单独测试 filter_iterator
(甚至 remove_whitespace
),之前的票似乎表明了一种立场,如“它对我有用”(其中“我”是 Boost 序列化库).参见例如https://github.com/boostorg/serialization/issues/135.
Interestingly the analysis at that ticket makes zero sense, since there hasn't been a two-iterator constructor for filter_iterator
since ... forever. I can only guess that Robert mistakenly looked at Boost Iterator instead of the filter_iterator
from Boost Archive.
所以我的直觉是建议您使用 Boost Iterator 中的 filter_iterator.hpp
反而。具有讽刺意味的是,它尝试了几次(并去了 cppslack/github)
没错,但这里是:Live On Compiler Explorer
使用 Boost 迭代器修复 filter_iterator
我们应该能够使用有效的 filter_iterator
实现来修复:
using FiltIt =
boost::iterators::filter_iterator<IsGraph, std::string::const_iterator>;
using base64_dec = //
bai::transform_width< //
bai::binary_from_base64<FiltIt>, 8, 6>; //
现在,要做到这一点仍然 棘手 。值得注意的是,天真的方法将再次因 UB 而失败:
// CAUTION: this invokes UB:
std::string filtered(base64_dec(s.begin()), base64_dec(s.end()));
这是隐式转换 + 默认参数的诅咒。相反,我们必须 单独显式构造 FiltIt:
FiltIt f(IsGraph{}, s.begin(), s.end()), // !!
l(f.predicate(), f.end(), f.end()); // !!
现在我们可以“只”在 base64_dec 中使用它们:
std::string filtered(base64_dec{f}, base64_dec{l});
Note the uniform {} initializers to sidestep Most Vexing Parse
#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <boost/iterator/filter_iterator.hpp>
#include <iomanip>
#include <iostream>
namespace bai = boost::archive::iterators;
static std::string const s = "aGVsbG8g\nd29ybGQK";
int main() {
std::cout << std::unitbuf;
struct IsGraph {
// unsigned char prevents sign extension
bool operator()(unsigned char ch) const {
return std::isgraph(ch); // !std::isspace
}
};
using FiltIt =
boost::iterators::filter_iterator<IsGraph, std::string::const_iterator>;
using base64_dec = //
bai::transform_width< //
bai::binary_from_base64<FiltIt>, 8, 6>; //
//// CAUTION: this invokes UB:
//std::string filtered(base64_dec(s.begin()), base64_dec(s.end()));
FiltIt f(IsGraph{}, s.begin(), s.end()), // !!
l(f.predicate(), f.end(), f.end()); // !!
std::string filtered(base64_dec{f}, base64_dec{l});
std::cout << "OUT:" << std::quoted(filtered) << std::endl;
}
打印自身(减去示例 base64)
OUT:"hello world
"
Summary/TL;DR
考虑潜伏的错误和未记录的限制,考虑使用
正确、更简单的 base64 实现。
Beast 在其实现细节中有一个,因此也不支持,但是
它至少不那么脆弱的可能性很高。
或者,必须是具有适当测试和文档的库。
当给定 base64 'text' 解码时,包含新行,以下将引发异常 - 存在非 base64 字符。引用新行。
在抛出 'boost::archive::iterators::dataflow_exception' 的实例后调用终止 what(): 尝试解码不在 base64 字符集中的值
有谁知道如何让 boost 优雅地处理换行符?我知道我可以在解码之前自己将它们从字符串中删除,但我希望并猜测有一种更简化的方法。
typedef transform_width< binary_from_base64<remove_whitespace<char*>>, 8, 6 > base64_dec;
unsigned int size = s.size(); //where 's' is the string holding the base64 characters to include newlines at every 76th character
std::string decoded_token(base64_dec(s.c_str()), base64_dec(s.c_str() + size));
好的,问题在于换行符不是问题。 filter_iterator
从根本上被打破了。
一旦输入序列以不满足过滤谓词的字符(在本例中为空白字符)结束,就会导致Undefined Behaviour:
Crashing Live On Compiler Explorer
#include <boost/archive/iterators/remove_whitespace.hpp>
#include <iomanip>
#include <iostream>
namespace bai = boost::archive::iterators;
int main() {
using It = bai::remove_whitespace<const char*>;
std::string const s = "oops "; // ends in whitespace, causes UB
std::string filtered(It(s.c_str()), It(s.c_str() + s.length()));
std::cout << std::quoted(filtered) << std::flush;
}
打印,启用 ASan(没有它只是段错误):
=================================================================
==1==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffecf132b0 at pc 0x00000040390d bp 0x7fffecf12fd0 sp 0x7fffecf12fc8
READ of size 1 at 0x7fffecf132b0 thread T0
#0 0x40390c in dereference_impl /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:105
#1 0x40390c in dereference /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:113
#2 0x40390c in dereference<boost::archive::iterators::filter_iterator<(anonymous namespace)::remove_whitespace_predicate<char>, char const*> > /opt/compiler-explorer/libs/boost_1_78_0/boost/iterator/iterator_facade.hpp:550
#3 0x40390c in operator* /opt/compiler-explorer/libs/boost_1_78_0/boost/iterator/iterator_facade.hpp:656
#4 0x40390c in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<boost::archive::iterators::remove_whitespace<char const*> >(boost::archive::iterators::remove_whitespace<char const*>, boost::archive::iterators::remove_whitespace<char const*>, std::input_iterator_tag) /opt/compiler-explorer/gcc-trunk-20220419/include/c++/12.0.1/bits/basic_string.tcc:204
#5 0x40390c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<boost::archive::iterators::remove_whitespace<char const*>, void>(boost::archive::iterators::remove_whitespace<char const*>, boost::archive::iterators::remove_whitespace<char const*>, std::allocator<char> const&) /opt/compiler-explorer/gcc-trunk-20220419/include/c++/12.0.1/bits/basic_string.h:756
#6 0x40390c in main /app/example.cpp:11
#7 0x7fb625b560b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
#8 0x4041ed in _start (/app/output.s+0x4041ed)
Address 0x7fffecf132b0 is located in stack of thread T0 at offset 576 in frame
#0 0x40245f in main /app/example.cpp:7
This frame has 21 object(s):
[32, 33) '<unknown>'
[48, 49) '<unknown>'
[64, 65) '<unknown>'
[80, 81) '<unknown>'
[96, 97) '<unknown>'
[112, 113) '<unknown>'
[128, 136) 'start'
[160, 168) 'start'
[192, 200) '__guard'
[224, 232) 'start'
[256, 264) 'start'
[288, 296) '__capacity'
[320, 328) '__guard'
[352, 368) '<unknown>'
[384, 400) '<unknown>'
[416, 432) '<unknown>'
[448, 464) '<unknown>'
[480, 496) '<unknown>'
[512, 528) '<unknown>'
[544, 576) 's' (line 9) <== Memory access at offset 576 overflows this variable
[608, 640) 'filtered' (line 11)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:105 in dereference_impl
Shadow bytes around the buggy address:
0x10007d9da600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
0x10007d9da610: f1 f1 f8 f2 01 f2 f8 f2 f8 f2 f8 f2 01 f2 00 f2
0x10007d9da620: f2 f2 00 f2 f2 f2 f8 f2 f2 f2 00 f2 f2 f2 00 f2
0x10007d9da630: f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 00 f2 f2 00 00
0x10007d9da640: f2 f2 00 00 f2 f2 00 00 f2 f2 00 00 f2 f2 00 00
=>0x10007d9da650: f2 f2 00 00 00 00[f2]f2 f2 f2 00 00 00 00 f3 f3
0x10007d9da660: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1==ABORTING
您可以认为自己很幸运,因为症状而注意到您,而不是它吃掉您的小狗或在生产代码中发射核武器。
这应该被报告。没有单独测试 filter_iterator
(甚至 remove_whitespace
),之前的票似乎表明了一种立场,如“它对我有用”(其中“我”是 Boost 序列化库).参见例如https://github.com/boostorg/serialization/issues/135.
Interestingly the analysis at that ticket makes zero sense, since there hasn't been a two-iterator constructor for
filter_iterator
since ... forever. I can only guess that Robert mistakenly looked at Boost Iterator instead of thefilter_iterator
from Boost Archive.
所以我的直觉是建议您使用 Boost Iterator 中的 filter_iterator.hpp
反而。具有讽刺意味的是,它尝试了几次(并去了 cppslack/github)
没错,但这里是:Live On Compiler Explorer
使用 Boost 迭代器修复 filter_iterator
我们应该能够使用有效的 filter_iterator
实现来修复:
using FiltIt =
boost::iterators::filter_iterator<IsGraph, std::string::const_iterator>;
using base64_dec = //
bai::transform_width< //
bai::binary_from_base64<FiltIt>, 8, 6>; //
现在,要做到这一点仍然 棘手 。值得注意的是,天真的方法将再次因 UB 而失败:
// CAUTION: this invokes UB:
std::string filtered(base64_dec(s.begin()), base64_dec(s.end()));
这是隐式转换 + 默认参数的诅咒。相反,我们必须 单独显式构造 FiltIt:
FiltIt f(IsGraph{}, s.begin(), s.end()), // !!
l(f.predicate(), f.end(), f.end()); // !!
现在我们可以“只”在 base64_dec 中使用它们:
std::string filtered(base64_dec{f}, base64_dec{l});
Note the uniform {} initializers to sidestep Most Vexing Parse
#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <boost/iterator/filter_iterator.hpp>
#include <iomanip>
#include <iostream>
namespace bai = boost::archive::iterators;
static std::string const s = "aGVsbG8g\nd29ybGQK";
int main() {
std::cout << std::unitbuf;
struct IsGraph {
// unsigned char prevents sign extension
bool operator()(unsigned char ch) const {
return std::isgraph(ch); // !std::isspace
}
};
using FiltIt =
boost::iterators::filter_iterator<IsGraph, std::string::const_iterator>;
using base64_dec = //
bai::transform_width< //
bai::binary_from_base64<FiltIt>, 8, 6>; //
//// CAUTION: this invokes UB:
//std::string filtered(base64_dec(s.begin()), base64_dec(s.end()));
FiltIt f(IsGraph{}, s.begin(), s.end()), // !!
l(f.predicate(), f.end(), f.end()); // !!
std::string filtered(base64_dec{f}, base64_dec{l});
std::cout << "OUT:" << std::quoted(filtered) << std::endl;
}
打印自身(减去示例 base64)
OUT:"hello world
"
Summary/TL;DR
考虑潜伏的错误和未记录的限制,考虑使用 正确、更简单的 base64 实现。
Beast 在其实现细节中有一个,因此也不支持,但是 它至少不那么脆弱的可能性很高。
或者,必须是具有适当测试和文档的库。