c ++ boost base64解码器在存在换行符时失败

c++ boost base64 decoder fails when newlines are present

当给定 base64 'text' 解码时,包含新行,以下将引发异常 - 存在非 base64 字符。引用新行。

在抛出 'boost::archive::iterators::dataflow_exception' 的实例后调用终止 what(): 尝试解码不在 base64 字符集中的值

有谁知道如何让 boost 优雅地处理换行符?我知道我可以在解码之前自己将它们从字符串中删除,但我希望并猜测有一种更简化的方法。

typedef transform_width< binary_from_base64<remove_whitespace<char*>>, 8, 6 > base64_dec;
unsigned int size = s.size(); //where 's' is the string holding the base64 characters to include newlines at every 76th character
std::string decoded_token(base64_dec(s.c_str()), base64_dec(s.c_str() + size));

好的,问题在于换行符不是问题。 filter_iterator 从根本上被打破了。

一旦输入序列以不满足过滤谓词的字符(在本例中为空白字符)结束,就会导致Undefined Behaviour

Crashing Live On Compiler Explorer

#include <boost/archive/iterators/remove_whitespace.hpp>
#include <iomanip>
#include <iostream>

namespace bai = boost::archive::iterators;

int main() {
    using It = bai::remove_whitespace<const char*>;
    std::string const s = "oops "; // ends in whitespace, causes UB

    std::string filtered(It(s.c_str()), It(s.c_str() + s.length()));
    std::cout << std::quoted(filtered) << std::flush;
}

打印,启用 ASan(没有它只是段错误):

=================================================================
==1==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffecf132b0 at pc 0x00000040390d bp 0x7fffecf12fd0 sp 0x7fffecf12fc8
READ of size 1 at 0x7fffecf132b0 thread T0
    #0 0x40390c in dereference_impl /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:105
    #1 0x40390c in dereference /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:113
    #2 0x40390c in dereference<boost::archive::iterators::filter_iterator<(anonymous namespace)::remove_whitespace_predicate<char>, char const*> > /opt/compiler-explorer/libs/boost_1_78_0/boost/iterator/iterator_facade.hpp:550
    #3 0x40390c in operator* /opt/compiler-explorer/libs/boost_1_78_0/boost/iterator/iterator_facade.hpp:656
    #4 0x40390c in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<boost::archive::iterators::remove_whitespace<char const*> >(boost::archive::iterators::remove_whitespace<char const*>, boost::archive::iterators::remove_whitespace<char const*>, std::input_iterator_tag) /opt/compiler-explorer/gcc-trunk-20220419/include/c++/12.0.1/bits/basic_string.tcc:204
    #5 0x40390c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<boost::archive::iterators::remove_whitespace<char const*>, void>(boost::archive::iterators::remove_whitespace<char const*>, boost::archive::iterators::remove_whitespace<char const*>, std::allocator<char> const&) /opt/compiler-explorer/gcc-trunk-20220419/include/c++/12.0.1/bits/basic_string.h:756
    #6 0x40390c in main /app/example.cpp:11
    #7 0x7fb625b560b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    #8 0x4041ed in _start (/app/output.s+0x4041ed)

Address 0x7fffecf132b0 is located in stack of thread T0 at offset 576 in frame
    #0 0x40245f in main /app/example.cpp:7

  This frame has 21 object(s):
    [32, 33) '<unknown>'
    [48, 49) '<unknown>'
    [64, 65) '<unknown>'
    [80, 81) '<unknown>'
    [96, 97) '<unknown>'
    [112, 113) '<unknown>'
    [128, 136) 'start'
    [160, 168) 'start'
    [192, 200) '__guard'
    [224, 232) 'start'
    [256, 264) 'start'
    [288, 296) '__capacity'
    [320, 328) '__guard'
    [352, 368) '<unknown>'
    [384, 400) '<unknown>'
    [416, 432) '<unknown>'
    [448, 464) '<unknown>'
    [480, 496) '<unknown>'
    [512, 528) '<unknown>'
    [544, 576) 's' (line 9) <== Memory access at offset 576 overflows this variable
    [608, 640) 'filtered' (line 11)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:105 in dereference_impl
Shadow bytes around the buggy address:
  0x10007d9da600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  0x10007d9da610: f1 f1 f8 f2 01 f2 f8 f2 f8 f2 f8 f2 01 f2 00 f2
  0x10007d9da620: f2 f2 00 f2 f2 f2 f8 f2 f2 f2 00 f2 f2 f2 00 f2
  0x10007d9da630: f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 00 f2 f2 00 00
  0x10007d9da640: f2 f2 00 00 f2 f2 00 00 f2 f2 00 00 f2 f2 00 00
=>0x10007d9da650: f2 f2 00 00 00 00[f2]f2 f2 f2 00 00 00 00 f3 f3
  0x10007d9da660: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007d9da670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007d9da680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007d9da690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007d9da6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1==ABORTING

您可以认为自己很幸运,因为症状而注意到您,而不是它吃掉您的小狗或在生产代码中发射核武器。

这应该被报告。没有单独测试 filter_iterator(甚至 remove_whitespace),之前的票似乎表明了一种立场,如“它对我有用”(其中“我”是 Boost 序列化库).参见例如https://github.com/boostorg/serialization/issues/135.

Interestingly the analysis at that ticket makes zero sense, since there hasn't been a two-iterator constructor for filter_iterator since ... forever. I can only guess that Robert mistakenly looked at Boost Iterator instead of the filter_iterator from Boost Archive.

所以我的直觉是建议您使用 Boost Iterator 中的 filter_iterator.hpp 反而。具有讽刺意味的是,它尝试了几次(并去了 cppslack/github) 没错,但这里是:Live On Compiler Explorer

使用 Boost 迭代器修复 filter_iterator

我们应该能够使用有效的 filter_iterator 实现来修复:

using FiltIt =
    boost::iterators::filter_iterator<IsGraph, std::string::const_iterator>;

using base64_dec =                              //
    bai::transform_width<                       //
        bai::binary_from_base64<FiltIt>, 8, 6>; //

现在,要做到这一点仍然 棘手 。值得注意的是,天真的方法将再次因 UB 而失败:

// CAUTION: this invokes UB:
std::string filtered(base64_dec(s.begin()), base64_dec(s.end()));

这是隐式转换 + 默认参数的诅咒。相反,我们必须 单独显式构造 FiltIt:

FiltIt f(IsGraph{}, s.begin(), s.end()), // !!
    l(f.predicate(), f.end(), f.end());  // !!

现在我们可以“只”在 base64_dec 中使用它们:

std::string filtered(base64_dec{f}, base64_dec{l});

Note the uniform {} initializers to sidestep Most Vexing Parse

Live On Compiler Explorer

#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <boost/iterator/filter_iterator.hpp>
#include <iomanip>
#include <iostream>

namespace bai = boost::archive::iterators;
static std::string const s = "aGVsbG8g\nd29ybGQK";

int main() {
    std::cout << std::unitbuf;

    struct IsGraph {
        // unsigned char prevents sign extension
        bool operator()(unsigned char ch) const {
            return std::isgraph(ch); // !std::isspace
        }
    };

    using FiltIt =
        boost::iterators::filter_iterator<IsGraph, std::string::const_iterator>;

    using base64_dec =                              //
        bai::transform_width<                       //
            bai::binary_from_base64<FiltIt>, 8, 6>; //

    //// CAUTION: this invokes UB:
    //std::string filtered(base64_dec(s.begin()), base64_dec(s.end()));
    FiltIt f(IsGraph{}, s.begin(), s.end()),   // !!
        l(f.predicate(), f.end(), f.end()); // !!

    std::string filtered(base64_dec{f}, base64_dec{l});
    std::cout << "OUT:" << std::quoted(filtered) << std::endl;
}

打印自身(减去示例 base64)

OUT:"hello world
"

Summary/TL;DR

考虑潜伏的错误和未记录的限制,考虑使用 正确、更简单的 base64 实现。

Beast 在其实现细节中有一个,因此也不支持,但是 它至少不那么脆弱的可能性很高。

或者,必须是具有适当测试和文档的库。