Tokenizing a string with boost fails when casting tokens to char* const*

Question

I am using boost::tokenizer to tokenize a string in C++, and I then want to pass the tokens to execv.

Consider the following code snippet (compilable):

#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include <unistd.h>  // execv, _exit
#include <boost/tokenizer.hpp>

// I will put every token into this vector
std::vector<const char*> argc;
// this is the command I want to parse
std::string command = "/bin/ls -la -R";


void test_tokenizer() {
  // tokenizer is needed because arguments can be in quotes
  boost::tokenizer<boost::escaped_list_separator<char> > scriptArguments(
              command,
              boost::escaped_list_separator<char>("\\", " ", "\""));
  boost::tokenizer<boost::escaped_list_separator<char> >::iterator argument;
  for(argument = scriptArguments.begin(); 
    argument!=scriptArguments.end(); 
    ++argument) {

    argc.push_back(argument->c_str());
    std::cout << argument->c_str() << std::endl;
  }

  argc.push_back(NULL);
}

void test_raw() {
  argc.push_back("/bin/ls");
  argc.push_back("-l");
  argc.push_back("-R");

  argc.push_back(NULL);
}

int main() {
  // this works OK
  /*test_raw();
  execv(argc[0], (char* const*)&argc[0]);
  std::cerr << "execv failed";
  _exit(1);
  */

  // this is not working
  test_tokenizer();
  execv(argc[0], (char* const*)&argc[0]);
  std::cerr << "execv failed";
  _exit(2);
}

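When I run this program with the call to test_tokenizer(), it prints 'execv failed' (even though it prints the arguments correctly).

However, if I change test_tokenizer to test_raw, it runs fine.

There must be a simple solution, but I have not found it.

P.S.: I have also put this into an online compiler that supports Boost here.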

Answer 1

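boost::tokenizer stores the current token by value (as a std::string by default) inside the token iterator.

The character array that argument->c_str() points to can therefore be modified or invalidated whenever the iterator is modified, and its lifetime ends no later than that of argument itself.

As a result, your program has undefined behavior when you try to use argc.

If you want to keep using boost::tokenizer, I suggest storing the tokens in a std::vector<std::string> and then converting them to an array of pointers.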
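A minimal sketch of that second step, assuming the tokens have already been copied into a std::vector<std::string>. The helper name make_argv is hypothetical and is not part of Boost; it only uses the standard library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Boost API): builds an execv-style argument
// array whose pointers refer to the strings owned by `tokens`.
// `tokens` must outlive the returned vector and must not be modified
// while the pointers are in use.
std::vector<char*> make_argv(std::vector<std::string>& tokens) {
    std::vector<char*> argv;
    argv.reserve(tokens.size() + 1);
    for (std::string& token : tokens) {
        // The vector owns this string, so the pointer stays valid
        // after the loop, unlike the tokenizer iterator's temporary.
        argv.push_back(&token[0]);
    }
    argv.push_back(nullptr);  // execv expects a null-terminated array
    return argv;
}
```

In the question's code, the tokens could be copied with something like std::vector<std::string> tokens(scriptArguments.begin(), scriptArguments.end()); followed by std::vector<char*> argv = make_argv(tokens); and execv(argv[0], argv.data());. Because argv already holds char* elements, the (char* const*) cast is no longer needed.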
