How to read binary data from a file after successfully reading its ASCII header

Question

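I am trying to read the Netpbm image formats, following the specification explained here. I can read the ASCII variants of the format (magic numbers P1, P2 and P3) without any problem. However, I have trouble reading the binary data in the files that use P4, P5 and P6 as magic numbers. The header of those files, which is ASCII, I can read without any problem.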

The link states:

In the binary formats, PBM uses 1 bit per pixel, PGM uses 8 or 16 bits per pixel, and PPM uses 24 bits per pixel: 8 for red, 8 for green, 8 for blue. Some readers and writers may support 48 bits per pixel (16 each for R,G,B), but this is still rare.

With that in mind, I tried to use this to read the data bit by bit, and ended up with the following code:

if(*this->magicNumber == "P4") {
  this->pixels = new Matrix<int>(this->width, this->height);

  vector<int> p;
  while(getline(file, line_pixels)) {
    if(line_pixels.size() > 0 && line_pixels.at(0) != '#') {
      string byte;
      stringstream ss(line_pixels);
      while(getline(ss, byte)) {
        unsigned char c = (unsigned char)byte.at(0);
        for(int x=0; x != 8; x++) p.push_back( (c & (1 << x)) != 0 );
      }
    }
  }

  int count = 0;
  for(int i=0; i<height; i++) {
    for(int j=0; j<width; j++) {
      this->pixels->set(i, j, p[count++]);
    }
  }
}

But when I try to read the image named sample_640×426.pbm from this link, I should get this result:

Instead, I get this result:

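For the binary PGM and PPM formats, opening an image fails with a segmentation fault at some point in the loop, when count is incremented. I think the size of vector<int> p somehow ends up larger than the expected product width x height.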

Code for the PGM format:

if(*this->magicNumber == "P5") {
  this->pixels = new Matrix<int>(this->width, this->height);

  vector<int> p;
  while(getline(file, line_pixels)) {
    if(line_pixels.size() > 0 && line_pixels.at(0) != '#') {
      string number;
      stringstream ss(line_pixels);
      while(getline(ss, number)) {
        unsigned char data = (unsigned char)number.at(0);
        p.push_back((int)data);
      }
    }
  }

  int count = 0;
  for(int i=0; i<height; i++) {
    for(int j=0; j<width; j++) {
      this->pixels->set(i, j, p[count++]);
    }
  }
}

Code for the PPM format:

if(*this->magicNumber == "P6") {
  this->pixels = new Matrix<struct Pixel>(this->width, this->height);

  vector<int> p;
  while(getline(file, line_pixels)) {
    if(line_pixels.size() > 0 && line_pixels.at(0) != '#') {
      string byte;
      stringstream ss(line_pixels);
      while(getline(ss, byte)) {
        unsigned char data = (unsigned char)byte.at(0);
        p.push_back((int)data);
      }
    }
  }

  int count = 0;
  for(int i=0; i<height; i++) {
    for(int j=0; j<width; j++) {
      struct Pixel pixel;
      pixel.r = p[count++];
      pixel.g = p[count++];
      pixel.b = p[count++];
      this->pixels->set(i, j, pixel);
    }
  }
}

Can anyone give me a hint about what I am doing wrong here?

Answer 1

while(getline(file, line_pixels)) {

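std::getline reads from the input stream until it reads a newline character.

A file is a file. It contains bytes. Whether you consider the file to contain text or binary data is purely a matter of interpretation.

Lines of text are terminated by a newline character. That is what std::getline does: it reads bytes from the file until it reads a newline. Whatever it has read goes into the std::string parameter.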

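This gets very confusing if your intent is to read binary data, such as an image. A byte with the same value as the newline character can occur naturally in a binary file (like an image file), where it represents an ordinary pixel value. Using std::getline to read non-textual data always ends in tears.

It makes sense in only one situation: you know in advance that the binary data you intend to read ends with a byte that happens to be a newline character, and that this byte appears nowhere else in the data.

But, of course, in an image file you have no such guarantee.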
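
To make the problem concrete, here is a small sketch that replays the question's loop on an in-memory buffer instead of a file. The function name values_kept_by_getline is made up for this illustration; the logic mirrors the posted code, where the inner getline returns the whole chunk and only its first byte is kept.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Replays the question's reading loop on a buffer: the raster is split at
// every 0x0A byte, and one value is kept per chunk that is non-empty and
// does not start with '#'.
std::vector<int> values_kept_by_getline(const std::string& raster) {
    std::istringstream in(raster);
    std::vector<int> kept;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.at(0) != '#') {
            kept.push_back(static_cast<unsigned char>(line.at(0)));
        }
    }
    return kept;
}
```

For the eight raster bytes 10 0A 20 21 0A 0A 23 30 (hex), this keeps only two values, 0x10 and 0x20. The newline bytes vanish, the second byte of a chunk is ignored, and a chunk that happens to start with 0x23 ('#') is skipped as if it were a comment. If that is what happens with the real files, the vector probably ends up shorter than width x height, which would explain the out-of-bounds access.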

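When reading image data, you typically need to read a specific number of bytes from the file.

Here you know the image's dimensions and its format in advance. From those, simple arithmetic tells you how many bytes you should expect to read.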
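
That arithmetic might look like the following sketch. The function name raster_bytes is hypothetical. The formulas follow the Netpbm format descriptions: PBM rows are packed 8 pixels to a byte and padded to a whole byte, and PGM/PPM samples take one byte when maxval is below 256 and two bytes otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of raster bytes expected after the header of a binary Netpbm file.
// Returns 0 for an unrecognized magic number. maxval is ignored for P4.
std::size_t raster_bytes(const std::string& magic, std::size_t width,
                         std::size_t height, unsigned maxval) {
    assert(width > 0 && height > 0);
    // One byte per sample when maxval < 256, otherwise two.
    const std::size_t bytes_per_sample = (maxval < 256) ? 1 : 2;
    if (magic == "P4") return ((width + 7) / 8) * height;  // rows padded to whole bytes
    if (magic == "P5") return width * height * bytes_per_sample;
    if (magic == "P6") return width * height * 3 * bytes_per_sample;
    return 0;
}
```

For the 640 x 426 sample image from the question, this gives 34080 bytes as P4, 272640 bytes as P5 with maxval 255, and 817920 bytes as P6 with maxval 255.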

And that is exactly what std::istream's read() method does: it reads a specific number of bytes from the file. The link provides more information.

You need to replace all of the shown code that incorrectly uses getline with code that uses read().
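
For P4 there is one more step after read(): the packed bytes have to be expanded into pixels. The sketch below is one possible way to do it, with unpack_pbm_rows as a hypothetical name. It relies on two details from the PBM format description that differ from the question's code: bits are stored most significant bit first, and each row starts on a byte boundary, so the padding bits at the end of a row must be skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands a packed P4 raster into one int (0 or 1) per pixel, row by row.
std::vector<int> unpack_pbm_rows(const std::vector<unsigned char>& packed,
                                 std::size_t width, std::size_t height) {
    const std::size_t row_bytes = (width + 7) / 8;  // each row is padded to whole bytes
    assert(packed.size() >= row_bytes * height);
    std::vector<int> pixels;
    pixels.reserve(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const unsigned char byte = packed[y * row_bytes + x / 8];
            // The leftmost pixel of each byte is its highest bit.
            pixels.push_back((byte >> (7 - x % 8)) & 1);
        }
    }
    return pixels;
}
```

The result holds exactly width x height entries, so it can be copied into the matrix with the same nested loop the question already uses. In PBM a 1 bit means black.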


c++

binary-data

netpbm