Read little-endian 16-bit unsigned integer

I'm looking into parsing terminfo database files myself; these are a type of binary file. You can read about the storage format and confirm the problem I'm facing.

The manual says:

The header section begins the file. This section contains six short integers in the format described below. These integers are

(1) the magic number (octal 0432);

...

...

Short integers are stored in two 8-bit bytes. The first byte contains the least significant 8 bits of the value, and the second byte contains the most significant 8 bits. (Thus, the value represented is 256*second+first.) The value -1 is represented by the two bytes 0377, 0377; other negative values are illegal. This value generally means that the corresponding capability is missing from this terminal. Machines where this does not correspond to the hardware must read the integers as two bytes and compute the little-endian value.


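I've already tried reading it with the char data type, but it definitely produces garbage on my machine. The correct values can be read with the infocmp command, e.g. $ infocmp xterm.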


#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    std::ifstream db(
      "/usr/share/terminfo/g/gnome", std::ios::binary | std::ios::ate);

    std::vector<unsigned char> buffer;

    if (db) {
        auto size = db.tellg();
        buffer.resize(size);
        db.seekg(0, std::ios::beg);
        // data() is valid even for an empty vector, unlike &buffer.front()
        db.read(reinterpret_cast<char*>(buffer.data()), size);
    }
    std::cout << "\n";
}

 = std::vector of length 3069, capacity 3069 = {26 '\032', 1 '\001', 21 '\025',
  0 '\000', 38 '&', 0 '\000', 16 '\020', 0 '\000', 157 '\235', 1 '\001',
  193 '\301', 4 '\004', 103 'g', 110 'n', 111 'o', 109 'm', 101 'e', 124 '|',
  71 'G', 78 'N', 79 'O', 77 'M', 69 'E', 32 ' ', 84 'T', 101 'e', 114 'r',
  109 'm', 105 'i', 110 'n', 97 'a', 108 'l', 0 '\000', 0 '\000', 1 '\001',
  0 '\000', 0 '\000', 1 '\001', 0 '\000', 0 '\000', 0 '\000', 0 '\000',
  0 '\000', 0 '\000', 0 '\000', 0 '\000', 1 '\001', 1 '\001', 0 '\000',
....
....

The first problem while parsing this type of input is that it fixes the size to 8 bits, so the plain old char cannot be used since it doesn't guarantee the size to be exactly 8 bits.

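Any integer type of at least 8 bits will do. Although char isn't guaranteed to be exactly 8 bits, it is required to be at least 8 bits, so size is not a problem, except that in some cases you may need to mask off the high-order bits (if there are any). However, char may not be unsigned, and you don't want the octets to be interpreted as signed values, so use unsigned char instead.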
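To illustrate why signedness matters, here is a minimal sketch that applies the manual's formula 256*second+first to the same two octets, once as signed char and once as unsigned char. The function names are hypothetical, and the signed result described below assumes an 8-bit, two's-complement char.

```cpp
// Hypothetical helpers for illustration only; both apply the manual's
// formula 256*second+first and differ only in the octet type.
int combine_signed(signed char first, signed char second)
{
    // An octet above 127 is negative here, so the sum comes out wrong.
    return 256 * second + first;
}

int combine_unsigned(unsigned char first, unsigned char second)
{
    // Both operands are non-negative after promotion to int.
    return 256 * second + first;
}
```

With the octets 157 and 1 that appear in the dump in the question, combine_unsigned(157, 1) gives 413. On a typical platform the signed version sees 157 as -99 and gives 256 - 99 = 157 instead.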

The second problem is that there is no buffer.readInt16LE() method in the C++ standard library that would read 16 bits of data in little-endian format. So how should I go about implementing this function in a portable and safe way?

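Read one octet at a time into an unsigned char. Assign the first octet to a variable (of a type large enough to represent at least 16 bits). Shift the bits of the second octet left by 8, and assign to the variable using compound bitwise OR.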
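The steps above could be sketched roughly as follows. The names read_u16_le and read_i16_le are made up for this example and do not come from any library. The second helper is an assumption based on the manual's statement that the two bytes 0377, 0377 represent -1. Neither function checks bounds, so the caller has to make sure offset + 1 is inside the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode the little-endian 16-bit value stored at
// buffer[offset] and buffer[offset + 1]. No bounds checking is done.
std::uint_least16_t read_u16_le(const std::vector<unsigned char>& buffer,
                                std::size_t offset)
{
    // The first octet holds the least significant 8 bits. The mask only
    // matters on platforms where unsigned char is wider than 8 bits.
    std::uint_least16_t value = buffer[offset] & 0xFFu;
    // The second octet holds the most significant 8 bits.
    value |= static_cast<std::uint_least16_t>((buffer[offset + 1] & 0xFFu) << 8);
    return value;
}

// Hypothetical helper: same as above, but maps the bytes 0377, 0377
// to -1, which the manual uses for a missing capability.
long read_i16_le(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    const std::uint_least16_t raw = read_u16_le(buffer, offset);
    return raw == 0xFFFFu ? -1L : static_cast<long>(raw);
}
```

Applied to the first two octets of the dump in the question (26 and 1), read_u16_le(buffer, 0) gives 256*1+26 = 282, which is octal 0432, the magic number the manual describes.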

Or better yet, don't reimplement it; use an existing third-party library instead.

I've already tried reading it with char data type but it definitely produces garbage on my machine.

Then your attempt was wrong. char by itself has no problem that would cause garbage output. I suggest using a debugger to track the problem down.