Read a little-endian 16-bit unsigned integer
I'm looking into parsing terminfo database files, which are a type of binary file. You can read about their storage format on your own and confirm the problem I'm facing.
The manual says:
The header section begins the file. This section contains
six short integers in the format described below. These
integers are
(1) the magic number (octal 0432);
...
...
Short integers are stored in two 8-bit bytes. The first
byte contains the least significant 8 bits of the value,
and the second byte contains the most significant 8 bits.
(Thus, the value represented is 256*second+first.) The
value -1 is represented by the two bytes 0377, 0377; other
negative values are illegal. This value generally means
that the corresponding capability is missing from this
terminal. Machines where this does not correspond to the
hardware must read the integers as two bytes and compute
the little-endian value.
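The first problem while parsing this type of input is that it fixes the size at 8 bits, so plain old char cannot be used, since it isn't guaranteed to be exactly 8 bits. So I looked at the "Fixed width integer types", but again faced the dilemma of choosing between int8_t and uint8_t. The reference clearly states that these are "provided only if the implementation directly supports the type". So which type should I choose to be portable enough?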
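The second problem is that the C++ standard library has no buffer.readInt16LE() method for reading 16 bits (two bytes) of data in little-endian format. So how should I implement this function myself in a portable and safe way?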
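I've already tried reading it with the char data type, but it definitely produces garbage on my machine. The correct values can be read with the infocmp command, for example:
$ infocmp xterm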
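#include <cstddef>
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    // Open in binary mode, positioned at the end (std::ios::ate) so that
    // tellg() reports the file size.
    std::ifstream db("/usr/share/terminfo/g/gnome", std::ios::binary | std::ios::ate);
    std::vector<unsigned char> buffer;
    if (db) {
        const auto size = db.tellg();
        // tellg() returns -1 on failure, and an empty file needs no read.
        if (size > 0) {
            buffer.resize(static_cast<std::size_t>(size));
            db.seekg(0, std::ios::beg);
            // read() takes char*, so the unsigned char buffer needs a cast.
            db.read(reinterpret_cast<char*>(buffer.data()), size);
        }
    }
    std::cout << "\n";
}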
= std::vector of length 3069, capacity 3069 = {26 '\032', 1 '\001', 21 '\025',
0 '\000', 38 '&', 0 '\000', 16 '\020', 0 '\000', 157 '\235', 1 '\001',
193 '\301', 4 '\004', 103 'g', 110 'n', 111 'o', 109 'm', 101 'e', 124 '|',
71 'G', 78 'N', 79 'O', 77 'M', 69 'E', 32 ' ', 84 'T', 101 'e', 114 'r',
109 'm', 105 'i', 110 'n', 97 'a', 108 'l', 0 '\000', 0 '\000', 1 '\001',
0 '\000', 0 '\000', 1 '\001', 0 '\000', 0 '\000', 0 '\000', 0 '\000',
0 '\000', 0 '\000', 0 '\000', 0 '\000', 1 '\001', 1 '\001', 0 '\000',
....
....
The first problem while parsing this type of input is that it fixes the size at 8 bits, so plain old char cannot be used, since it isn't guaranteed to be exactly 8 bits.
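Any integer type of at least 8 bits will do. Although char isn't guaranteed to be exactly 8 bits, it is required to be at least 8 bits. So as far as size is concerned there is no problem, except that in some cases you may need to mask off the high bits, if there are any. However, char may be signed, and you don't want the octets interpreted as signed values, so use unsigned char instead.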
The second problem is that the C++ standard library has no buffer.readInt16LE() method for reading 16 bits (two bytes) of data in little-endian format. So how should I implement this function myself in a portable and safe way?
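Read one octet at a time into an unsigned char. Assign the first octet to a variable that is large enough to represent at least 16 bits. Shift the bits of the second octet left by 8 and combine them into the variable with a compound bitwise OR (|=).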
Or, better yet, don't reimplement this at all; use an existing third-party library.
I've already tried reading it with the char data type, but it definitely produces garbage on my machine.
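Then your attempt is wrong. There is nothing about char itself that would produce garbage output. I recommend using a debugger to track down the problem.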