算术编码 FPAQ0（一个简单的 order-0 算术文件压缩器）

Question

我试图理解 fpaq0 算术压缩器的代码，但我无法完全理解 it.Here 是代码的 link -fpaq0.cpp

我无法准确理解 ct[512]['2] 和 cxt 是如何 working.Also 我不是很清楚解码器是如何 working.Why 在编码每个字符之前 e.encode(0) 正在调用。

注意；我已经理解 link-Data Compression with Arithmetic Encoding

中介绍的算术编码器

  void update(int y) {
if (++ct[cxt][y] > 65534) {
  ct[cxt][0] >>= 1;
  ct[cxt][1] >>= 1;
}
if ((cxt+=cxt+y) >= 512)
  cxt=1;
}
   // Assume a stationary order 0 stream of 9-bit symbols
int p() const {
 return 4096*(ct[cxt][1]+1)/(ct[cxt][0]+ct[cxt][1]+2);
}
inline void Encoder::encode(int y) {

// Update the range
const U32 xmid = x1 + ((x2-x1) >> 12) * predictor.p();
assert(xmid >= x1 && xmid < x2);
if (y)
 x2=xmid;
else
x1=xmid+1;
predictor.update(y);

// Shift equal MSB's out
while (((x1^x2)&0xff000000)==0) {
putc(x2>>24, archive);
x1<<=8;
x2=(x2<<8)+255;
}
}

inline int Encoder::decode() {

// Update the range
const U32 xmid = x1 + ((x2-x1) >> 12) * predictor.p();
assert(xmid >= x1 && xmid < x2);
int y=0;
if (x<=xmid) {
 y=1;
 x2=xmid;
}
else
x1=xmid+1;
predictor.update(y);

// Shift equal MSB's out
while (((x1^x2)&0xff000000)==0) {
x1<<=8;
x2=(x2<<8)+255;
int c=getc(archive);
if (c==EOF) c=0;
x=(x<<8)+c;
}
return y;
}

Answer 1

fpaq0 是一个文件压缩器，它使用 order-0 位模型进行建模，并使用 12 位无进位算术编码器进行熵编码阶段。 ct[512][2] 存储每个上下文的计数器以计算符号概率。上下文（fpaq0 中的 order-0）是用带前导的部分位计算的（以简化计算）。

为了更容易解释，我们暂时跳过 EOF 符号。 Order-0 上下文计算如下，没有 EOF 符号（简化）：

// Full byte encoding
int cxt = 1; // context starts with leading one
for (int i = 0; i < 8; ++i) {
  // Encoding part
  int y = ReadNextBit();
  int p = GetProbability(ctx);
  EncodeBit(y, p);

  // Model updating
  UpdateCounter(cxt, y); // Update related counter
  cxt = (cxt << 1) | y; // shift left and insert new bit
}

为了解码，使用没有 EOF 符号的上下文，如下所示（简化）：

// Full byte decoding
int cxt = 1; // context starts with leading one
for (int i = 0; i < 8; ++i) {
  // Decoding part
  int p = GetProbability(ctx);
  int y = DecodeBit(p);
  WriteBit(y);

  // Model updating
  UpdateCounter(cxt, y); // Update related counter
  cxt = (cxt << 1) | y; // shift left and insert new bit
}

fpaq0 被设计为流压缩器。这意味着它不需要知道输入流的确切长度。那么，解码器应该如何知道何时停止的问题呢？ EOF 符号正是为此而使用的。在对每个字节进行编码时，零位被编码为标志以指示后面还有更多数据。一个表示我们到达了流的末尾。所以，解码器知道什么时候停止。这就是为什么我们的上下文模型是 9 位（EOF 标志 + 8 位数据）的原因。

现在，最后一部分：概率计算。 fpaq0 仅使用 order-0 上下文中过去符号的计数来计算最终概率。

n0 = count of 0
n1 = count of 1
p = n1 / (n0 + n1)

有两个实施细节需要解决：计数器溢出和除以零。

通过在达到阈值时将两个计数减半来解决计数器溢出问题。因为，我们正在处理 p，这是有道理的。

通过在每个变量的公式中插入一个来解决被零除的问题。所以，

p = (n1 + 1) / ((n0 + 1) + (n1 + 1))

算术编码 FPAQ0（一个简单的 order-0 算术文件压缩器）

Arithmetic Coding FPAQ0 (a simple order-0 arithmetic file compressor )

c++

compression

data-compression