Dequantize values to their original values prior to quantization
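The paper "Natural Language Processing with Small Feed-Forward Networks" (https://arxiv.org/pdf/1708.00214.pdf) states:
q_ij = floor(1/2 + e_ij / s_i + b), where b = 128 and s_i = (1 / (b - 1)) * max_j |e_ij|
I have implemented the quantization in Python according to the equation above: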
import math

b = 128
embedding_matrix = [[20000,3000,1000],[1999999,20000,1999999], [20000,3000,1000]]
# Per-row scale: s_i = max(row) / (b - 1), rounded to 3 decimals
scaled = [ abs(round( (1 / (b - 1) * max(e)) , 3)) for e in embedding_matrix]
print(scaled)
i = 0
quantized = []
for e in embedding_matrix :
    for v in e :
        # Store (original value, quantized value)
        quantized.append((v , math.floor(.5 + ( (v / scaled[i]) + b) )))
    i = i + 1
quantized
Running this code sets quantized to:
[(20000, 255),
(3000, 147),
(1000, 134),
(1999999, 255),
(20000, 129),
(1999999, 255),
(20000, 255),
(3000, 147),
(1000, 134)]
How can I dequantize back to the original values prior to quantization?
The TensorFlow documentation at https://www.tensorflow.org/api_docs/python/tf/quantization/dequantize describes:
tf.quantization.dequantize(
input, min_range, max_range, mode='MIN_COMBINED', name=None, axis=None,
narrow_range=False, dtype=tf.dtypes.float32
)
[min_range, max_range] are scalar floats that specify the range for the output. The 'mode' attribute controls exactly which calculations are used to convert the float values to their quantized equivalents.
and the PyTorch documentation: https://pytorch.org/docs/stable/quantization.html
Quantization seems to be implemented differently there than in the implementation above?
What they are doing in the paper is roughly this:
import numpy as np
b = 128
embedding_matrix = np.array([[20000,3000,1000,1000],[1999999,20000,1999999,1999999], [20000,3000,1000,1000]])
scales = (np.abs(embedding_matrix).max(axis=1) / (b-1)).reshape(-1, 1)
quantized = (embedding_matrix / scales + b + 0.5).astype(np.uint8)
# Cast to float first: uint8 arithmetic would wrap around for codes below b
dequantized = (quantized.astype(np.float64) - b) * scales
print(quantized)
print(dequantized)
Output:
[[255 147 134 134]
[255 129 255 255]
[255 147 134 134]]
[[2.00000000e+04 2.99212598e+03 9.44881890e+02 9.44881890e+02]
[1.99999900e+06 1.57480236e+04 1.99999900e+06 1.99999900e+06]
[2.00000000e+04 2.99212598e+03 9.44881890e+02 9.44881890e+02]]
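In short, they just have q_ij = round(e_ij / s_i + b). Once you only have the quantized value q_ij, the best approximation you can get comes from inverting q_ij = dequantized_ij / s_i + b, which gives dequantized_ij = (q_ij - b) * s_i. The rounding step discards information, so the original values are recovered only approximately.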
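The round trip can be sketched in plain Python to see how large the reconstruction error gets. This is a minimal illustration, not the paper's code: the helper names `quantize_row` and `dequantize_row` are hypothetical, the row deliberately contains a negative value, and the scale is assumed to use the maximum absolute value of a non-zero row.

```python
import math

def quantize_row(row, b=128):
    # Per-row scale s_i = max_j |e_ij| / (b - 1); assumes the row is not all zeros.
    scale = max(abs(v) for v in row) / (b - 1)
    # floor(0.5 + x) rounds x to the nearest integer.
    codes = [math.floor(0.5 + v / scale + b) for v in row]
    return codes, scale

def dequantize_row(codes, scale, b=128):
    # Invert q = e / s + b; plain Python ints are signed, so q - b may be negative.
    return [(q - b) * scale for q in codes]

row = [20000, -3000, 1000]
codes, scale = quantize_row(row)      # codes == [255, 109, 134]
restored = dequantize_row(codes, scale)
max_error = max(abs(v - r) for v, r in zip(row, restored))
print(codes, restored, max_error)
```

Because rounding moves each scaled value by at most 0.5, the reconstruction error per element should stay within about s_i / 2, which is why rows with one very large entry lose more precision on their small entries.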
As for PyTorch, torch.quantize_per_channel offers similar functionality; for example, the following code does almost the same thing:
import torch
t = torch.tensor(embedding_matrix, dtype=torch.float32)
zero_point = torch.tensor([b]).repeat(t.shape[0], 1).reshape(-1)
quantized_tensor = torch.quantize_per_channel(t, t.abs().max(axis=1)[0] / (b-1), zero_point, 0, torch.quint8)
print(quantized_tensor)
print(quantized_tensor.int_repr())
Output:
tensor([[2.0000e+04, 2.9921e+03, 9.4488e+02, 9.4488e+02],
[2.0000e+06, 1.5748e+04, 2.0000e+06, 2.0000e+06],
[2.0000e+04, 2.9921e+03, 9.4488e+02, 9.4488e+02]], size=(3, 4),
dtype=torch.quint8, quantization_scheme=torch.per_channel_affine,
scale=tensor([ 157.4803, 15748.0234, 157.4803], dtype=torch.float64),
zero_point=tensor([128, 128, 128]), axis=0)
tensor([[255, 147, 134, 134],
[255, 129, 255, 255],
[255, 147, 134, 134]], dtype=torch.uint8)
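If you quantize per channel like this in PyTorch, you can only apply .dequantize() to the full tensor rather than to a slice, which is not ideal for embeddings. You can, however, do it by hand quite easily using int_repr(), q_per_channel_zero_points(), and q_per_channel_scales().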
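A minimal sketch of that manual approach follows, assuming the same matrix and per-row scales as above. The helper `dequantize_rows` is a hypothetical name introduced for illustration; it only combines the three tensor methods mentioned and compares the result with a full .dequantize().

```python
import torch

b = 128
t = torch.tensor([[20000., 3000., 1000., 1000.],
                  [1999999., 20000., 1999999., 1999999.],
                  [20000., 3000., 1000., 1000.]])
scales = t.abs().max(dim=1).values / (b - 1)
zero_points = torch.full((t.shape[0],), b, dtype=torch.int64)
q = torch.quantize_per_channel(t, scales, zero_points, 0, torch.quint8)

def dequantize_rows(q, rows):
    # Hypothetical helper: dequantize only the selected rows (axis 0) by hand.
    ints = q.int_repr()[rows].to(torch.float64)            # raw uint8 codes as floats
    s = q.q_per_channel_scales()[rows].unsqueeze(1)        # one scale per selected row
    z = q.q_per_channel_zero_points()[rows].unsqueeze(1)   # one zero point per selected row
    return (ints - z) * s

rows = torch.tensor([1])
manual = dequantize_rows(q, rows)
full = q.dequantize()[rows]   # reference: dequantize everything, then slice
print(manual)
print(full)
```

For an embedding lookup, this avoids materializing the whole dequantized matrix just to read a few rows.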
Does this answer your question?