TensorFlow Model is still floating point after Post-training quantization
After applying post-training quantization, my custom CNN model shrank to 1/4 of its original size (from 56.1 MB to 14 MB). I put the image to be predicted (100x100x3) into a ByteBuffer as 100x100x3 = 30,000 bytes. However, I got the following error during inference:
java.lang.IllegalArgumentException: Cannot convert between a TensorFlowLite buffer with 120000 bytes and a ByteBuffer with 30000 bytes.
at org.tensorflow.lite.Tensor.throwExceptionIfTypeIsIncompatible(Tensor.java:221)
at org.tensorflow.lite.Tensor.setTo(Tensor.java:93)
at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:136)
at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:216)
at org.tensorflow.lite.Interpreter.run(Interpreter.java:195)
at gov.nih.nlm.malaria_screener.imageProcessing.TFClassifier_Lite.recongnize(TFClassifier_Lite.java:102)
at gov.nih.nlm.malaria_screener.imageProcessing.TFClassifier_Lite.process_by_batch(TFClassifier_Lite.java:145)
at gov.nih.nlm.malaria_screener.Cells.runCells(Cells.java:269)
at gov.nih.nlm.malaria_screener.CameraActivity.ProcessThinSmearImage(CameraActivity.java:1020)
at gov.nih.nlm.malaria_screener.CameraActivity.access0(CameraActivity.java:75)
at gov.nih.nlm.malaria_screener.CameraActivity.run(CameraActivity.java:810)
at java.lang.Thread.run(Thread.java:762)
The model's input image size is 100x100x3, and I am currently predicting one image at a time, so the ByteBuffer I build is 100x100x3 = 30,000 bytes. However, the log above says the TensorFlowLite buffer has 120,000 bytes. This makes me suspect that the converted tflite model is still in float format. Is this the expected behavior? How can I get a quantized model that takes the input image in 8-bit precision, like the example in the official TensorFlow repository?
In the example code, the ByteBuffer used as input to tflite.run() is in 8-bit precision for the quantized model.
But I also read in the Google documentation that "At inference, weights are converted from 8-bits of precision to floating-point and computed using floating point kernels." These two statements seem to contradict each other.
private static final int BATCH_SIZE = 1;
private static final int DIM_IMG_SIZE = 100;
private static final int DIM_PIXEL_SIZE = 3;
private static final int BYTE_NUM = 1;   // 1 byte per channel value

// 1 * 1 * 100 * 100 * 3 = 30,000 bytes
imgData = ByteBuffer.allocateDirect(BYTE_NUM * BATCH_SIZE * DIM_IMG_SIZE * DIM_IMG_SIZE * DIM_PIXEL_SIZE);
imgData.order(ByteOrder.nativeOrder());

... ...

// Unpack each packed ARGB pixel into three unsigned bytes (R, G, B)
int pixel = 0;
for (int i = 0; i < DIM_IMG_SIZE; ++i) {
    for (int j = 0; j < DIM_IMG_SIZE; ++j) {
        final int val = intValues[pixel++];
        imgData.put((byte) ((val >> 16) & 0xFF));
        imgData.put((byte) ((val >> 8) & 0xFF));
        imgData.put((byte) (val & 0xFF));
        // imgData.putFloat(((val >> 16) & 0xFF) / 255.0f);
        // imgData.putFloat(((val >> 8) & 0xFF) / 255.0f);
        // imgData.putFloat((val & 0xFF) / 255.0f);
    }
}

... ...

tfLite.run(imgData, labelProb);
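One way to see what the interpreter actually expects (a minimal diagnostic sketch, not from the original post, assuming a reasonably recent TensorFlow Lite Java library with Interpreter.getInputTensor) is to query the first input tensor:

// Inspect the model's first input tensor; for this converted model it reports
// FLOAT32, shape [1, 100, 100, 3] and 120,000 bytes.
Tensor inputTensor = tfLite.getInputTensor(0);
Log.d("TFLite", "dtype=" + inputTensor.dataType()
        + " shape=" + Arrays.toString(inputTensor.shape())
        + " bytes=" + inputTensor.numBytes());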
Post-training quantization code:
import tensorflow as tf
import sys
import os

# Convert the frozen graph to TensorFlow Lite with post-training (weight) quantization,
# using the legacy TF 1.x tf.contrib.lite API.
saved_model_dir = '/home/yuh5/Downloads/malaria_thinsmear.h5.pb'
input_arrays = ["input_2"]
output_arrays = ["output_node0"]

converter = tf.contrib.lite.TocoConverter.from_frozen_graph(saved_model_dir, input_arrays, output_arrays)
converter.post_training_quantize = True

tflite_model = converter.convert()
open("thinSmear_100.tflite", "wb").write(tflite_model)
Post-training quantization does not change the format of the input or output layers. You can run your model with data in the same format used for training. In other words, the input tensor is still float32, so it needs 100 x 100 x 3 values x 4 bytes per float = 120,000 bytes, which is exactly what the error message reports.
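Concretely, that means feeding 4 bytes per channel value, along the lines of the commented-out putFloat code in the question. A minimal sketch (the /255.0f normalization is an assumption and should match whatever preprocessing was used at training time):

// 4 bytes per float * 1 * 100 * 100 * 3 = 120,000 bytes
ByteBuffer imgData = ByteBuffer.allocateDirect(4 * BATCH_SIZE * DIM_IMG_SIZE * DIM_IMG_SIZE * DIM_PIXEL_SIZE);
imgData.order(ByteOrder.nativeOrder());

int pixel = 0;
for (int i = 0; i < DIM_IMG_SIZE; ++i) {
    for (int j = 0; j < DIM_IMG_SIZE; ++j) {
        final int val = intValues[pixel++];
        imgData.putFloat(((val >> 16) & 0xFF) / 255.0f);  // R
        imgData.putFloat(((val >> 8) & 0xFF) / 255.0f);   // G
        imgData.putFloat((val & 0xFF) / 255.0f);          // B
    }
}
tfLite.run(imgData, labelProb);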
You may look into quantization-aware training to generate fully quantized models, but I have no experience with it.
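For reference (not part of the original answer), full integer post-training quantization with the newer tf.lite.TFLiteConverter API can also produce a model whose input and output tensors are uint8. The sketch below is illustrative only: the SavedModel path is hypothetical and random data stands in for a real representative dataset used to calibrate activation ranges.

import numpy as np
import tensorflow as tf

# Hypothetical SavedModel directory -- replace with your own model.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Generator yielding ~100 sample inputs for calibration (placeholders here).
def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 100, 100, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # 8-bit input tensor
converter.inference_output_type = tf.uint8  # 8-bit output tensor

tflite_model = converter.convert()
open("thinSmear_100_int8.tflite", "wb").write(tflite_model)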
As for "At inference, weights are converted from 8-bits of precision to floating-point and computed using floating point kernels.": this means that the weights are "de-quantized" back to floating point in memory and the computation uses floating-point instructions, rather than integer operations.
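To verify what a converted .tflite model actually expects, you can inspect its input tensor details with the TFLite Python interpreter. A minimal sketch (with older 1.x releases the class lives under tf.contrib.lite rather than tf.lite):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="thinSmear_100.tflite")
interpreter.allocate_tensors()

# For a weight-only post-training quantized model this typically prints
# <class 'numpy.float32'> and [  1 100 100   3].
input_details = interpreter.get_input_details()[0]
print(input_details["dtype"], input_details["shape"])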