
Very high error after full integer quantization of a regression network

I trained a fully connected neural network with one hidden layer of 64 nodes, and I am testing it on the Medical Cost dataset. With the model at its original precision, the mean absolute error is 0.22063259780406952. For a model quantized to float16, or with integer quantization with float fallback, the difference between the original error and that of the lower-precision model is never more than 0.1. However, if I do full integer quantization, the error grows to an unreasonable magnitude; in this particular case it jumps to almost 60. I don't know whether this is a bug in TensorFlow, whether I am using the API incorrectly, or whether this is reasonable behavior after quantization. Any help is appreciated. The code that does the conversion and inference is shown below:

import math
import pathlib
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import pandas as pd
from sklearn import preprocessing as pr
from sklearn.metrics import mean_absolute_error

url = 'insurance.csv'
column_names = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]

dataset = pd.read_csv(url, names=column_names, header=0, na_values='?')

dataset = dataset.dropna()  # Drop rows with missing values
dataset['sex'] = dataset['sex'].map({'female': 2, 'male': 1})
dataset['smoker'] = dataset['smoker'].map({'yes': 1, 'no': 0})

dataset = pd.get_dummies(dataset, prefix='', prefix_sep='', columns=['region'])

# This is a trick to convert the dataframe to a 2D array, scale it, and
# convert it back to a dataframe.
scaled_np = pr.StandardScaler().fit_transform(dataset.values)
dataset = pd.DataFrame(scaled_np, index=dataset.index, columns=dataset.columns)
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('charges')
test_labels = test_features.pop('charges')
def build_and_compile_model():
    model = keras.Sequential([
        layers.Dense(64,
                     activation='relu',
                     input_shape=(len(dataset.columns) - 1, )),
        layers.Dense(1)
    ])

    model.compile(loss='mean_absolute_error',
                  optimizer=tf.keras.optimizers.Adam(0.001))
    return model


dnn_model = build_and_compile_model()
dnn_model.summary()

dnn_model.fit(train_features,
              train_labels,
              validation_split=0.2,
              verbose=0,
              epochs=100)

print("Original error = {}".format(
    dnn_model.evaluate(test_features, test_labels, verbose=0)))
converter = tf.lite.TFLiteConverter.from_keras_model(dnn_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_data_gen():
    for input_value in tf.data.Dataset.from_tensor_slices(
            train_features.astype('float32')).batch(1).take(100):
        yield [input_value]


converter.representative_dataset = representative_data_gen

# Full Integer Quantization
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8 (APIs added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model_quant = converter.convert()

dir_save = pathlib.Path(".")
file_save = dir_save / "model_16.tflite"
file_save.write_bytes(tflite_model_quant)
interpreter = tf.lite.Interpreter(model_path=str(file_save))
interpreter.allocate_tensors()
def evaluate_model(interpreter, test_images, test_labels):
    input_details = interpreter.get_input_details()[0]
    input_index = input_details["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    # Run predictions on every example in the test dataset.
    prediction_digits = []
    for test_image in test_images:
        if input_details['dtype'] == np.uint8:
            input_scale, input_zero_point = input_details['quantization']
            test_image = test_image / input_scale + input_zero_point

        test_image = np.expand_dims(test_image,
                                    axis=0).astype(input_details['dtype'])
        interpreter.set_tensor(input_index, test_image)

        # Run inference.
        interpreter.invoke()

        output = interpreter.get_tensor(output_index)
        prediction_digits.append(output[0])


    filtered_labels, correct_digits = map(
        list,
        zip(*[(x, y) for x, y in zip(test_labels, prediction_digits)
              if not math.isnan(y)]))
    return mean_absolute_error(filtered_labels, correct_digits)

print(evaluate_model(interpreter, test_features[:].values, test_labels))
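
For reference, the float16 and integer-with-float-fallback variants mentioned above are not shown; here is a minimal sketch of how such conversions are typically configured (assuming the same dnn_model and representative_data_gen as above):

# Float16 quantization: weights are stored as float16, computation stays float.
converter_fp16 = tf.lite.TFLiteConverter.from_keras_model(dnn_model)
converter_fp16.optimizations = [tf.lite.Optimize.DEFAULT]
converter_fp16.target_spec.supported_types = [tf.float16]
tflite_model_fp16 = converter_fp16.convert()

# Integer quantization with float fallback: ops that can be quantized run in
# int8; everything else (including the float32 inputs/outputs) stays in float.
converter_fb = tf.lite.TFLiteConverter.from_keras_model(dnn_model)
converter_fb.optimizations = [tf.lite.Optimize.DEFAULT]
converter_fb.representative_dataset = representative_data_gen
tflite_model_fallback = converter_fb.convert()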

When you quantize (and in machine learning generally), you need to pay attention to what your data looks like. Does it make sense to apply a given level of quantization to the data you have?

For a regression problem like yours, where the ground-truth values lie in the range [1121.8739; 63770.42801] and some of the inputs are floating point as well, it is very likely that training the model on that data and then quantizing it to integers will not produce good results.

You trained the model to output values in the range [1121.8739; 63770.42801]; after quantization to int8, it can only output values in the range [-128; 127], with no decimal part. Obviously, when you compare the quantized model's results against your ground truth, the error goes through the roof.
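
To make that resolution loss concrete, here is the back-of-the-envelope arithmetic for the range quoted above:

lo, hi = 1121.8739, 63770.42801
step = (hi - lo) / 255   # one int8 level spans ~245.68 in "charges" units
print(step)              # best possible output resolution of the quantized model
print(step / 2)          # ~122.84: average rounding error even for a perfect model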

What if you still want to apply quantization? You need to move your data into the domain of the quantized set; in your case, convert your float32 data to int8 in a way that still makes sense. You will see a big drop in performance in any real use case. After all, for a regression problem you move from a domain of roughly 4 billion possible output values (assuming a 23-bit mantissa and an 8-bit exponent; see Single Precision Floating Point and How many floating-point numbers are in the interval [0,1]?) to a domain with 256 (2^8) possible outputs.

But a really naive approach could be to apply the following transformation:

import numpy as np

def scale_down_data(data):
    # Map the data linearly from [min_value, max_value] to the int8 range [-128, 127].
    max_value = data.max()
    min_value = data.min()
    scaled_down = 255 * ((data - min_value) / (max_value - min_value)) - 128
    return scaled_down.astype(np.int8)
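
To compare predictions against the original targets, you would also need the inverse mapping. A sketch (min_value and max_value must be the same statistics that were used in scale_down_data):

def scale_up_data(scaled, min_value, max_value):
    # Invert the naive mapping above: int8 values back to the original float range.
    return (scaled.astype(np.float32) + 128) / 255 * (max_value - min_value) + min_value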

In practice, it is better to look at the distribution of your data and apply a transformation that gives more range where the data is denser. And you don't want to restrict the regression range to the bounds of your training set either. You need to do that analysis for every input or output that is not in the quantized domain.
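
Related to that last point: the evaluation code in the question compares the raw uint8 output of the interpreter directly against the labels. The interpreter exposes the output scale and zero point, so the output can at least be mapped back into the float domain before computing the error. A minimal sketch against the code above:

output_details = interpreter.get_output_details()[0]
out_scale, out_zero_point = output_details['quantization']

raw_output = interpreter.get_tensor(output_details['index'])   # still uint8
dequantized = (raw_output.astype(np.float32) - out_zero_point) * out_scale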