tf.keras model.predict provides different values each time
Every time I run:
y_true = np.argmax(tf.concat([y for x, y in train_ds], axis=0), axis=1)
y_pred = np.argmax(model.predict(train_ds), axis=1)
confusion_matrix(y_true, y_pred)
I get a different result, which I do not understand:
y_pred = np.argmax(model.predict(train_ds), axis=1)
is different on every run.
Clarification: I run Cell 1 (training) once, and Cell 2 (inference) several times.
Why?
Code:
Cell 1 (Jupyter):
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, experimental
from tensorflow.keras.layers import MaxPool2D, Flatten, Dense
from tensorflow.keras import Model
from tensorflow.keras.losses import categorical_crossentropy
from sklearn.metrics import accuracy_score
image_size = (100, 100)
batch_size = 32
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    directory,
    label_mode='categorical',
    validation_split=0.2,
    subset="training",
    seed=1337,
    color_mode="grayscale",
    image_size=image_size,
    batch_size=batch_size,
)
inputs = Input(shape=(100, 100, 1))
x = experimental.preprocessing.Rescaling(1./255)(inputs)
x = Conv2D(filters=4, kernel_size=3, padding='same', activation='relu')(x)
x = Conv2D(filters=4, kernel_size=3, padding='same', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)
x = Conv2D(filters=8, kernel_size=3, padding='same', activation='relu')(x)
x = Conv2D(filters=8, kernel_size=3, padding='same', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)
x = Flatten()(x)
x = Dense(units=4, activation='relu')(x)
x = Dense(units=4, activation='relu')(x)
output = Dense(units=5, activation='softmax')(x)
model = Model(inputs=inputs, outputs=output)
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=categorical_crossentropy,
    metrics=["accuracy"])
model.fit(train_ds, epochs=5)
Cell 2:
print('Accuracy:')
y_true = np.argmax(tf.concat([y for x, y in train_ds], axis=0), axis=1)
y_pred = np.argmax(model.predict(train_ds), axis=1)
print(accuracy_score(y_true, y_pred))
y_pred = np.argmax(model.predict(train_ds), axis=1)
print(accuracy_score(y_true, y_pred))
Output:
118/118 [==============================] - 7s 57ms/step - loss: 0.1888 - accuracy: 0.9398
Accuracy:
0.593
0.586
Are you sure you are not retraining the model each time you run the code? If the model's parameters are the same, predictions for the same inputs should be identical every time.
As far as I currently understand, the cause of the above is tf.keras.preprocessing.image_dataset_from_directory: it returns a dataset that reshuffles its samples on every iteration, so each pass over train_ds yields the batches in a different order, and the labels gathered into y_true no longer line up with the predictions in y_pred. This happens even though its instance is:
type(train_ds)
tensorflow.python.data.ops.dataset_ops.BatchDataset
To reproduce:
First run:
[x for x, y in train_ds]
Output:
[<tf.Tensor: shape=(32, 100, 100, 1), dtype=float32, numpy= array([[[[157.],
[155.],
[159.],
Second run:
[x for x, y in train_ds]
Output:
[<tf.Tensor: shape=(32, 100, 100, 1), dtype=float32, numpy= array([[[[ 34.],
[ 36.],
[ 39.],
...,
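This reshuffling can be reproduced in isolation with a plain tf.data pipeline (a minimal sketch of my own; shuffle() defaults to reshuffle_each_iteration=True, which matches the behavior observed above):

import tensorflow as tf

# shuffle() reshuffles on every iteration by default
ds = tf.data.Dataset.range(10).shuffle(buffer_size=10)
print(list(ds.as_numpy_iterator()))  # one order
print(list(ds.as_numpy_iterator()))  # a different order on the second pass

# freezing the order removes the mismatch between passes
ds_fixed = tf.data.Dataset.range(10).shuffle(buffer_size=10, reshuffle_each_iteration=False)
print(list(ds_fixed.as_numpy_iterator()))
print(list(ds_fixed.as_numpy_iterator()))  # same order both times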
Possible solution:
imgs, y_true = [], []
for img, label in train_ds:
    imgs.append(img)
    y_true.append(label)
imgs = tf.concat(imgs, axis=0)
y_true = np.argmax(tf.concat(y_true, axis=0), axis=1)
y_pred = np.argmax(model.predict(imgs), axis=1)
print(accuracy_score(y_true, y_pred))
y_pred = np.argmax(model.predict(imgs), axis=1)
print(accuracy_score(y_true, y_pred))
Output:
0.944044764
0.944044764
Is there a better solution?
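One variant of the solution above that avoids concatenating every image into memory (my own sketch, reusing the names from Cell 1): collect labels and predictions batch by batch in a single pass, so both come from the same shuffled order:

y_true_parts, y_pred_parts = [], []
for x, y in train_ds:  # single pass: labels and predictions come from the same batches
    y_true_parts.append(np.argmax(y.numpy(), axis=1))
    y_pred_parts.append(np.argmax(model.predict_on_batch(x), axis=1))
y_true = np.concatenate(y_true_parts)
y_pred = np.concatenate(y_pred_parts)
print(accuracy_score(y_true, y_pred))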
Update 2:
A more suitable approach in the case of a validation dataset (train_ds here is just an example) is to add the argument shuffle=False:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    directory,
    label_mode='categorical',
    validation_split=0.2,
    subset="training",
    seed=1337,
    color_mode="grayscale",
    image_size=image_size,
    batch_size=batch_size,
    shuffle=False,
)
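With shuffle=False the iteration order is fixed, so labels and predictions can be collected in separate passes and still line up (a short usage sketch, reusing the model and imports from Cell 1):

y_true = np.argmax(tf.concat([y for x, y in train_ds], axis=0), axis=1)
y_pred = np.argmax(model.predict(train_ds), axis=1)
print(accuracy_score(y_true, y_pred))  # stable across repeated runs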
Update 3:
If your test images are in a separate folder, this is probably the best option:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

path = 'your path to test folder'
test_generator = ImageDataGenerator().flow_from_directory(
    directory=path,
    class_mode='categorical',
    shuffle=False,
    batch_size=32,
    target_size=(512, 512)
)
test_generator.reset()
This is better than option 1 because it works for datasets that do not fit into memory (RAM).
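Since shuffle=False keeps the generator's order fixed, the ground-truth labels are available directly on the iterator (classes and class_indices are standard attributes of the object flow_from_directory returns), so evaluation can look like this sketch:

y_pred = np.argmax(model.predict(test_generator), axis=1)
y_true = test_generator.classes  # integer labels in the same (unshuffled) order
print(accuracy_score(y_true, y_pred))
print(test_generator.class_indices)  # class name -> integer label mapping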