Using Cifar-10 dataset from tfds.load() correctly
I'm trying to practice my CNN skills on the Cifar-10 dataset.
It works fine if I do it this way:
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
But I'm trying to use tfds.load() instead, and I don't know how to go about it.
With this I downloaded the data:
train_ds, test_ds = tfds.load('cifar10', split=['train','test'])
Now I tried this, but it doesn't work:
assert isinstance(train_ds, tf.data.Dataset)
assert isinstance(test_ds, tf.data.Dataset)
(train_images, train_labels) = tuple(zip(*train_ds))
(test_images, test_labels) = tuple(zip(*test_ds))
Can someone show me a way to do it?
Thanks!
You can also extract the arrays like this:
train_ds, test_ds = tfds.load('cifar10', split=['train', 'test'],
                              as_supervised=True,
                              batch_size=-1)
To use tfds.as_numpy() this way, you need to pass as_supervised and batch_size as shown (batch_size=-1 loads the entire split as a single batch). If you pass as_supervised=True, the dataset has the tuple structure (input, label); otherwise it is a dictionary.
With those set, you can simply call:
train_images, train_labels = tfds.as_numpy(train_ds)
Another approach is to iterate over the dataset to collect the images and labels (assuming batch_size was not passed).
With as_supervised=False:
train_images, train_labels = [], []
for images_labels in train_ds:
    train_images.append(images_labels['image'])
    train_labels.append(images_labels['label'])
With as_supervised=True:
for images, labels in train_ds:
    train_images.append(images)
    train_labels.append(labels)
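The loops above leave you with Python lists of per-example tensors; stacking them gives single arrays you can feed to a model. A minimal sketch on synthetic data (random arrays stand in for the real CIFAR-10 images, to avoid the download):

```python
import numpy as np
import tensorflow as tf

# Synthetic (image, label) dataset mimicking as_supervised=True output.
images = np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(8,), dtype=np.int64)
train_ds = tf.data.Dataset.from_tensor_slices((images, labels))

train_images, train_labels = [], []
for image, label in train_ds:
    train_images.append(image)
    train_labels.append(label)

# Stack the list of per-example tensors into single numpy arrays.
train_images = np.stack(train_images)
train_labels = np.stack(train_labels)
```

np.stack accepts eager TensorFlow tensors directly, so no explicit .numpy() call is needed on each element.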
You can do it as follows.
import tensorflow as tf
import tensorflow_datasets as tfds
train_ds, test_ds = tfds.load('cifar10', split=['train','test'], as_supervised=True)
These train_ds and test_ds are tf.data.Dataset objects, so you can use map, batch, and similar methods on them.
def normalize_resize(image, label):
    image = tf.cast(image, tf.float32)
    image = tf.divide(image, 255)
    image = tf.image.resize(image, (28, 28))
    return image, label
def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_saturation(image, 0.7, 1.3)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_brightness(image, 0.1)
    return image, label
train = train_ds.map(normalize_resize).cache().map(augment).shuffle(100).batch(64).repeat()
test = test_ds.map(normalize_resize).cache().batch(64)
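The shapes and value range coming out of such a pipeline can be checked on synthetic data before touching the real dataset (the random arrays below are stand-ins for CIFAR-10 images, and the augmentation step is omitted for brevity):

```python
import numpy as np
import tensorflow as tf

# Random uint8 "images" standing in for CIFAR-10.
images = np.random.randint(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(100,), dtype=np.int64)
ds = tf.data.Dataset.from_tensor_slices((images, labels))

def normalize_resize(image, label):
    # Scale to [0, 1] and resize to the 28x28 input the model expects.
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.resize(image, (28, 28))
    return image, label

pipe = ds.map(normalize_resize).cache().shuffle(100).batch(64)
image_batch, label_batch = next(iter(pipe))
```

The first batch should come out as float32 with shape (64, 28, 28, 3), confirming that map and batch are applied in the intended order.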
Now, we can pass train and test directly to model.fit.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28, 3)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)
model.fit(
    train,
    epochs=5,
    steps_per_epoch=50000 // 64,  # CIFAR-10 has 50,000 training images
    validation_data=test, verbose=2
)
Epoch 1/5
17s 17ms/step - loss: 2.0848 - accuracy: 0.2318 - val_loss: 1.8175 - val_accuracy: 0.3411
Epoch 2/5
11s 12ms/step - loss: 1.8827 - accuracy: 0.3144 - val_loss: 1.7800 - val_accuracy: 0.3595
Epoch 3/5
11s 12ms/step - loss: 1.8383 - accuracy: 0.3272 - val_loss: 1.7152 - val_accuracy: 0.3904
Epoch 4/5
11s 11ms/step - loss: 1.8129 - accuracy: 0.3397 - val_loss: 1.6908 - val_accuracy: 0.4060
Epoch 5/5
11s 11ms/step - loss: 1.8022 - accuracy: 0.3461 - val_loss: 1.6801 - val_accuracy: 0.4081