How to split data into 3 folds (train, validation, test) using ImageDataGenerator when the data is in a separate directory for each class
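How can I split my data into 3 parts (train, validation, test) with Keras' ImageDataGenerator? ImageDataGenerator only provides the validation_split parameter, so if I use it, I will have no test set left for later.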
My data is organized as follows:
input_data_dir/
    class_1_dir/
        image_1.png
        image_2.png
    class_2_dir/
    class_3_dir/
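As you correctly mentioned, Keras' ImageDataGenerator cannot split the data into 3 parts in a single line of code.
A workaround is to store the images that make up the test data in a separate folder and apply ImageDataGenerator to each folder, as shown below: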
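from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Path to Training Directory
train_dir = 'Dogs_Vs_Cats_Small/train'
# Path to Test Directory
test_dir = 'Dogs_Vs_Cats_Small/test'
# Pass rescale by keyword: the first positional parameter of ImageDataGenerator is featurewise_center
Train_Gen = ImageDataGenerator(rescale=1./255)
Test_Gen = ImageDataGenerator(rescale=1./255)
# class_mode='binary' suits two classes; with three class folders, as in the question, use 'categorical'
Train_Generator = Train_Gen.flow_from_directory(train_dir, target_size=(150, 150), batch_size=20, class_mode='binary')
# shuffle=False keeps test batches in file order, so predictions line up with Test_Generator.classes
Test_Generator = Test_Gen.flow_from_directory(test_dir, target_size=(150, 150), class_mode='binary', batch_size=20, shuffle=False)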
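Because flow_from_directory, with its classes argument left at the default None, infers the labels from the subdirectory names and assigns indices in alphanumeric order, the train and test folders must contain the same set of class subfolders for their labels to agree. The sketch below only approximates that rule with the standard library so both folders can be checked before training; the helper names infer_class_indices and check_same_classes are hypothetical, and Keras' own mapping remains available as Train_Generator.class_indices.

```python
import os


def infer_class_indices(directory):
    """Map each class subfolder name to an integer label, sorted by name.

    Hypothetical helper approximating flow_from_directory's default
    behaviour (classes=None): every subdirectory is one class, and labels
    follow the alphanumeric order of the folder names. Plain files are ignored.
    """
    names = sorted(
        entry for entry in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, entry))
    )
    return {name: index for index, name in enumerate(names)}


def check_same_classes(train_dir, test_dir):
    """Raise ValueError if train and test folders would yield different labels."""
    train_indices = infer_class_indices(train_dir)
    test_indices = infer_class_indices(test_dir)
    if train_indices != test_indices:
        raise ValueError(
            'Class folders differ: train={} test={}'.format(
                sorted(train_indices), sorted(test_indices)))
    return train_indices
```

For the layout used in this answer, check_same_classes('Dogs_Vs_Cats_Small/train', 'Dogs_Vs_Cats_Small/test') should return {'cats': 0, 'dogs': 1}.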
The sample code below, which copies some of the images from the original directory into two separate folders, train and test, may also help:
import os, shutil
# Path to the directory where the original dataset was uncompressed
original_dataset_dir = 'Dogs_Vs_Cats'
# Directory where you’ll store your smaller dataset
base_dir = 'Dogs_Vs_Cats_Small2'
os.mkdir(base_dir)
# Directory for the training splits
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
# Directory for the test splits
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)
# Directory with training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)
# Directory with training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)
# Directory with Test Cat Pictures
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)
# Directory with Test Dog Pictures
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)
# Copies the first 1,000 cat images to train_cats_dir.
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, 'train', fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)
# Copies cat images 1500-1999 to test_cats_dir (images 1000-1499 are left unused)
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, 'train', fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)
# Copies the first 1,000 dog images to train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, 'train', fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)
# Copies dog images 1500-1999 to test_dogs_dir (images 1000-1499 are left unused)
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, 'train', fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)
# Sanity check to ensure that the Training and Test folders have the expected number of images
print('Number of Cat Images in Training Directory is {}'.format(len(os.listdir(train_cats_dir))))
print('Number of Dog Images in Training Directory is {}'.format(len(os.listdir(train_dogs_dir))))
print('Number of Cat Images in Testing Directory is {}'.format(len(os.listdir(test_cats_dir))))
print('Number of Dog Images in Testing Directory is {}'.format(len(os.listdir(test_dogs_dir))))
Hope this helps.
A better alternative is the split-folders library, which creates the train, validation and test folders for you.
Source: How to split folder of images into test/training/validation sets with stratified sampling?
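If you would rather not add a dependency, the same idea can be sketched with the standard library alone. The function below is a hypothetical helper, not split-folders' API: for every class subfolder of a directory laid out like the question's input_data_dir, it shuffles the file names with a fixed seed and copies them into train/, val/ and test/ subfolders, so each split keeps the per-class proportions (a stratified split). The 80/10/10 default ratio, the seed and the output folder names are assumptions.

```python
import os
import random
import shutil


def split_class_dirs(input_dir, output_dir, ratios=(0.8, 0.1, 0.1), seed=1337):
    """Copy a class-per-folder dataset into output_dir/{train,val,test}/<class>/.

    Hypothetical helper: the split is done per class (stratified), using a
    seeded shuffle so that re-running it produces the same split.
    Returns {split_name: {class_name: number_of_files}}.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError('ratios must sum to 1')
    rng = random.Random(seed)
    counts = {'train': {}, 'val': {}, 'test': {}}

    for class_name in sorted(os.listdir(input_dir)):
        class_dir = os.path.join(input_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        # Sort first so the seeded shuffle does not depend on listdir order
        files = sorted(f for f in os.listdir(class_dir)
                       if os.path.isfile(os.path.join(class_dir, f)))
        rng.shuffle(files)

        # Floor the train and val sizes; the remainder goes to test
        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        parts = {
            'train': files[:n_train],
            'val': files[n_train:n_train + n_val],
            'test': files[n_train + n_val:],
        }

        for split, names in parts.items():
            dst_dir = os.path.join(output_dir, split, class_name)
            os.makedirs(dst_dir, exist_ok=True)
            for name in names:
                shutil.copyfile(os.path.join(class_dir, name),
                                os.path.join(dst_dir, name))
            counts[split][class_name] = len(names)
    return counts
```

Calling split_class_dirs('input_data_dir', 'split_data_dir') would produce split_data_dir/train, split_data_dir/val and split_data_dir/test, each containing class_1_dir, class_2_dir and class_3_dir, and each of those three folders can then be passed to its own flow_from_directory call.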