Why are some of my images being ignored in TensorFlow while other images in the same folders are not?
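I have data in 2 folders (10000 images for training and 1000 images for validation), and each of these folders contains 10 subfolders (one per class).
I put all of this data into dataframes for later use.
It turns out that when I use "flow_from_dataframe" in TensorFlow, some images in some folders are treated as having invalid filenames and are therefore ignored.
I also tried to access the images outside of TensorFlow, for example by simply opening an image, but I still cannot open some files even though the path is exactly right.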
from PIL import Image
im = Image.open(r"D:\Ensino Superior\ISCTE-IUL\Mestrado em Engenharia Informatica\Tese\Testagem\TomatoLeafDisease\data\Train\Tomato___Tomato_mosaic_virus\Tomato___Tomato_mosaic_virus_original_f16eeb0f-5219-4a81-9941-351b3d9ba5fc___PSU_CG 2089.JPG_a88e521f-cec2-4755-871f-782de8192056.JPG")
im.show()
Output when trying to access an image outside the TensorFlow code
The dataframe
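I did some research and found that using absolute paths can help, and it did make things work, but even then some images are ignored. What can I do so that no images are ignored?
The code before the error output: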
import tensorflow as tf

def create_gen():
    train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input,
        validation_split=0.2
    )
    test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input
    )
    train_images = train_generator.flow_from_dataframe(
        dataframe=train_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=True,
        seed=0,
        subset='training',
        rotation_range=30, # Uncomment to use data augmentation
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest"
    )
    val_images = train_generator.flow_from_dataframe(
        dataframe=train_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=True,
        seed=0,
        subset='validation',
        rotation_range=30, # Uncomment to use data augmentation
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest"
    )
    test_images = test_generator.flow_from_dataframe(
        dataframe=validation_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=False
    )
    # Return all five objects so the unpacking at the call site below matches
    return train_generator, test_generator, train_images, val_images, test_images

pretrained_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling='avg'
)
train_generator, test_generator, train_images, val_images, test_images = create_gen()
Output after using "flow_from_dataframe" with TensorFlow
Your generators are not set up correctly. For example, you have this code:
train_images = train_generator.flow_from_dataframe(
    dataframe=train_df,
    x_col='Filepath',
    y_col='Class',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=32,
    shuffle=True,
    seed=0,
    subset='training',
    rotation_range=30, # Uncomment to use data augmentation
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest"
)
The parameters rotation_range, zoom_range, width_shift_range, height_shift_range, shear_range, horizontal_flip and fill_mode do not belong in .flow_from_dataframe. They should be passed to ImageDataGenerator, like this:
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input,
    validation_split=0.2,
    rotation_range=30,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest"
)
For the image path problem, try running the script below. I use sdir as the directory that contains your data; just replace it with the correct directory name.
import os
from PIL import Image

sdir = r'C:\Temp\balls\train'  # set this to your directory
good_image_count = 0
bad_image_count = 0
processed_count = 0
bad_image_list = []
classlist = os.listdir(sdir)
for klass in classlist:
    print('processing class directory ', klass)
    classpath = os.path.join(sdir, klass)
    if os.path.isdir(classpath):
        flist = os.listdir(classpath)
        print(' number of files in class directory ', klass, ' is ', len(flist))
        for f in flist:
            processed_count += 1
            fpath = os.path.join(classpath, f)
            if os.path.isfile(fpath):
                try:
                    # verify() checks the file for corruption without decoding the whole image
                    image = Image.open(fpath)
                    image.verify()
                    good_image_count += 1
                except Exception:
                    bad_image_count += 1
                    bad_image_list.append(fpath)
            elif os.path.isdir(fpath):
                print('in class directory ', klass, ' you have sub directories, you should only have files in it')
            else:
                # listed by os.listdir but not reachable as a file, so count it as a bad path
                bad_image_count += 1
                bad_image_list.append(fpath)
    else:
        print('In the sdir you have files, you should only have class subdirectories in sdir')
print(processed_count, ' files were processed')
print(good_image_count, ' files were valid image files')
print(bad_image_count, ' files were invalid images or paths did not exist')
for f in bad_image_list:
    print(f)
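The script above walks the directories, but the generator works from the dataframe, so it can also help to check the paths stored in the 'Filepath' column directly. The sketch below assumes that a path is rejected when the file cannot be found on disk or its extension is not a common image format, which is roughly what Keras appears to do when filename validation is left enabled. The helper name find_rejected_paths and the extension list are illustrative, not part of the Keras API.

```python
import os

# Extensions assumed to be accepted; adjust if your Keras version differs.
ALLOWED_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff")


def find_rejected_paths(paths, allowed=ALLOWED_EXTENSIONS):
    """Return the paths that are missing on disk or have an unexpected extension."""
    rejected = []
    for path in paths:
        has_valid_ext = path.lower().endswith(allowed)
        # os.path.isfile is also False when the OS cannot reach the path at all
        if not (has_valid_ext and os.path.isfile(path)):
            rejected.append(path)
    return rejected
```

Calling find_rejected_paths(train_df['Filepath']) should then list the rows that are likely to be ignored, which makes it easier to spot what they have in common.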
Sorted it out. The problem was that the image path together with the image file name was too long.
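For anyone hitting the same thing, a quick way to find the affected files is to compare each absolute path against the classic Windows limit of 260 characters (MAX_PATH, which includes the terminating null). This is only a sketch under that assumption: the limit does not apply if long-path support has been enabled on the machine, and the function name find_long_paths is made up for illustration.

```python
import os

WINDOWS_MAX_PATH = 260  # classic limit, counting the terminating null character


def find_long_paths(paths, limit=WINDOWS_MAX_PATH):
    """Return (length, absolute_path) pairs for paths at or above the limit."""
    long_paths = []
    for path in paths:
        abs_path = os.path.abspath(path)
        if len(abs_path) >= limit:
            long_paths.append((len(abs_path), abs_path))
    return long_paths
```

If find_long_paths(train_df['Filepath']) returns anything, shortening the file names or moving the dataset closer to the drive root is probably the simplest fix.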