Having issues reading S3 bucket when transitioning a tensorflow model from local machine to AWS SageMaker

Question

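When testing on my local machine in Python, I would normally use the following to read my training set, which has one subdirectory per class containing that class's files: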

train_path = r"C:\temp\coins\PCGS - Gold\train"

train_batches = ImageDataGenerator().flow_from_directory(train_path, target_size=(100,100), classes=['0','1','2','3' etc...], batch_size=32)

Found 4100 images belonging to 22 classes.

But in a Jupyter notebook on AWS SageMaker, I am now pulling the files from an S3 bucket. I tried the following:

bucket = "coinpath"

train_path = 's3://{}/{}/train'.format(bucket, "v1")   #note that the directory structure is coinpath/v1/train where coinpath is the bucket

train_batches = ImageDataGenerator().flow_from_directory(train_path, target_size=(100,100), classes=['0','1','2','3' etc...], batch_size=32)

But I get: **Found 0 images belonging to 22 classes.**

Looking for guidance on the right way to pull training data from S3.

Answer 1

To quote: "ImageDataGenerator.flow_from_directory() currently does not allow you to stream data directly from a GCS bucket."

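The same applies to S3: flow_from_directory() walks the local filesystem, so an s3:// path matches no files. I had to download the images from S3 first. This is also the best option for latency reasons.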
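
For illustration, the download step could look roughly like the sketch below. It is not the exact code I ran; it assumes boto3 is available on the notebook instance (it normally is on SageMaker) and that the notebook's role can read the bucket. The helper names local_path_for_key and download_prefix are made up for this example, and the local destination directory is an assumption.

```python
import os


def local_path_for_key(key, prefix, dest_dir):
    """Map an S3 object key under `prefix` to a file path under `dest_dir`."""
    relative = key[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(bucket, prefix, dest_dir):
    """Copy every object under s3://bucket/prefix into dest_dir, keeping the
    per-class subfolders. Returns the number of files downloaded."""
    import boto3  # imported here so the path helper works without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                # Skip zero-byte "folder" placeholder objects.
                continue
            target = local_path_for_key(key, prefix, dest_dir)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, key, target)
            count += 1
    return count


# Example call (bucket and prefix are from the question; the local
# directory is hypothetical):
#   download_prefix("coinpath", "v1/train/", "/tmp/coins/train")
```

After the download, setting train_path to the local directory (here the hypothetical /tmp/coins/train) and calling flow_from_directory exactly as on the local machine should find the images again, since the class subdirectories are preserved.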

tensorflow

amazon-sagemaker