How to read bucket image from AWS S3 into Sagemaker Jupyter Instance
I am quite new to AWS and cloud environments. I am a machine learning engineer, and I plan to build a custom CNN in the AWS environment to predict whether a given image contains an iPhone.
What I did:
Step 1:
I created an S3 bucket for the iPhone classifier with the following folder structure:
Iphone_Classifier > Train > Yes_iphone_images > 1000 images
                          > No_iphone_images  > 1000 images
                  > Dev   > Yes_iphone_images > 100 images
                          > No_iphone_images  > 100 images
                  > Test  > 30 random images
Permissions -> Block all public access
Step 2:
Then I went to Amazon SageMaker and created a notebook instance:
I selected the following:
Name: some-xyz,
Type: ml.t2.medium
IAM: created a new IAM role (root access was enabled)
Others: all other settings were left at their defaults
Then I created and opened the notebook instance.
Step 3:
After opening the instance:
1. I selected conda_tensorflow2_p36 as the kernel.
2. Created a new Jupyter notebook and started working.
3. I checked the image classification examples but was confused; most of them used CSV files, whereas I want to retrieve images from S3 buckets.
Questions:
1. How easily can we access the S3 bucket image dataset from the Jupyter instance in SageMaker?
2. I need reference code for accessing the S3 bucket images.
3. Is it a good approach to copy the data to the notebook, or is it better to work directly from the S3 bucket?
What I tried:
import boto3
import cv2
from matplotlib import pyplot as plt

client = boto3.client('s3')

# I tried this one and it failed
# path = 's3://iphone/Train/Yes_iphone_images/100.png'
# I tried this one and it failed
path = 's3://iphone/Test/10.png'
# I uploaded an image file to the notebook instance, and reading that works
# path = 'thiyaga.jpg'
print(path)

print(cv2.__version__)
img = cv2.imread(path)   # cv2.imread only reads local paths, so s3:// paths return None
plt.imshow(img)
If your images are binary-encoded, you can try this:
import boto3
import matplotlib.pyplot as plt
from io import BytesIO

# Define bucket and key
s3_bucket, s3_key = 'YOUR_BUCKET', 'YOUR_IMAGE_KEY'

with BytesIO() as f:
    boto3.client("s3").download_fileobj(Bucket=s3_bucket, Key=s3_key, Fileobj=f)
    f.seek(0)
    img = plt.imread(f, format='png')
In other cases, the following works (based on the documentation):
s3 = boto3.resource('s3')
s3.Bucket(s3_bucket).download_file(s3_key, 'local_image.jpg')  # saves the object to a local file
img = plt.imread('local_image.jpg')
In both cases, you can visualize the image with plt.imshow(img).
In your path example, path = 's3://iphone/Test/10.png', the bucket and key would be s3_bucket = 'iphone' and s3_key = 'Test/10.png'.
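If you would rather not split the path by hand, a small helper along these lines can derive the bucket and key from an s3:// URI using only the standard library (the name split_s3_uri is just for illustration, not part of boto3):
from urllib.parse import urlparse

def split_s3_uri(uri):
    # Split an 's3://bucket/key/...' URI into (bucket, key). Illustrative helper.
    parsed = urlparse(uri)                         # scheme='s3', netloc=bucket, path='/key'
    return parsed.netloc, parsed.path.lstrip('/')

s3_bucket, s3_key = split_s3_uri('s3://iphone/Test/10.png')
print(s3_bucket, s3_key)                           # iphone Test/10.png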
Other resources: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-download-file.html
A simple approach is to use S3FS. You can read all of the images in a directory; for example, one directory could hold all of the images without an iPhone.
import s3fs

fs = s3fs.S3FileSystem()
no_iphone_images_directory = 's3://iphone_images/no_iphone_images'
filenames = fs.ls(no_iphone_images_directory)

for filename in filenames:
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        with fs.open(filename, 'rb') as f:
            image_bytes = f.read()  # do something with the image
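Inside that loop you can decode each file into an array for your CNN. Here is a minimal sketch that continues from fs and filenames above; it assumes Pillow and NumPy are available in the kernel and that a 224x224 input size is acceptable (both are assumptions, not requirements):
import numpy as np
from PIL import Image

def load_image(f, size=(224, 224)):
    # Decode an open file handle into a normalized float32 array (illustrative helper).
    img = Image.open(f).convert('RGB').resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0

images = []
for filename in filenames:
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        with fs.open(filename, 'rb') as f:
            images.append(load_image(f))

X = np.stack(images)   # shape: (num_images, 224, 224, 3)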
I think the most convenient approach is to upload your images directly into the storage the notebook lives on. SageMaker comes with at least 5 GB of storage, or more if you specify it when creating the instance. First, using a shell, compress the whole dataset (folder) into a .tgz file:
tar -cvzf <name of tarball>.tgz /path/to/source/folder
Then upload it using the upload button of your Jupyter instance. The next step is to extract it; run the following command in a notebook cell:
!tar -xzvf <name of tarball>.tgz
At this point you should be able to access your files/folders with plain Python syntax, for example:
from pathlib import Path
path = Path("./folder_name/")
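Once extracted, you can walk the folders with pathlib and pair each image with a label. A rough sketch, assuming the extracted tree keeps the Train/Yes_iphone_images and Train/No_iphone_images layout from the question (adjust the folder names to whatever your tarball actually contains):
from pathlib import Path

train_dir = Path("./Iphone_Classifier/Train")   # hypothetical folder name after extraction

samples = []
for label_dir, label in [("Yes_iphone_images", 1), ("No_iphone_images", 0)]:
    for img_path in (train_dir / label_dir).glob("*"):
        if img_path.suffix.lower() in {".png", ".jpg", ".jpeg"}:
            samples.append((img_path, label))

print(f"Found {len(samples)} training images")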