Google Cloud Storage Transfer with a secured tsv url list file
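I am trying to transfer content that is only available through public URLs into a GCS bucket.
To do so, I use the Google Cloud Storage Transfer API, which requires two steps:
- Create a .tsv file containing the list of my public URLs.
- Create the transfer (here using the Python API).
To launch the script, I use a service account that has Storage Object Admin permissions on both the bucket containing the transfer.tsv file and the sink bucket.
I can only get it to work when the transfer.tsv file is uploaded to a bucket that is open to the internet.
Do you know whether it is possible to keep the file in a secured bucket and grant access to the service account that creates the transfer?
So far, all my attempts have produced the following error.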
Error
PERMISSION_DENIED 1
https://storage.googleapis.com/my-private-bucket/transfer.tsv
Received HTTP error code 403.
transfer.tsv
TsvHttpData-1.0
https://image.shutterstock.com/image-photo/portrait-surprised-cat-scottish-straight-260nw-499196506.jpg
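For reference, a minimal sketch of how such a list file could be generated. The header line is the one shown in transfer.tsv above; the helper name build_url_list and the example.com URLs are made up for illustration, and the optional size and MD5 columns of the format are left out.

```python
from typing import Iterable


def build_url_list(urls: Iterable[str]) -> str:
    """Return the contents of a URL list file in the TsvHttpData-1.0 format.

    Hypothetical helper: one URL per line, no optional size/MD5 columns.
    """
    lines = ["TsvHttpData-1.0"]  # mandatory header line
    for url in urls:
        # Tabs separate columns and newlines separate rows, so neither
        # may appear inside a URL.
        if "\t" in url or "\n" in url:
            raise ValueError(f"URL must not contain tabs or newlines: {url!r}")
        lines.append(url)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    content = build_url_list(
        ["https://example.com/a.jpg", "https://example.com/b.jpg"]
    )
    with open("transfer.tsv", "w", encoding="utf-8") as f:
        f.write(content)
```

The resulting file can then be uploaded to the bucket that holds the list.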
Python script
from datetime import datetime, timezone

from google.cloud import storage_transfer


def create_one_time_http_transfer(
    project_id: str,
    description: str,
    list_url: str,
    sink_bucket: str,
):
    """Creates a one-time transfer job from a URL list to Google Cloud
    Storage."""
    client = storage_transfer.StorageTransferServiceClient()

    now = datetime.now(timezone.utc)
    # Setting the start date and the end date to
    # the same time creates a one-time transfer
    one_time_schedule = {"day": now.day, "month": now.month, "year": now.year}

    transfer_job_request = storage_transfer.CreateTransferJobRequest(
        {
            "transfer_job": {
                "project_id": project_id,
                "description": description,
                "status": storage_transfer.TransferJob.Status.ENABLED,
                "schedule": {
                    "schedule_start_date": one_time_schedule,
                    "schedule_end_date": one_time_schedule,
                },
                "transfer_spec": {
                    # URL of the TSV file that lists the objects to fetch
                    "http_data_source": storage_transfer.HttpData(list_url=list_url),
                    "gcs_data_sink": {
                        "bucket_name": sink_bucket,
                    },
                },
            }
        }
    )

    result = client.create_transfer_job(transfer_job_request)
    print(f"Created transferJob: {result.name}")
I call the function with:
create_one_time_http_transfer(
    project_id="my-project-id",
    description="first transfer",
    list_url=tsv_url,
    sink_bucket="my-destination-bucket",
)
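The problem is probably with the permissions used by storage_transfer.StorageTransferServiceClient(). Create a role with access to storage and attach it to the service account that runs the Python script. Alternatively, pass your credentials here: storage_transfer.StorageTransferServiceClient(credentials=XXXX.json)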
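I found a way to make it work.
When I upload the transfer.tsv file to storage, I return a signed URL instead of the public URL:
from datetime import timedelta

from google.cloud import storage


def upload_to_storage(
    file_input_path: str, file_output_path: str, bucket_name: str
) -> str:
    gcs = storage.Client()
    # Get the bucket that the file will be uploaded to.
    bucket = gcs.get_bucket(bucket_name)
    # Create a new blob and upload the file's content.
    blob = bucket.blob(file_output_path)
    blob.upload_from_filename(file_input_path)
    # Return a signed URL. The original passed datetime.now() as the
    # expiration, which is read as UTC and can already be in the past;
    # the one-hour lifetime used here is an assumption, adjust as needed.
    return blob.generate_signed_url(expiration=timedelta(hours=1))
This signed URL is then passed to create_one_time_http_transfer, shown above.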
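Since the list can only be fetched while the signed URL is still valid, it may help to check the expiry before creating the job. Below is a rough sketch, assuming the URL carries either the V2 query parameter Expires or the V4 parameters X-Goog-Date and X-Goog-Expires; the helper name signed_url_expiry is hypothetical and not part of any Google library.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url: str) -> Optional[datetime]:
    """Return the UTC expiry encoded in a signed URL, or None if absent."""
    params = parse_qs(urlparse(url).query)
    if "Expires" in params:
        # V2 signing: Expires is a Unix timestamp in seconds.
        return datetime.fromtimestamp(int(params["Expires"][0]), tz=timezone.utc)
    if "X-Goog-Date" in params and "X-Goog-Expires" in params:
        # V4 signing: signing time plus a lifetime in seconds.
        start = datetime.strptime(params["X-Goog-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = timedelta(seconds=int(params["X-Goog-Expires"][0]))
        return start.replace(tzinfo=timezone.utc) + lifetime
    return None


def is_still_valid(url: str, margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the URL expires later than now plus a safety margin."""
    expiry = signed_url_expiry(url)
    return expiry is not None and expiry > datetime.now(timezone.utc) + margin
```

If is_still_valid returns False for the value about to be passed as list_url, the file can be signed again with a longer expiration before the transfer job is created.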