How to get list_blobs to behave like gsutil
I only want to get the first level of the pseudo-folder structure on GCS.
If I run, for example:
gsutil ls 'gs://gcp-public-data-sentinel-2/tiles/'
I get a list like this:
gs://gcp-public-data-sentinel-2/tiles/01/
gs://gcp-public-data-sentinel-2/tiles/02/
gs://gcp-public-data-sentinel-2/tiles/03/
gs://gcp-public-data-sentinel-2/tiles/04/
gs://gcp-public-data-sentinel-2/tiles/05/
gs://gcp-public-data-sentinel-2/tiles/06/
gs://gcp-public-data-sentinel-2/tiles/07/
gs://gcp-public-data-sentinel-2/tiles/08/
gs://gcp-public-data-sentinel-2/tiles/09/
gs://gcp-public-data-sentinel-2/tiles/10/
gs://gcp-public-data-sentinel-2/tiles/11/
gs://gcp-public-data-sentinel-2/tiles/12/
gs://gcp-public-data-sentinel-2/tiles/13/
gs://gcp-public-data-sentinel-2/tiles/14/
gs://gcp-public-data-sentinel-2/tiles/15/
.
.
.
Running this code with the Python API gives me an empty result:
from google.cloud import storage

bucket_name = 'gcp-public-data-sentinel-2'
prefix = 'tiles/'

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)

# List blobs under the prefix, using '/' as the delimiter
for blob in bucket.list_blobs(max_results=10, prefix=prefix,
                              delimiter='/'):
    print(blob.name)
If I don't use the delimiter option, I get every object in the bucket, which is not very useful.
This may not be the best way, but, inspired by this comment on the official repository:
# Relies on a private iterator method to read the raw first-page API response
iterator = bucket.list_blobs(delimiter='/', prefix=prefix)
response = iterator._get_next_page_response()
for sub_prefix in response['prefixes']:
    print('gs://' + bucket_name + '/' + sub_prefix)
This gives:
gs://gcp-public-data-sentinel-2/tiles/01/
gs://gcp-public-data-sentinel-2/tiles/02/
gs://gcp-public-data-sentinel-2/tiles/03/
gs://gcp-public-data-sentinel-2/tiles/04/
gs://gcp-public-data-sentinel-2/tiles/05/
gs://gcp-public-data-sentinel-2/tiles/06/
gs://gcp-public-data-sentinel-2/tiles/07/
gs://gcp-public-data-sentinel-2/tiles/08/
gs://gcp-public-data-sentinel-2/tiles/09/
gs://gcp-public-data-sentinel-2/tiles/10/
...
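As a possible alternative that avoids the private method, here is a minimal sketch. It assumes a google-cloud-storage version in which the iterator returned by list_blobs exposes a prefixes attribute that is filled in as pages are consumed. The helper name list_first_level_prefixes is hypothetical, and the client argument is assumed to be a storage.Client (or any object with a compatible list_blobs method).

```python
def list_first_level_prefixes(client, bucket_name, prefix, delimiter="/"):
    """Return gs:// URLs of the pseudo-folders directly under `prefix`.

    Assumption: the iterator returned by `client.list_blobs` has a
    `prefixes` set that is populated while its pages are consumed.
    """
    iterator = client.list_blobs(bucket_name, prefix=prefix, delimiter=delimiter)

    # The prefixes are only collected as pages are fetched, so the iterator
    # has to be exhausted first, even if no blob sits directly under `prefix`.
    for _blob in iterator:
        pass

    return ["gs://{}/{}".format(bucket_name, p) for p in sorted(iterator.prefixes)]
```

For the public bucket from the question, this would be called as list_first_level_prefixes(storage.Client(), 'gcp-public-data-sentinel-2', 'tiles/'). Note that, unlike the single-page snippet above, this walks every page of results before returning.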