Use Azure Python Function and Managed Identity to Download from Storage Account

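I created an Azure Function called "transformerfunction", written in Python, which is supposed to upload data to and download data from Azure Data Lake / Storage. I also turned on the system-assigned managed identity and granted the function the "Storage Blob Data Contributor" role on my storage account: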

To authenticate and download a file, I use this piece of code, which basically follows these docs:

managed_identity = ManagedIdentityCredential()
credential_chain = ChainedTokenCredential(managed_identity)
client = DataLakeServiceClient(account_url, credential=credential_chain)

file_client = client.get_file_client(file_system_container, file_name)
downloaded_file = file_client.download_file()
downloaded_file.readinto(f)
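The snippet above does not show where `f` comes from. As a minimal sketch, assuming `f` is meant to be a local file opened in binary write mode, the download step could be wrapped as follows. The helper name `download_to_path` and its `local_path` parameter are hypothetical; `file_client` is assumed to be the object returned by `client.get_file_client(...)`.

```python
import os


def download_to_path(file_client, local_path):
    """Stream a remote file into local_path and return the bytes written.

    file_client is assumed to expose download_file(), whose result has
    readinto(stream), as the Data Lake file client in the snippet does.
    """
    with open(local_path, "wb") as f:
        downloader = file_client.download_file()
        # readinto() writes the downloaded content into the open stream.
        downloader.readinto(f)
    # Measure the file after it is closed so buffered data is flushed.
    return os.path.getsize(local_path)
```

Opening the target in `"wb"` mode matters because `readinto` writes raw bytes; a text-mode handle would raise a `TypeError`.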

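If my understanding is correct, Azure should authenticate using the Function's identity, and because this identity has the Storage Blob Data Contributor role on the storage account, the download should work.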

However, when I invoke the function and look at the logs, this is what I see:

2020-11-23 20:04:11.396 Function called
2020-11-23 20:04:11.397 ManagedIdentityCredential will use App Service managed identity
2020-11-23 20:04:13.105
Result: Failure Exception: HttpResponseError: This request is not authorized to perform this operation. 
RequestId:1f6a2a1c-b01e-0090-26d3-c1d0c0000000 Time:2020-11-23T20:04:13.0679405Z ErrorCode:AuthorizationFailure Error:None Stack:
File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 357, in _handle__invocation_request self.__run_sync_func, invocation_id, fi.func, args)
File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run result = self.fn(*self.args, **self.kwargs)
File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 542, in __run_sync_func return func(**params)
File "/home/site/wwwroot/shared/datalake.py", line 65, in download downloaded_file = client.download_file()
File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/azure/storage/filedatalake/_data_lake_file_client.py", line 593, in download_file downloader = self._blob_client.download_blob(offset=offset, length=length, **kwargs)
File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/azure/core/tracing/decorator.py", line 83, in wrapper_use_tracer return func(*args, **kwargs)
File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/azure/storage/blob/_blob_client.py", line 674, in download_blob return StorageStreamDownloader(**options)
File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/azure/storage/blob/_download.py", line 316, in __init__ self._response = self._initial_request()
File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/azure/storage/blob/_download.py", line 403, in _initial_request process_storage_error(error)
File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/azure/storage/blob/_shared/response_handlers.py", line 147, in process_storage_error raise error

This clearly shows that the function is not authorized to download the blob. But why? What do I need to do differently?

EDIT:

I found the cause of the problem: I had restricted access to my Data Lake storage in the network settings, like this:

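My assumption was that "Allow trusted Microsoft services to access this storage account" would always allow functions running on Azure to access the storage, regardless of whether any networks were selected or which ones, but that is not the case.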
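The log above reports `ErrorCode:AuthorizationFailure`, and the cause turned out to be the storage firewall rather than a missing role. As a rough sketch, a small helper can turn the error code into a hint about where to look first. The helper name `explain_storage_error` and the hint texts are hypothetical, and the mapping is a heuristic based on the usual meaning of these two codes, not an official diagnostic.

```python
# Heuristic hints only (an assumption, not an official Azure mapping).
HINTS = {
    # The code seen in the log above; here it was caused by network rules.
    "AuthorizationFailure": (
        "Check the storage account firewall and virtual network rules."
    ),
    # Usually means the identity authenticated but lacks a data role or ACL.
    "AuthorizationPermissionMismatch": (
        "Check the role assignments (RBAC) and ACLs for the identity."
    ),
}


def explain_storage_error(exc):
    """Return a troubleshooting hint for an exception raised by the SDK."""
    code = getattr(exc, "error_code", None)
    # The SDK may expose the code as an enum member; unwrap it if so.
    code = getattr(code, "value", code)
    return HINTS.get(code, "No hint for error code: {!r}".format(code))
```

Inside the function, this could be called from an `except HttpResponseError as exc:` block around `download_file()`, with `HttpResponseError` imported from `azure.core.exceptions`, so the hint is logged next to the raw error.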

Not sure what the cause is on your side, but the code below works perfectly for me:

import azure.functions as func
from azure.identity import ChainedTokenCredential, ManagedIdentityCredential
from azure.storage.filedatalake import DataLakeServiceClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Authenticate as the Function App's managed identity (MSI).
    msi_credential = ManagedIdentityCredential()
    credential_chain = ChainedTokenCredential(msi_credential)

    # Replace the placeholder with the Data Lake Gen2 account name.
    client = DataLakeServiceClient("https://<Azure Data Lake Gen2 account name>.dfs.core.windows.net", credential=credential_chain)

    # Replace "container name" and "filename.txt" with real values.
    file_client = client.get_file_client("container name", "filename.txt")
    stream = file_client.download_file()

    # Return the file content as the HTTP response body.
    return func.HttpResponse(stream.readall())

MSI configuration for my function:

My test file content:

Test result: