Azure Function written in Java throws FailureException: OutOfMemoryError: Java heap space while unzipping files larger than 80 MB
I have an Azure Function written in Java that listens for queue messages on Azure. Each queue message carries the path of a zip file in an Azure blob container; when a message arrives, the function fetches the zip file from that path and unzips it into a container on Azure. It works for small files, but for files larger than 80 MB it fails with a FailureException: OutOfMemoryError: Java heap space exception. My code is as follows:
@FunctionName("queueprocessor")
public void run(@QueueTrigger(name = "msg",
queueName = "queuetest",
dataType = "",
connection = "AzureWebJobsStorage") Details message,
final ExecutionContext executionContext,
@BlobInput(name = "file",
dataType = "binary",
connection = "AzureWebJobsStorage",
path = "{Path}") byte[] content) {
executionContext.getLogger().info("PATH: " + message.getPath());
CloudStorageAccount storageAccount = null;
CloudBlobClient blobClient = null;
CloudBlobContainer container = null;
try {
String connectStr = "DefaultEndpointsProtocol=https;AccountName=name;AccountKey=mykey;EndpointSuffix=core.windows.net";
//unique name of the container
String containerName = "output";
// Config to upload file size > 1MB in chunks
int deltaBackoff = 2;
int maxAttempts = 2;
BlobRequestOptions blobReqOption = new BlobRequestOptions();
blobReqOption.setSingleBlobPutThresholdInBytes(1024 * 1024); // 1MB
blobReqOption.setRetryPolicyFactory(new RetryExponentialRetry(deltaBackoff, maxAttempts));
// Parse the connection string and create a blob client to interact with Blob storage
storageAccount = CloudStorageAccount.parse(connectStr);
blobClient = storageAccount.createCloudBlobClient();
blobClient.setDefaultRequestOptions(blobReqOption);
container = blobClient.getContainerReference(containerName);
container.createIfNotExists(BlobContainerPublicAccessType.CONTAINER, new BlobRequestOptions(), new OperationContext());
ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(content));
ZipEntry zipEntry = zipIn.getNextEntry();
while (zipEntry != null) {
executionContext.getLogger().info("ZipEntry name: " + zipEntry.getName());
//Getting a blob reference
CloudBlockBlob blob = container.getBlockBlobReference(zipEntry.getName());
ByteArrayOutputStream outputB = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
int n;
while ((n = zipIn.read(buf, 0, 1024)) != -1) {
outputB.write(buf, 0, n);
}
// Upload to container
ByteArrayInputStream inputS = new ByteArrayInputStream(outputB.toByteArray());
blob.setStreamWriteSizeInBytes(256 * 1024); // 256K
blob.upload(inputS, inputS.available());
executionContext.getLogger().info("ZipEntry name: " + zipEntry.getName() + " extracted");
zipIn.closeEntry();
zipEntry = zipIn.getNextEntry();
}
zipIn.close();
executionContext.getLogger().info("FILE EXTRACTION FINISHED");
} catch(Exception e) {
e.printStackTrace();
}
}
The Details message contains an ID and a file path; the path is given as the input to @BlobInput(..., path = "{Path}", ...). From my analysis, I believe @BlobInput is loading the entire file into memory, which is why I get the OutOfMemoryError. If I'm right, please tell me any other way to avoid it, because file sizes may go up to 2 GB in the future. And if there is any mistake in the unzipping code, please let me know. Thanks.
I've summarized @JoachimSauer's suggestion below.
When we use the Azure Functions blob storage binding to handle Azure blob content in a Java function app, it keeps the entire content in memory. Using it to process large files, we may face an OutOfMemoryError. So if we want to process large Azure blobs, we should open an input stream with the blob SDK and process the content as a stream. Note that the original code also buffered each zip entry completely in a ByteArrayOutputStream before uploading, so the fix below streams on the write side as well, via BlobOutputStream.
For example:
SDK
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob</artifactId>
<version>12.9.0</version>
</dependency>
Code
String accountName = "";
String accountKey = "";
StorageSharedKeyCredential sharedKeyCredential =
        new StorageSharedKeyCredential(accountName, accountKey);
BlobServiceClient blobServiceClient = new BlobServiceClientBuilder()
        .credential(sharedKeyCredential)
        .endpoint("https://" + accountName + ".blob.core.windows.net")
        .buildClient();
BlobContainerClient desContainerClient = blobServiceClient.getBlobContainerClient("output");
BlobContainerClient sourceContainerClient = blobServiceClient.getBlobContainerClient("upload");
BlobInputStreamOptions option = new BlobInputStreamOptions();
// The size of each data chunk returned from the service
option.setBlockSize(1024 * 1024);
// try-with-resources guarantees the zip stream is closed even when an entry
// fails, and avoids the null check a manual finally block would need
try (ZipInputStream zipInput = new ZipInputStream(
        sourceContainerClient.getBlobClient("<read file name from queue message>").openInputStream(option))) {
    ZipEntry zipEntry = zipInput.getNextEntry();
    while (zipEntry != null) {
        System.out.println("ZipEntry name: " + zipEntry.getName());
        // Stream each entry straight into the destination blob
        BlobOutputStream outputB = desContainerClient.getBlobClient(zipEntry.getName())
                .getBlockBlobClient().getBlobOutputStream();
        byte[] bytesIn = new byte[1024 * 1024];
        int read;
        while ((read = zipInput.read(bytesIn)) != -1) {
            outputB.write(bytesIn, 0, read);
        }
        outputB.flush();
        outputB.close();
        zipInput.closeEntry();
        zipEntry = zipInput.getNextEntry();
    }
} catch (IOException e) {
    e.printStackTrace();
}
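To connect this back to the original function: the @BlobInput binding can be dropped so the binding never materializes the blob as a byte[], and the stream can be opened from the path carried in the queue message instead. A minimal sketch, assuming the same Details class from the question, that Path holds the blob name inside the upload container, and a hypothetical streamUnzip helper wrapping the entry loop above:
@FunctionName("queueprocessor")
public void run(@QueueTrigger(name = "msg",
                              queueName = "queuetest",
                              connection = "AzureWebJobsStorage") Details message,
                final ExecutionContext executionContext) {
    // No @BlobInput: the blob is read as a stream via the SDK instead of being
    // loaded into memory by the binding.
    BlobServiceClient blobServiceClient = new BlobServiceClientBuilder()
            .connectionString(System.getenv("AzureWebJobsStorage"))
            .buildClient();
    BlobContainerClient sourceContainerClient = blobServiceClient.getBlobContainerClient("upload");
    BlobContainerClient desContainerClient = blobServiceClient.getBlobContainerClient("output");
    BlobInputStreamOptions option = new BlobInputStreamOptions();
    option.setBlockSize(1024 * 1024); // 1 MB chunks from the service
    // Assumes message.getPath() is the blob name inside the "upload" container
    try (ZipInputStream zipInput = new ZipInputStream(
            sourceContainerClient.getBlobClient(message.getPath()).openInputStream(option))) {
        streamUnzip(zipInput, desContainerClient); // hypothetical helper: the entry loop shown above
    } catch (IOException e) {
        executionContext.getLogger().severe("Extraction failed: " + e.getMessage());
    }
}
With streaming on both the read and the write side, peak memory stays on the order of the configured block and buffer sizes rather than the size of the archive, so this approach should also hold up for the 2 GB files mentioned in the question.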
For more details, please refer here.