使用具有有限堆的 azure-sdk-for-java 上传大文件
Upload of large files using azure-sdk-for-java with limited heap
我们正在开发需要使用 Azure 作为文件内容存储的文档微服务。 Azure Block Blob 似乎是一个合理的选择。文档服务的堆限制为 512MB (-Xmx512m
)。
使用 azure-storage-blob:12.10.0-beta.1
(也在 12.9.0
上测试过),我无法通过有限的堆成功上传流式文件。
尝试了以下方法:
- 使用
BlockBlobClient
从 documentation 复制粘贴
BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();
File file = new File("file");
try (InputStream dataStream = new FileInputStream(file)) {
blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
}
结果: java.io.IOException: mark/reset not supported
- SDK 尝试使用 mark/reset,即使文件输入流报告此功能不受支持。
- 添加
BufferedInputStream
以缓解 mark/reset 问题(根据 advice):
BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();
File file = new File("file");
try (InputStream dataStream = new BufferedInputStream(new FileInputStream(file))) {
blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
}
结果:java.lang.OutOfMemoryError: Java heap space
。我假设 SDK 试图将所有 1.17GB 的文件内容加载到内存中。
- 将
BlockBlobClient
替换为 BlobClient
并移除堆大小限制 (-Xmx512m
):
BlobClient blobClient = blobContainerClient.getBlobClient("file");
File file = new File("file");
try (InputStream dataStream = new FileInputStream(file)) {
blobClient.upload(dataStream, file.length(), true /* overwrite file */);
}
结果:使用了 1.5GB 堆内存,所有文件内容都加载到内存中 + Reactor 端的一些缓冲
Heap usage from VisualVM
- 通过
BlobOutputStream
切换到流式传输:
long blockSize = DataSize.ofMegabytes(4L).toBytes();
BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();
// create / erase blob
blockBlobClient.commitBlockList(List.of(), true);
BlockBlobOutputStreamOptions options = (new BlockBlobOutputStreamOptions()).setParallelTransferOptions(
(new ParallelTransferOptions()).setBlockSizeLong(blockSize).setMaxConcurrency(1).setMaxSingleUploadSizeLong(blockSize));
try (InputStream is = new FileInputStream("file")) {
try (OutputStream os = blockBlobClient.getBlobOutputStream(options)) {
IOUtils.copy(is, os); // uses 8KB buffer
}
}
结果:文件在上传过程中损坏。 Azure 门户网站显示 1.09GB 而不是预期的 1.17GB。从 Azure 门户网站手动下载文件可确认文件内容在上传过程中已损坏。内存占用显着减少,但文件损坏是一个问题。
问题:无法提出内存占用小的有效上传/下载解决方案
如有任何帮助,我们将不胜感激!
请尝试使用下面的代码来处理 upload/download 大文件,我已经使用大小约为 1.1 GB 的 .zip 文件在我这边进行了测试
上传文件:
public static void uploadFilesByChunk() {
String connString = "<conn str>";
String containerName = "<container name>";
String blobName = "UploadOne.zip";
String filePath = "D:/temp/" + blobName;
BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
long blockSize = 2 * 1024 * 1024; //2MB
ParallelTransferOptions parallelTransferOptions = new ParallelTransferOptions()
.setBlockSizeLong(blockSize).setMaxConcurrency(2)
.setProgressReceiver(new ProgressReceiver() {
@Override
public void reportProgress(long bytesTransferred) {
System.out.println("uploaded:" + bytesTransferred);
}
});
BlobHttpHeaders headers = new BlobHttpHeaders().setContentLanguage("en-US").setContentType("binary");
blobClient.uploadFromFile(filePath, parallelTransferOptions, headers, null, AccessTier.HOT,
new BlobRequestConditions(), Duration.ofMinutes(30));
}
内存占用:
下载文件:
public static void downLoadFilesByChunk() {
String connString = "<conn str>";
String containerName = "<container name>";
String blobName = "UploadOne.zip";
String filePath = "D:/temp/" + "DownloadOne.zip";
BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
long blockSize = 2 * 1024 * 1024;
com.azure.storage.common.ParallelTransferOptions parallelTransferOptions = new com.azure.storage.common.ParallelTransferOptions()
.setBlockSizeLong(blockSize).setMaxConcurrency(2)
.setProgressReceiver(new com.azure.storage.common.ProgressReceiver() {
@Override
public void reportProgress(long bytesTransferred) {
System.out.println("dowloaded:" + bytesTransferred);
}
});
BlobDownloadToFileOptions options = new BlobDownloadToFileOptions(filePath)
.setParallelTransferOptions(parallelTransferOptions);
blobClient.downloadToFileWithResponse(options, Duration.ofMinutes(30), null);
}
内存占用:
结果:
我们正在开发需要使用 Azure 作为文件内容存储的文档微服务。 Azure Block Blob 似乎是一个合理的选择。文档服务的堆限制为 512MB (-Xmx512m
)。
使用 azure-storage-blob:12.10.0-beta.1
(也在 12.9.0
上测试过),我无法通过有限的堆成功上传流式文件。
尝试了以下方法:
- 使用
BlockBlobClient
从 documentation 复制粘贴
BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();
File file = new File("file");
try (InputStream dataStream = new FileInputStream(file)) {
blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
}
结果: java.io.IOException: mark/reset not supported
- SDK 尝试使用 mark/reset,即使文件输入流报告此功能不受支持。
- 添加
BufferedInputStream
以缓解 mark/reset 问题(根据 advice):
BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();
File file = new File("file");
try (InputStream dataStream = new BufferedInputStream(new FileInputStream(file))) {
blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
}
结果:java.lang.OutOfMemoryError: Java heap space
。我假设 SDK 试图将所有 1.17GB 的文件内容加载到内存中。
- 将
BlockBlobClient
替换为BlobClient
并移除堆大小限制 (-Xmx512m
):
BlobClient blobClient = blobContainerClient.getBlobClient("file");
File file = new File("file");
try (InputStream dataStream = new FileInputStream(file)) {
blobClient.upload(dataStream, file.length(), true /* overwrite file */);
}
结果:使用了 1.5GB 堆内存,所有文件内容都加载到内存中 + Reactor 端的一些缓冲
Heap usage from VisualVM
- 通过
BlobOutputStream
切换到流式传输:
long blockSize = DataSize.ofMegabytes(4L).toBytes();
BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();
// create / erase blob
blockBlobClient.commitBlockList(List.of(), true);
BlockBlobOutputStreamOptions options = (new BlockBlobOutputStreamOptions()).setParallelTransferOptions(
(new ParallelTransferOptions()).setBlockSizeLong(blockSize).setMaxConcurrency(1).setMaxSingleUploadSizeLong(blockSize));
try (InputStream is = new FileInputStream("file")) {
try (OutputStream os = blockBlobClient.getBlobOutputStream(options)) {
IOUtils.copy(is, os); // uses 8KB buffer
}
}
结果:文件在上传过程中损坏。 Azure 门户网站显示 1.09GB 而不是预期的 1.17GB。从 Azure 门户网站手动下载文件可确认文件内容在上传过程中已损坏。内存占用显着减少,但文件损坏是一个问题。
问题:无法提出内存占用小的有效上传/下载解决方案
如有任何帮助,我们将不胜感激!
请尝试使用下面的代码来处理 upload/download 大文件,我已经使用大小约为 1.1 GB 的 .zip 文件在我这边进行了测试
上传文件:
public static void uploadFilesByChunk() {
String connString = "<conn str>";
String containerName = "<container name>";
String blobName = "UploadOne.zip";
String filePath = "D:/temp/" + blobName;
BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
long blockSize = 2 * 1024 * 1024; //2MB
ParallelTransferOptions parallelTransferOptions = new ParallelTransferOptions()
.setBlockSizeLong(blockSize).setMaxConcurrency(2)
.setProgressReceiver(new ProgressReceiver() {
@Override
public void reportProgress(long bytesTransferred) {
System.out.println("uploaded:" + bytesTransferred);
}
});
BlobHttpHeaders headers = new BlobHttpHeaders().setContentLanguage("en-US").setContentType("binary");
blobClient.uploadFromFile(filePath, parallelTransferOptions, headers, null, AccessTier.HOT,
new BlobRequestConditions(), Duration.ofMinutes(30));
}
内存占用:
下载文件:
public static void downLoadFilesByChunk() {
String connString = "<conn str>";
String containerName = "<container name>";
String blobName = "UploadOne.zip";
String filePath = "D:/temp/" + "DownloadOne.zip";
BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
long blockSize = 2 * 1024 * 1024;
com.azure.storage.common.ParallelTransferOptions parallelTransferOptions = new com.azure.storage.common.ParallelTransferOptions()
.setBlockSizeLong(blockSize).setMaxConcurrency(2)
.setProgressReceiver(new com.azure.storage.common.ProgressReceiver() {
@Override
public void reportProgress(long bytesTransferred) {
System.out.println("dowloaded:" + bytesTransferred);
}
});
BlobDownloadToFileOptions options = new BlobDownloadToFileOptions(filePath)
.setParallelTransferOptions(parallelTransferOptions);
blobClient.downloadToFileWithResponse(options, Duration.ofMinutes(30), null);
}
内存占用:
结果: