Creating a file as a stream and uploading to Azure
I am using the ChoETL and ChoETL.Parquet libraries to create a Parquet file from some other data. I can create the file locally.
// Verbatim interpolated string ($@) so the backslashes in the path are not treated as escapes.
using (ChoParquetWriter parser = new ChoParquetWriter($@"..\..\..\parquet_files\{club}_events.parquet"))
{
    parser.Write(events);
}
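In this snippet, events is a list of objects containing strings. They get converted to Parquet data.
So far I have written code that uploads to Azure, but it needs a local file as input.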
BlobServiceClient blobServiceClient = new BlobServiceClient("REDACTED");
var containerClient = blobServiceClient.GetBlobContainerClient("base-test");
BlobClient blobClient = containerClient.GetBlobClient($"Base/{RequestTime.Year}/{RequestTime.Month}/{RequestTime.Day}/{RequestTime.Hour}/{RequestTime.Minute}/events.parquet");
// The using declaration disposes the stream at the end of the scope, so no explicit Close() is needed.
using FileStream uploadFileStream = File.OpenRead(@"..\..\..\events.parquet");
await blobClient.UploadAsync(uploadFileStream, true);
I need to create the file in memory and then upload it to Azure Blob Storage. How can I do this? To clarify: what I need to upload is the Parquet file.
For this, you can use the BlockBlobClient.OpenWriteAsync method to get a stream and pass that stream to ChoParquetWriter. The writer will then write directly to the Azure blob.
For example:
using System;
using System.Collections.Generic;
using System.Web;                      // MimeMapping (.NET Framework)
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;      // BlobHttpHeaders, BlockBlobOpenWriteOptions
using Azure.Storage.Blobs.Specialized; // GetBlockBlobClient extension method
using ChoETL;

// Sample records to write.
List<EmployeeRecSimple> objs = new List<EmployeeRecSimple>();
EmployeeRecSimple rec1 = new EmployeeRecSimple();
rec1.Id = 1;
rec1.Name = "Mark";
objs.Add(rec1);
EmployeeRecSimple rec2 = new EmployeeRecSimple();
rec2.Id = 2;
rec2.Name = "Jason";
objs.Add(rec2);

BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
var desContainer = blobServiceClient.GetBlobContainerClient("output");
var desBlob = desContainer.GetBlockBlobClient("my.parquet");
var options = new BlockBlobOpenWriteOptions {
    HttpHeaders = new BlobHttpHeaders {
        ContentType = MimeMapping.GetMimeMapping("parquet"),
    },
    // progress updates about data transfers
    ProgressHandler = new Progress<long>(
        progress => Console.WriteLine("Progress: {0} bytes written", progress))
};

// Open a writable stream on the blob (overwrite: true) and let the Parquet writer write into it.
using (var outStream = await desBlob.OpenWriteAsync(true, options).ConfigureAwait(false))
using (ChoParquetWriter parser = new ChoParquetWriter(outStream)) {
    parser.Write(objs);
}

public partial class EmployeeRecSimple
{
    public int Id { get; set; }
    public string Name { get; set; }
}
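If you really do want the whole file in memory first, as the question literally asks, a variant is to let ChoParquetWriter write into a MemoryStream and then pass the buffered bytes to the same BlobClient.UploadAsync(stream, overwrite) call the question already uses. This is only a sketch under assumptions, not something tested against a real storage account: it assumes ChoParquetWriter finishes writing the file when it is disposed, it reads the connection string from an environment variable whose name is chosen here for illustration, and it reuses the "output" container and "my.parquet" blob names from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using ChoETL;

// Same record type as in the example above, repeated so the sketch is self-contained.
public class EmployeeRecSimple
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ParquetInMemoryUpload
{
    // Serializes the records to Parquet entirely in memory and returns the bytes.
    public static byte[] ToParquetBytes(List<EmployeeRecSimple> records)
    {
        using (var buffer = new MemoryStream())
        {
            // Dispose the writer first so it can finish the file (assumption:
            // the Parquet footer is written when the writer is disposed).
            using (ChoParquetWriter parser = new ChoParquetWriter(buffer))
            {
                parser.Write(records);
            }
            // MemoryStream.ToArray() still works if the writer closed the stream.
            return buffer.ToArray();
        }
    }

    public static async Task Main()
    {
        var objs = new List<EmployeeRecSimple>
        {
            new EmployeeRecSimple { Id = 1, Name = "Mark" },
            new EmployeeRecSimple { Id = 2, Name = "Jason" },
        };

        byte[] parquetBytes = ToParquetBytes(objs);

        // Assumption: the connection string is supplied via this environment variable.
        string connectionString =
            Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING");

        var blobServiceClient = new BlobServiceClient(connectionString);
        var container = blobServiceClient.GetBlobContainerClient("output");
        BlobClient blobClient = container.GetBlobClient("my.parquet");

        // Wrap the bytes in a fresh stream positioned at 0 and upload, overwriting any existing blob.
        using (var uploadStream = new MemoryStream(parquetBytes))
        {
            await blobClient.UploadAsync(uploadStream, true);
        }
    }
}
```

The trade-off is memory: the OpenWriteAsync approach streams data to the blob as it is written, while this variant holds the complete Parquet file in memory before uploading, so it is probably only a good fit for small files.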