Reading and copying large files/blobs without storing them in memory stream in C#
Below is my code that reads a blob from my blob storage and then copies the contents into table storage. Everything works fine right now, but I know that if my file gets too big, reading and copying will fail. I'd like to know how we should ideally handle this — should we write to a temporary file instead of holding it in memory? If so, can someone give me a sample, or show me how to do that in my existing code below?
public async Task<Stream> ReadStream(string containerName, string digestFileName, string fileName, string connectionString)
{
string data = string.Empty;
string fileExtension = Path.GetExtension(fileName);
var contents = await DownloadBlob(containerName, digestFileName, connectionString);
return contents;
}
public async Task<Stream> DownloadBlob(string containerName, string fileName, string connectionString)
{
Microsoft.Azure.Storage.CloudStorageAccount storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
if (!blob.Exists())
{
throw new Exception($"Unable to upload data in table store for document");
}
return await blob.OpenReadAsync();
}
private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
{
using (TextReader reader = new StreamReader(source, Encoding.UTF8))
{
var cache = new TypeConverterCache();
cache.AddConverter<float>(new CSVSingleConverter());
cache.AddConverter<double>(new CSVDoubleConverter());
var csv = new CsvReader(reader,
new CsvHelper.Configuration.CsvConfiguration(global::System.Globalization.CultureInfo.InvariantCulture)
{
Delimiter = ";",
HasHeaderRecord = true,
CultureInfo = global::System.Globalization.CultureInfo.InvariantCulture,
TypeConverterCache = cache
});
csv.Read();
csv.ReadHeader();
var map = (
from col in cols
from src in col.Sources()
let index = csv.GetFieldIndex(src, isTryGet: true)
where index != -1
select new { col.Name, Index = index, Type = col.DataType }).ToList();
while (csv.Read())
{
yield return map.ToDictionary(
col => col.Name,
col => EntityProperty.CreateEntityPropertyFromObject(csv.GetField(col.Type, col.Index)));
}
}
}
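As an aside on the temp-file idea raised in the question: the same Microsoft.Azure.Storage SDK can download a blob straight to disk, so nothing large ever sits in a MemoryStream. A minimal, untested sketch (the method and class names here are my own; container and blob names come from the caller):

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;

public static class BlobToTempFile
{
    // Downloads the blob to a temp file and returns its path.
    // The caller is responsible for deleting the file when done.
    public static async Task<string> DownloadToTempFileAsync(
        string containerName, string fileName, string connectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference(containerName);
        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);

        string tempPath = Path.GetTempFileName();
        // Streams the blob to disk in chunks; memory use stays bounded.
        await blob.DownloadToFileAsync(tempPath, FileMode.Create);
        return tempPath;
    }
}
```

That said, the temp file is usually unnecessary: as the answer below shows, `OpenReadAsync` already gives you a bounded-memory stream you can feed to CsvHelper directly.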
I imagine it could look something like this (modify your ReadCSV to take a stream rather than rows):
private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
{
using (TextReader reader = new StreamReader(source))
And this (modify your DownloadBlob to return a stream):
public async Task<Stream> GetBlobStream(string containerName, string fileName, string connectionString)
{
Microsoft.Azure.Storage.CloudStorageAccount storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
if (!blob.Exists())
{
throw ...
}
return await blob.OpenReadAsync();
}
Then wire them together:
var stream = await GetBlobStream(...);
ReadCSV(stream, ...);
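Spelled out with the async call awaited and the stream disposed, the call site might look like the following sketch (the wrapper method name is mine; `GetBlobStream` and `ReadCSV` are the methods above):

```csharp
// Hypothetical call site tying GetBlobStream and ReadCSV together.
public async Task CopyBlobCsvToTableAsync(
    string containerName, string fileName, string connectionString,
    IEnumerable<TableField> cols)
{
    using (Stream stream = await GetBlobStream(containerName, fileName, connectionString))
    {
        // ReadCSV is lazy (yield return), so rows stream through one at a
        // time; the whole file is never held in memory at once.
        foreach (var row in ReadCSV(stream, cols))
        {
            // Insert 'row' into table storage here (batched, ideally).
        }
    }
}
```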
Since you insist that CsvHelper can't read from a stream connected to a blob, I put something together:
- A WinForms .NET Core app (3.1)
- Latest CsvHelper (19)
- Azure.Storage.Blobs (12.8)
The CSV on my disk:
On my blob storage:
In my debugger, record CAf255 reads OK via Read/GetRecord:
Or via EnumerateRecords:
Using this code:
private async void button1_Click(object sender, EventArgs e)
{
var cstr = "MY CONNECTION STRING HERE";
var bbc = new BlockBlobClient(cstr, "temp", "call.csv");
var s = await bbc.OpenReadAsync(new BlobOpenReadOptions(true) { BufferSize = 16384 });
var sr = new StreamReader(s);
var csv = new CsvHelper.CsvReader(sr, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true });
var x = new X();
//try by read/getrecord (breakpoint and skip over it if you want to try the other way)
while(await csv.ReadAsync())
{
var rec = csv.GetRecord<X>();
Console.WriteLine(rec.Sid);
}
//try by await foreach
await foreach (var r in csv.EnumerateRecordsAsync(x))
{
Console.WriteLine(r.Sid);
}
}
Oh, and the class that represents a CSV record in my app (I modeled only one property, Sid, to prove the concept):
class X {
    public string Sid { get; set; }
}
Maybe dial things back and start simple: one string prop from the CSV, no yielding etc., just read the file. I also didn't bother with any of the header mapping ceremony — it seems to work fine just by saying "the file has headers" in the options. You can see in my debugger an instance of X with the Sid property correctly populated with the first value. I ran it over some loops and those populated fine too.