Reading and copying large files/blobs without storing them in memory stream in C#

Below is the code that reads a blob from my blob storage and then copies the contents into table storage. Everything works fine right now, but I know that if my file gets too large the read and copy will fail. I'd like to know how this is ideally handled: should we write to a temporary file instead of holding everything in memory? If so, can someone give me an example or show me how to do that with my existing code below?

    public async Task<Stream> ReadStream(string containerName, string digestFileName, string fileName, string connectionString)
    {
        string data = string.Empty;
        string fileExtension = Path.GetExtension(fileName);
        var contents = await DownloadBlob(containerName, digestFileName, connectionString);

        return contents;
    }

    public async Task<Stream> DownloadBlob(string containerName, string fileName, string connectionString)
    {
        Microsoft.Azure.Storage.CloudStorageAccount storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
        CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
        CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
        if (!blob.Exists())
        {
            throw new Exception($"Unable to upload data in table store for document");
        }

        return await blob.OpenReadAsync();
    }

    private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
    {
        using (TextReader reader = new StreamReader(source, Encoding.UTF8))
        {
            var cache = new TypeConverterCache();
            cache.AddConverter<float>(new CSVSingleConverter());
            cache.AddConverter<double>(new CSVDoubleConverter());
            var csv = new CsvReader(reader,
                new CsvHelper.Configuration.CsvConfiguration(global::System.Globalization.CultureInfo.InvariantCulture)
                {
                    Delimiter = ";",
                    HasHeaderRecord = true,
                    CultureInfo = global::System.Globalization.CultureInfo.InvariantCulture,
                    TypeConverterCache = cache
                });
            csv.Read();
            csv.ReadHeader();

            var map = (
                from col in cols
                from src in col.Sources()
                let index = csv.GetFieldIndex(src, isTryGet: true)
                where index != -1
                select new { col.Name, Index = index, Type = col.DataType }).ToList();

            while (csv.Read())
            {
                yield return map.ToDictionary(
                    col => col.Name,
                    col => EntityProperty.CreateEntityPropertyFromObject(csv.GetField(col.Type, col.Index)));
            }
        }
    }
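
For the temporary-file idea raised in the question, a minimal sketch could look like the following. It uses the same Microsoft.Azure.Storage types as the code above; the helper name and error handling are illustrative only, not tested:

    public async Task<string> DownloadBlobToTempFile(string containerName, string fileName, string connectionString)
    {
        var storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
        CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
        CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);

        if (!await blob.ExistsAsync())
        {
            throw new FileNotFoundException($"Blob '{fileName}' not found in container '{containerName}'.");
        }

        // Stream the blob straight to a temp file on disk rather than into memory;
        // the caller opens it with File.OpenRead and deletes it when finished.
        string tempPath = Path.GetTempFileName();
        await blob.DownloadToFileAsync(tempPath, FileMode.Create);
        return tempPath;
    }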

I imagine it could look something like this (modify your ReadCSV to take a stream rather than lines):

private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
{
    using (TextReader reader = new StreamReader(source))

And this (modify your DownloadBlob to return a stream):

    public async Task<Stream> GetBlobStream(string containerName, string fileName, string connectionString)
    {
        Microsoft.Azure.Storage.CloudStorageAccount storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
        CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
        CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
        if (!blob.Exists())
        {
            throw ...
        }

        return await blob.OpenReadAsync();
    }

Then wire them together:

    var stream = await GetBlobStream(...)

    ReadCSV(stream, ...)
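
Spelled out a bit more (just a sketch; containerName, fileName, connectionString and cols stand for whatever you already pass around, and the methods are the ones above):

    // Open the blob as a stream and feed it straight into the CSV reader,
    // so only one record is materialized at a time.
    var stream = await GetBlobStream(containerName, fileName, connectionString);

    foreach (var row in ReadCSV(stream, cols))
    {
        // row is a Dictionary<string, EntityProperty>; it can be wrapped in a
        // DynamicTableEntity and written to table storage batch by batch.
    }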

Since you insist that CsvHelper can't read from a stream connected to a blob, I put something together:

  • WinForms app (.NET Core 3.1)
  • CsvHelper latest (19)
  • Azure.Storage.Blobs (12.8)

The CSV on my disk:

On my blob storage:

In my debugger, the record CAf255 comes through fine via Read/GetRecord:

Or via EnumerateRecords:

Using this code:

    private async void button1_Click(object sender, EventArgs e)
    {
        var cstr = "MY CONNECTION STRING HERE";

        var bbc = new BlockBlobClient(cstr, "temp", "call.csv");

        var s = await bbc.OpenReadAsync(new BlobOpenReadOptions(true) { BufferSize = 16384 });

        var sr = new StreamReader(s);

        var csv = new CsvHelper.CsvReader(sr, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true });

        var x = new X();

        //try by read/getrecord (breakpoint and skip over it if you want to try the other way)
        while(await csv.ReadAsync())
        {
            var rec = csv.GetRecord<X>();
            Console.WriteLine(rec.Sid);
        }

        //try by await foreach
        await foreach (var r in csv.EnumerateRecordsAsync(x))
        {
            Console.WriteLine(r.Sid);
        }
    }
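
For reference, the using directives that snippet assumes are roughly these (exact namespaces might differ slightly between package versions):

    using System;
    using System.Globalization;
    using System.IO;
    using System.Windows.Forms;
    using Azure.Storage.Blobs.Models;       // BlobOpenReadOptions
    using Azure.Storage.Blobs.Specialized;  // BlockBlobClient
    using CsvHelper.Configuration;          // CsvConfiguration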

Oh, and the class that represents a CSV record in my app (I only modeled one property, Sid, to prove the concept):

class X {
    public string Sid { get; set; }
}

Maybe dial things back and start simple: one string prop in the CSV, no yielding etc., just read the file. I didn't bother with any of the header-mapping fiddling either; it seems to work fine just by saying "the file has headers" in the configuration. You can see in my debugger an instance of X with the Sid property correctly populated with the first value. I ran a few loops as well and they populated fine.