How do I avoid storing this file locally when I move it from AWS S3 to Azure Data Lake?
I am writing an Azure Function that moves files from AWS S3 to Azure Data Lake. I can download and I can upload, but I'm struggling to stitch the two together, because I don't want to store the file in the middle, so to speak; the Azure Function itself doesn't need to store it, just pass it on.
It's not that easy to explain, so bear with me while I try to describe what I'm trying to do.
When I download from S3 with this code:
await client.GetObjectAsync(new GetObjectRequest { BucketName = bucketName, Key = entry.Key });
I have no file system to store it on, and I don't want to store it; I want it as some kind of "object" that can be passed straight to the Azure Data Lake writer, which looks like this:
adlsFileSystemClient.FileSystem.UploadFile(adlsAccountName, source, destination, 1, false, true);
The code works fine if I download the file to my local disk and then upload it, but that's not what I want, since the Azure Function has no storage; I want to pass the downloaded object straight to the uploader, so to speak.
How can I do that?
**** EDIT ****
// Process the response.
foreach (S3Object entry in response.S3Objects)
{
    Console.WriteLine("key = {0} size = {1}", entry.Key.Split('/').Last(), entry.Size);
    string fileNameOnly = entry.Key.Split('/').Last();
    //await client.GetObjectAsync(new GetObjectRequest { BucketName = bucketName, Key = entry.Key });
    GetObjectResponse getObjRespone = await client.GetObjectAsync(bucketName, entry.Key);
    MemoryStream stream = new MemoryStream();
    getObjRespone.ResponseStream.CopyTo(stream);
    if (entry.Key.Contains("MerchandiseHierarchy") == true)
    {
        WriteToAzureDataLake(stream, @"/PIMRAW/MerchandiseHierarchy/" + fileNameOnly);
    }
}
I then pass the memory stream to the Azure method, but I need some kind of stream uploader, which I can't find; the following complains that it cannot convert the stream to a string:
adlsFileSystemClient.FileSystem.UploadFile(adlsAccountName, source, destination, 1, false, true);
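(For context: as far as I can tell, UploadFile's source argument is a path to a file on local disk, which is why passing the MemoryStream fails with the stream-to-string conversion error. This is the shape of the call that does work, shown with the placeholder paths from my commented-out code:)
// UploadFile copies a file that already exists on local disk; its source
// argument is a string path, not a Stream, hence the conversion error above.
adlsFileSystemClient.FileSystem.UploadFile(
    adlsAccountName,
    @"c:\nwsys\source.txt",                         // local file path (string)
    "/PIMRAW/MerchandiseHierarchy/destination.txt", // Data Lake destination path
    1, false, true);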
**** EDIT 2 ****
I changed the upload method as follows, and it creates the file at the destination but with size 0, so I'm wondering whether I'm creating it before the download has finished?
static void WriteToAzureDataLake(MemoryStream inputSource, string inputDestination)
{
    // 1. Set synchronization context
    SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());

    // 2. Create credentials to authenticate requests as an Active Directory application
    var clientCredential = new ClientCredential(clientId, clientSecret);
    var creds = ApplicationTokenProvider.LoginSilentAsync(tenantId, clientCredential).Result;

    // 3. Initialise the Data Lake Store file system client
    adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(creds);

    // 4. Upload a file to the Data Lake Store
    //var source = @"c:\nwsys\source.txt";
    var source = inputSource;
    //var destination = "/PIMRAW/MerchandiseHierarchy/destination.txt";
    var destination = inputDestination;
    //adlsFileSystemClient.FileSystem.UploadFile(adlsAccountName, source, destination, 1, false, true);
    adlsFileSystemClient.FileSystem.Create(adlsAccountName, destination, source);

    // FINISHED
    Console.WriteLine("6. Finished!");
}
It seems the stream position needs to be set to 0 before writing to the Data Lake:
stream.Position = 0;
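For reference, here is a minimal end-to-end sketch of the flow that works for me once the position reset is in place (same S3 and Data Lake clients as above; the account, bucket, key and destination values are placeholders, and I'm assuming the Create overload that takes a Stream plus an overwrite flag):
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;
using Microsoft.Azure.Management.DataLake.Store;

// Sketch: buffer the S3 object in memory and write it to
// Azure Data Lake Store without ever touching the local file system.
static async Task CopyS3ObjectToDataLakeAsync(
    IAmazonS3 s3Client,
    DataLakeStoreFileSystemManagementClient adlsClient,
    string adlsAccountName,
    string bucketName,
    string key,
    string destinationPath)
{
    using (GetObjectResponse s3Response = await s3Client.GetObjectAsync(bucketName, key))
    using (var buffer = new MemoryStream())
    {
        // Copy the S3 response into the in-memory buffer.
        await s3Response.ResponseStream.CopyToAsync(buffer);

        // Rewind: Create reads from the stream's current position, so without
        // this the file created in the Data Lake ends up 0 bytes.
        buffer.Position = 0;

        // Write the buffered content to the Data Lake Store path.
        adlsClient.FileSystem.Create(adlsAccountName, destinationPath, buffer, overwrite: true);
    }
}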