Run U-SQL Script from C# code with Azure Data Factory
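I am trying to run a U-SQL script on Azure from C# code. After the code executes, everything is created on Azure (the ADF, linked services, pipeline, and datasets), but ADF does not execute the U-SQL script. I think there is a problem with the start time and end time configured in the pipeline code.
I followed the article below to build this console application.
Create, monitor, and manage Azure data factories using Data Factory .NET SDK
Here is the URL to download my complete C# project.
https://1drv.ms/u/s!AltdTyVEmoG2ijOupx-EjCM-8Zk4
Can anyone help me find the mistake?
C# code that configures the pipeline: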
DateTime PipelineActivePeriodStartTime = new DateTime(2017, 1, 12, 0, 0, 0, 0, DateTimeKind.Utc);
DateTime PipelineActivePeriodEndTime = PipelineActivePeriodStartTime.AddMinutes(60);
string PipelineName = "ComputeEventsByRegionPipeline";

// Parameters passed to the U-SQL script as @in and @out
var usqlparams = new Dictionary<string, string>();
usqlparams.Add("in", "/Samples/Data/SearchLog.tsv");
usqlparams.Add("out", "/Output/testdemo1.tsv");

client.Pipelines.CreateOrUpdate(resourceGroupName, dataFactoryName,
    new PipelineCreateOrUpdateParameters()
    {
        Pipeline = new Pipeline()
        {
            Name = PipelineName,
            Properties = new PipelineProperties()
            {
                Description = "This is a demo pipe line.",
                // Initial value for pipeline's active period. With this, you won't need to set slice status
                Start = PipelineActivePeriodStartTime,
                End = PipelineActivePeriodEndTime,
                IsPaused = false,
                Activities = new List<Activity>()
                {
                    new Activity()
                    {
                        TypeProperties = new DataLakeAnalyticsUSQLActivity("@searchlog = EXTRACT UserId int, Start DateTime, Region string, Query string, Duration int?, Urls string, ClickedUrls string FROM @in USING Extractors.Tsv(nullEscape:\"#NULL#\"); @rs1 = SELECT Start, Region, Duration FROM @searchlog; OUTPUT @rs1 TO @out USING Outputters.Tsv(quoting:false);")
                        {
                            DegreeOfParallelism = 3,
                            Priority = 100,
                            Parameters = usqlparams
                        },
                        Inputs = new List<ActivityInput>()
                        {
                            new ActivityInput(Dataset_Source)
                        },
                        Outputs = new List<ActivityOutput>()
                        {
                            new ActivityOutput(Dataset_Destination)
                        },
                        Policy = new ActivityPolicy()
                        {
                            Timeout = new TimeSpan(6,0,0),
                            Concurrency = 1,
                            ExecutionPriorityOrder = ExecutionPriorityOrder.NewestFirst,
                            Retry = 1
                        },
                        Scheduler = new Scheduler()
                        {
                            Frequency = "Day",
                            Interval = 1
                        },
                        Name = "EventsByRegion",
                        LinkedServiceName = "AzureDataLakeAnalyticsLinkedService"
                    }
                }
            }
        }
    });
I just noticed something in the Azure Data Factory view (Monitor and Manage option). The pipeline's status is Waiting : DatasetDependencies. Do I need to modify the code for this?
If no other activity produces your source dataset, you need to add this property to the dataset:
"external": true
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-faq
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
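For illustration, here is a rough sketch of what the source dataset definition might look like with that property set. The dataset name, the linked service name, and the Azure Data Lake Store type are assumptions and do not come from the question. Only the file path and the daily schedule are taken from the posted code.

```json
{
    "name": "SourceDataset",
    "properties": {
        "type": "AzureDataLakeStore",
        "linkedServiceName": "AzureDataLakeStoreLinkedService",
        "typeProperties": {
            "folderPath": "Samples/Data/",
            "fileName": "SearchLog.tsv"
        },
        "external": true,
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}
```

Marking the dataset as external tells Data Factory that the data is not produced by any activity in the factory, so the slice should no longer sit in Waiting : DatasetDependencies. In the .NET SDK, the tutorial linked in the question appears to do the same thing by setting External = true on the source dataset's DatasetProperties, which is worth checking against your own dataset creation code.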