Run U-SQL Script from C# code with Azure Data Factory

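I am trying to run a U-SQL script on Azure from C# code. After the code executes, everything is created on Azure (ADF, linked services, pipeline, datasets), but ADF does not execute the U-SQL script. I think there is a problem with the start time and end time configured in the pipeline code.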

I followed the article below to build this console application: Create, monitor, and manage Azure data factories using Data Factory .NET SDK

My complete C# project can be downloaded here: https://1drv.ms/u/s!AltdTyVEmoG2ijOupx-EjCM-8Zk4

Can anyone help me find the mistake?

C# code to configure the pipeline:

        DateTime PipelineActivePeriodStartTime = new DateTime(2017, 1, 12, 0, 0, 0, 0, DateTimeKind.Utc);
        DateTime PipelineActivePeriodEndTime = PipelineActivePeriodStartTime.AddMinutes(60);
        string PipelineName = "ComputeEventsByRegionPipeline";

        var usqlparams = new Dictionary<string, string>();
        usqlparams.Add("in", "/Samples/Data/SearchLog.tsv");
        usqlparams.Add("out", "/Output/testdemo1.tsv");

        client.Pipelines.CreateOrUpdate(resourceGroupName, dataFactoryName,
        new PipelineCreateOrUpdateParameters()
        {
            Pipeline = new Pipeline()
            {
                Name = PipelineName,
                Properties = new PipelineProperties()
                {
                    Description = "This is a demo pipe line.",

                    // Initial value for pipeline's active period. With this, you won't need to set slice status
                    Start = PipelineActivePeriodStartTime,
                    End = PipelineActivePeriodEndTime,
                    IsPaused = false,

                    Activities = new List<Activity>()
                    {
                        new Activity()
                        {
                            TypeProperties = new DataLakeAnalyticsUSQLActivity("@searchlog = EXTRACT UserId int, Start DateTime, Region string, Query string, Duration int?, Urls string, ClickedUrls string FROM @in USING Extractors.Tsv(nullEscape:\"#NULL#\"); @rs1 = SELECT Start, Region, Duration FROM @searchlog; OUTPUT @rs1 TO @out USING Outputters.Tsv(quoting:false);")
                            {
                                DegreeOfParallelism = 3,
                                Priority = 100,
                                Parameters = usqlparams
                            },
                            Inputs = new List<ActivityInput>()
                            {
                                new ActivityInput(Dataset_Source)
                            },
                            Outputs = new List<ActivityOutput>()
                            {
                                new ActivityOutput(Dataset_Destination)
                            },
                            Policy = new ActivityPolicy()
                            {
                                Timeout = new TimeSpan(6,0,0),
                                Concurrency = 1,
                                ExecutionPriorityOrder = ExecutionPriorityOrder.NewestFirst,
                                Retry = 1
                            },
                            Scheduler = new Scheduler()
                            {
                                Frequency = "Day",
                                Interval = 1
                            },
                            Name = "EventsByRegion",
                            LinkedServiceName = "AzureDataLakeAnalyticsLinkedService"
                        }
                    }
                }
            }
        });

I just noticed something in the Azure Data Factory view (Monitor and Manage option). The pipeline status is Waiting: DatasetDependencies. Do I need to change the code for this?

If no other activity in the data factory produces your source dataset, you need to add this property to the dataset definition:

"external": true

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-faq

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
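
For illustration, here is roughly where the `external` property would sit in a Data Factory (version 1) dataset definition. This is a sketch under assumptions, not the asker's actual dataset: the dataset name, linked service name, and dataset type are placeholders, the folder and file are taken from the `in` parameter in the question, and the availability is set to match the activity's daily scheduler.

```json
{
    "name": "<your source dataset name>",
    "properties": {
        "type": "AzureDataLakeStore",
        "linkedServiceName": "<your Data Lake Store linked service>",
        "typeProperties": {
            "folderPath": "Samples/Data/",
            "fileName": "SearchLog.tsv"
        },
        "external": true,
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}
```

Since the datasets in the question are created from C#, the equivalent change would presumably be setting `External = true` on the `DatasetProperties` object passed to `client.Datasets.CreateOrUpdate` for the source dataset; verify that property name against the SDK version you are using.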