Azure Data Factory - Copy Activity empty value doesn't change to null error
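I have an Azure blob with a txt file. Some of the columns have empty values, so when they are saved to a database table they should become NULL. I can get this to work with straight SQL and with an SSIS ETL package.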
Example row:
1002,100,Butter,whipped with salt BUTTER,WHIPPED W SALT,Y,0,6.38,
The last three values are supposed to be null.
When I try it with ADF, I get this error:
Copy activity encountered a user error: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'CarbohydratesFactor' contains an invalid value ' '. Cannot convert ' ' to type 'Decimal'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=Input string was not in a correct format.,Source=mscorlib,'.
FoodDescriptionsAzureBlob:
{
"name": "FoodDescriptionsAzureBlob",
"properties": {
"structure": [
{
"name": "NutrientDatabankNumber",
"type": "Int32"
},
{
"name": "FoodGroupCode",
"type": "Int32"
},
{
"name": "LongDescription",
"type": "String"
},
{
"name": "ShortDescription",
"type": "String"
},
{
"name": "CommonName",
"type": "String"
},
{
"name": "ManufacturerName",
"type": "String"
},
{
"name": "Survey",
"type": "String"
},
{
"name": "ReferenceDescription",
"type": "String"
},
{
"name": "RefusePercentage",
"type": "Int32"
},
{
"name": "ScientificName",
"type": "String"
},
{
"name": "NitrogenFactor",
"type": "Decimal"
},
{
"name": "ProteinFactor",
"type": "Decimal"
},
{
"name": "FatFactor",
"type": "Decimal"
},
{
"name": "CarbohydratesFactor",
"type": "Decimal"
}
],
"published": false,
"type": "AzureBlob",
"linkedServiceName": "AzureStorageLinkedService",
"typeProperties": {
"fileName": "FOOD_DES.txt",
"folderPath": "gym-nutrition-data/NutrientData/",
"format": {
"type": "TextFormat",
"rowDelimiter": "\n",
"columnDelimiter": "^",
"nullValue": "",
"quoteChar": "~"
}
},
"availability": {
"frequency": "Minute",
"interval": 15
},
"external": true,
"policy": {}
}
}
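To see how the `format` settings in this dataset are meant to split a record, here is a minimal Python sketch that approximates them with the standard `csv` module: `^` as the column delimiter, `~` as the quote character, and a field equal to `nullValue` (`""`) mapped to `None`. This only approximates ADF's TextFormat parser; the `parse_record` helper and the sample record are hypothetical.

```python
import csv
from typing import List, Optional


def parse_record(line: str, column_delimiter: str = "^", quote_char: str = "~",
                 null_value: str = "") -> List[Optional[str]]:
    """Hypothetical helper: split one record and map null_value fields to None."""
    # csv.reader accepts any iterable of lines, so wrap the single record in a list.
    reader = csv.reader([line], delimiter=column_delimiter, quotechar=quote_char)
    fields = next(reader)
    # Only a field that is exactly equal to null_value becomes None.
    return [None if field == null_value else field for field in fields]


# Hypothetical record whose last three columns are empty.
sample = "~1002~^~0100~^~Butter~^6.38^^^"
print(parse_record(sample))
# ['1002', '0100', 'Butter', '6.38', None, None, None]
```

When the row is split correctly, the three trailing empty fields come out as `None`, which is the behavior the question expects.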
FoodDescriptionsSQLAzure:
{
"name": "FoodDescriptionsSQLAzure",
"properties": {
"structure": [
{
"name": "NutrientDatabankNumber",
"type": "Int32"
},
{
"name": "FoodGroupCode",
"type": "Int32"
},
{
"name": "LongDescription",
"type": "String"
},
{
"name": "ShortDescription",
"type": "String"
},
{
"name": "CommonName",
"type": "String"
},
{
"name": "ManufacturerName",
"type": "String"
},
{
"name": "Survey",
"type": "String"
},
{
"name": "ReferenceDescription",
"type": "String"
},
{
"name": "RefusePercentage",
"type": "Int32"
},
{
"name": "ScientificName",
"type": "String"
},
{
"name": "NitrogenFactor",
"type": "Decimal"
},
{
"name": "ProteinFactor",
"type": "Decimal"
},
{
"name": "FatFactor",
"type": "Decimal"
},
{
"name": "CarbohydratesFactor",
"type": "Decimal"
}
],
"published": false,
"type": "AzureSqlTable",
"linkedServiceName": "AzureSqlLinkedService",
"typeProperties": {
"tableName": "FoodDescriptions"
},
"availability": {
"frequency": "Minute",
"interval": 15
}
}
}
Pipeline:
{
"name": "NutrientDataBlobToAzureSqlPipeline",
"properties": {
"description": "Copy nutrient data from Azure BLOB to Azure SQL",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "BlobSource",
"treatEmptyAsNull": true
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 10000,
"writeBatchTimeout": "60.00:00:00"
}
},
"inputs": [
{
"name": "FoodGroupDescriptionsAzureBlob"
}
],
"outputs": [
{
"name": "FoodGroupDescriptionsSQLAzure"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "FoodGroupDescriptions",
"description": "#1 Bulk Import FoodGroupDescriptions"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "BlobSource",
"treatEmptyAsNull": true
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 10000,
"writeBatchTimeout": "60.00:00:00"
}
},
"inputs": [
{
"name": "FoodDescriptionsAzureBlob"
}
],
"outputs": [
{
"name": "FoodDescriptionsSQLAzure"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "FoodDescriptions",
"description": "#2 Bulk Import FoodDescriptions"
}
],
"start": "2015-07-14T00:00:00Z",
"end": "2015-07-14T00:00:00Z",
"isPaused": false,
"hubName": "gymappdatafactory_hub",
"pipelineMode": "Scheduled"
}
}
I tried setting "treatEmptyAsNull": true in the pipeline, but had no luck.
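Why `treatEmptyAsNull` may not help can be illustrated with a small Python sketch. It assumes, as the error message suggests, that a field is compared with the empty string before the type conversion, so only a truly empty field becomes null, while a field holding a stray whitespace character such as a carriage return is passed on to the Decimal conversion and fails. The `to_nullable_decimal` helper is hypothetical, and Python's `decimal` module stands in for the .NET conversion ADF performs internally.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def to_nullable_decimal(raw: str, treat_empty_as_null: bool = True) -> Optional[Decimal]:
    """Hypothetical stand-in for a Decimal column conversion with treatEmptyAsNull."""
    # Only an exactly empty field counts as "empty"; "\r" or " " does not.
    if treat_empty_as_null and raw == "":
        return None
    try:
        return Decimal(raw)
    except InvalidOperation as exc:
        # Python's Decimal strips surrounding whitespace, so "\r" becomes ""
        # and is rejected, similar to the FormatException in the ADF error.
        raise ValueError(f"Cannot convert {raw!r} to type 'Decimal'") from exc


print(to_nullable_decimal(""))      # None
print(to_nullable_decimal("6.38"))  # 6.38
```

Calling `to_nullable_decimal("\r")` raises `ValueError`, because the null check never sees an empty string.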
I had to remove "rowDelimiter": "\n" from the blob dataset.
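Most source text files use "\r\n" as the row delimiter. With the row delimiter set to "\n", the data value of the last column is "\r", which is not an empty string and therefore is not treated as null. Without a row delimiter setting, ADF Copy uses "\r\n" as the default row delimiter, so the last column is an empty string and can be treated as null.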
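The effect described above can be reproduced with plain string splitting in Python. This is a simplified sketch that assumes the file uses `\r\n` line endings and that rows and columns are split purely on the configured delimiters; the sample data and the `last_fields` helper are hypothetical.

```python
from typing import List

# Hypothetical two-row file with Windows (CRLF) line endings; the last
# column of each row is empty.
data = "1002^6.38^\r\n1003^6.38^\r\n"


def last_fields(text: str, row_delimiter: str, column_delimiter: str = "^") -> List[str]:
    """Return the last column of every non-empty row."""
    rows = [row for row in text.split(row_delimiter) if row]
    return [row.split(column_delimiter)[-1] for row in rows]


print(last_fields(data, "\n"))    # ['\r', '\r']  -> not empty, so not null
print(last_fields(data, "\r\n"))  # ['', '']      -> empty, can be treated as null
```

Splitting on `\n` leaves the `\r` attached to the last column, which is exactly the value that then fails the Decimal conversion.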