C# script task to remove quadruple quotes before loading .CSV file
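I have a fairly basic SSIS package that loads a .csv file into a SQL table. However, when the package tries to read the .csv source in the Data Flow Task, I get this error message: "The column delimiter for column 'X' was not found. An error occurred while processing file "file.csv" on data row 'Y'."
What happens in this case is that a few rows out of thousands contain a string inside quadruple quotes, i.e. "Jane "Jill" Doe." Manually removing the quotes from these rows in UltraEdit works, but I am trying to automate these packages. A Derived Column does not work because the problem is with the delimiter.
It turns out I need a Script Task to remove the quadruple quotes before the package can load the file correctly. The code below (which I pieced together from various sources) is accepted by SSIS without errors, but hits a DTS Script Task runtime error when executed: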
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
#endregion
namespace ST_a881d570d1a6495e84824a72bd28f44f
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            // TODO: Add your code here
            var fileContents = System.IO.File.ReadAllText(@"C:\File.csv");
            fileContents = fileContents.Replace("<body>", "<body onload='jsFx();' />");
            fileContents = fileContents.Replace("</body>", "</body>");
            System.IO.File.WriteAllText(@"C:\File.csv", fileContents);
        }

        #region ScriptResults declaration
        /// <summary>
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        ///
        /// This code was generated automatically.
        /// </summary>
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
    }
}
My alternative script is:
{
string filepath = (string)Dts.Variables[@C:\"File.csv"].Value;
var fileContents = System.IO.File.ReadAllText(filepath);
fileContents = fileContents.Replace("\"\"", "");
System.IO.File.WriteAllText(@C:\"File.csv", fileContents);
}
What am I doing wrong?
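The following C# example reads the CSV file, removes any double quotes embedded within double-quoted text, and writes the modified content back to the file. The regular expression matches any double quote that is not at the start or end of the line and does not have a comma directly before or after it, and each match is replaced with an empty string. You may have already done this, but make sure the variable holding the file path is listed in the Script Task's ReadOnlyVariables field.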
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// The statements below belong inside the Script Task's Main() method.
string filePath = Dts.Variables["User::FilePath"].Value.ToString();
List<string> outputRecords = new List<string>();

if (File.Exists(filePath))
{
    // Read every line first so the file is closed before it is rewritten.
    using (StreamReader rdr = new StreamReader(filePath))
    {
        string line;
        while ((line = rdr.ReadLine()) != null)
        {
            if (line.Contains(","))
            {
                // Remove double quotes that are not at the start or end of the
                // line and are not directly next to a comma.
                line = Regex.Replace(line, "(?<!(,|^))\"(?!($|,))", string.Empty);
            }
            outputRecords.Add(line);
        }
    }

    // Overwrite the original file (append: false) with the cleaned records.
    using (StreamWriter sw = new StreamWriter(filePath, false))
    {
        foreach (string s in outputRecords)
            sw.WriteLine(s);
    }
}
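To check the regular expression outside of SSIS before wiring it into the package, the same idea can be pulled into a small standalone helper. This is only a sketch: the class name QuoteCleaner and the methods StripEmbeddedQuotes and CleanFile are hypothetical names chosen for illustration, and it assumes the file is small enough to read into memory in one go.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of SSIS): wraps the regex from the answer above.
public static class QuoteCleaner
{
    // A double quote that is not at the start or end of the line
    // and has no comma directly before or after it.
    private static readonly Regex EmbeddedQuote =
        new Regex("(?<!(,|^))\"(?!($|,))");

    // Returns the line with embedded quotes removed; the quotes that
    // open and close each field are left in place.
    public static string StripEmbeddedQuotes(string line)
    {
        return EmbeddedQuote.Replace(line, string.Empty);
    }

    // Cleans every line of the file at 'path' and overwrites the file.
    // Assumption: the whole file fits in memory.
    public static void CleanFile(string path)
    {
        string[] cleaned = File.ReadAllLines(path)
            .Select(line => StripEmbeddedQuotes(line))
            .ToArray();
        File.WriteAllLines(path, cleaned);
    }
}
```

For the input line 1,"Jane "Jill" Doe",42 the helper returns 1,"Jane Jill Doe",42. One limitation to keep in mind: a stray quote that sits directly next to a comma inside a field is treated as a field boundary and is left alone, so such rows would still need separate handling.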