Import only the last column of an Excel file through SSIS
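I receive an Excel file every day. The number of columns in the file is not fixed. My requirement is to load only the last column into my table through SSIS. How can I dynamically identify the last used column?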
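No, you cannot do that directly. The number of columns and their data types must be defined in advance and cannot change, otherwise SSIS will fail. So there is no built-in way to get the last column dynamically. A workaround might be to use a macro inside Excel to extract the last column and then use that as the source for SSIS.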
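You can use a C# script (a Script Component used as a source):
Make sure to add using System.Data.OleDb; to the namespaces region,
and add an output column named LastCol and select its data type.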
public override void CreateNewOutputRows()
{
    // Add rows by calling AddRow() on the member variable named "<Output Name>Buffer",
    // e.g. MyOutputBuffer.AddRow() if your output was named "MyOutput".
    string fileName = @"C:\test.xlsx";
    string sheetName = "Sheet1$"; // worksheet names end with "$" in OLE DB queries
    string cstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName
        + ";Extended Properties=\"Excel 12.0;HDR=No;IMEX=1\"";

    using (OleDbConnection xlConn = new OleDbConnection(cstr))
    {
        xlConn.Open();
        using (OleDbCommand xlCmd = xlConn.CreateCommand())
        {
            xlCmd.CommandText = "Select * from [" + sheetName + "]";
            xlCmd.CommandType = CommandType.Text;
            using (OleDbDataReader rdr = xlCmd.ExecuteReader())
            {
                int rowCt = 0; // row counter
                while (rdr.Read())
                {
                    // skip the header row (with HDR=No it is returned as data)
                    if (rowCt != 0)
                    {
                        int lastIdx = rdr.FieldCount - 1; // field indexes are zero-based
                        Output0Buffer.AddRow();
                        // Convert instead of casting: the cell may arrive as a double or a string.
                        // This assumes LastCol is an integer column and the cell is not empty.
                        Output0Buffer.LastCol = Convert.ToInt32(rdr[lastIdx]);
                    }
                    rowCt++; // increment counter
                }
            }
        }
    }
}
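The row loop above reads the last field of every row after the header. Because field indexes are zero-based, the last field sits at index count - 1, not count. The following is only a language-neutral sketch of that logic in Python; the function name `last_column_values` and the sample rows are hypothetical and not part of the original answer.

```python
from typing import Iterable, Iterator, Sequence


def last_column_values(rows: Iterable[Sequence[object]],
                       skip_header: bool = True) -> Iterator[object]:
    """Yield the value of the last field of each row.

    Hypothetical helper: 'rows' stands in for the records returned by the
    data reader, each one a sequence of field values.
    """
    for row_number, row in enumerate(rows):
        if skip_header and row_number == 0:
            continue  # the first row holds the column headers
        if not row:
            continue  # nothing to read from an empty row
        field_count = len(row)
        # Fields are indexed 0 .. field_count - 1, so this is the last one.
        yield row[field_count - 1]


# Made-up sample data, for illustration only.
sample_rows = [
    ("id", "name", "amount"),
    (1, "a", 10),
    (2, "b", 20),
]
print(list(last_column_values(sample_rows)))
```

Indexing with the field count itself would point one position past the end of the row, which is the out-of-range error to watch for in this kind of loop.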
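Solution overview
Use a Script Task to:
- Get the index of the last column
- Convert the index to a column letter (e.g. 1 -> A) using the following function: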
Private Function GetExcelColumnName(columnNumber As Integer) As String
    Dim dividend As Integer = columnNumber
    Dim columnName As String = String.Empty
    Dim modulo As Integer
    While dividend > 0
        modulo = (dividend - 1) Mod 26
        columnName = Convert.ToChar(65 + modulo).ToString() & columnName
        dividend = CInt((dividend - modulo) / 26)
    End While
    Return columnName
End Function
- Build a SQL command that reads only the last column
- Use this query as the Excel source
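The index-to-letter conversion in the overview is a bijective base-26 encoding (there is no zero digit, so 26 maps to Z and 27 to AA). As a rough illustration only, here is the same logic sketched in Python; the name `excel_column_name` is an assumption for this example and does not come from the answer.

```python
def excel_column_name(column_number: int) -> str:
    """Convert a 1-based column index to Excel column letters (1 -> A)."""
    if column_number < 1:
        raise ValueError("column_number must be 1 or greater")
    name = ""
    dividend = column_number
    while dividend > 0:
        # Subtract 1 first because the letters run A..Z with no zero digit.
        dividend, remainder = divmod(dividend - 1, 26)
        name = chr(ord("A") + remainder) + name
    return name


for index in (1, 26, 27, 52, 703):
    print(index, excel_column_name(index))
```

The loop prepends one letter per iteration, so the most significant letter ends up first, exactly as in the VB.Net function.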
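Detailed solution
This answer assumes that the sheet name is Sheet1 and that the programming language is VB.Net.
- First create an SSIS variable of type String (i.e. @[User::strQuery])
- Add another variable that contains the Excel file path (i.e. @[User::ExcelFilePath])
- Add a Script Task and select @[User::strQuery] as a ReadWrite variable and @[User::ExcelFilePath] as a ReadOnly variable (in the Script Task window)
- Set the script language to VB.Net and write the following script in the script editor window:
Note: you must import System.Data.OleDb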
Public Sub Main()
    ' m_strExcelPath, m_strExcelConnectionString and m_dtschemaTable are class-level fields,
    ' and BuildConnectionString() is a helper; their definitions are not shown in this answer.
    m_strExcelPath = Dts.Variables.Item("ExcelFilePath").Value.ToString
    Dim strSheetname As String = String.Empty
    Dim intLastColumn As Integer = 0
    m_strExcelConnectionString = Me.BuildConnectionString()
    Try
        Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)
            If OleDBCon.State <> ConnectionState.Open Then
                OleDBCon.Open()
            End If
            'Get all worksheets
            m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
                                                           New Object() {Nothing, Nothing, Nothing, "TABLE"})
            'Loop over the worksheets to get the first one (the file may contain temporary or deleted sheets)
            For Each schRow As DataRow In m_dtschemaTable.Rows
                strSheetname = schRow("TABLE_NAME").ToString
                If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then
                    Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "]", OleDBCon)
                        Dim dtTable As New DataTable("Table1")
                        cmd.CommandType = CommandType.Text
                        Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
                            daGetDataFromSheet.Fill(dtTable)
                        End Using
                        'Get the last column index (1-based, equal to the column count)
                        intLastColumn = dtTable.Columns.Count
                    End Using
                    'Once the first valid sheet is found there is no need to check the others
                    Exit For
                End If
            Next
            OleDBCon.Close()
        End Using
    Catch ex As Exception
        Throw New Exception(ex.Message, ex)
    End Try
    Dim strColumnname As String = GetExcelColumnName(intLastColumn)
    'strSheetname already ends with "$", so this yields e.g. [Sheet1$C:C]
    Dts.Variables.Item("strQuery").Value = "SELECT * FROM [" & strSheetname & strColumnname & ":" & strColumnname & "]"
    Dts.TaskResult = ScriptResults.Success
End Sub

Private Function GetExcelColumnName(columnNumber As Integer) As String
    Dim dividend As Integer = columnNumber
    Dim columnName As String = String.Empty
    Dim modulo As Integer
    While dividend > 0
        modulo = (dividend - 1) Mod 26
        columnName = Convert.ToChar(65 + modulo).ToString() & columnName
        dividend = CInt((dividend - modulo) / 26)
    End While
    Return columnName
End Function
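- Then add an Excel connection manager and choose the Excel file you want to import (just select a sample file, so that the metadata can be defined the first time)
- Assign the default value Select * from [Sheet1$] to the variable @[User::strQuery]
- In the Data Flow Task, add an Excel Source, choose SQL command from variable as the data access mode, and select @[User::strQuery]
- Set the Delay Validation property of the Data Flow Task to True
- Add the other components to the Data Flow Task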
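To make the value stored in @[User::strQuery] concrete, the sketch below shows in Python how the sheet name and the last column index combine into the range query that the Excel Source ends up running. It is an illustration under assumptions, not the answer's code: the function names are hypothetical, and the sheet name is expected to already end with "$", as the names returned by the schema table do.

```python
def excel_column_name(column_number: int) -> str:
    """1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    name = ""
    while column_number > 0:
        column_number, remainder = divmod(column_number - 1, 26)
        name = chr(ord("A") + remainder) + name
    return name


def build_last_column_query(sheet_name: str, last_column_index: int) -> str:
    """Build a query that selects only the last column of a worksheet.

    Hypothetical helper: sheet_name must already include the trailing '$'
    (for example 'Sheet1$'), and last_column_index is the column count.
    """
    column = excel_column_name(last_column_index)
    # A range such as [Sheet1$C:C] limits the result to a single column.
    return f"SELECT * FROM [{sheet_name}{column}:{column}]"


print(build_last_column_query("Sheet1$", 3))   # SELECT * FROM [Sheet1$C:C]
print(build_last_column_query("Sheet1$", 28))  # SELECT * FROM [Sheet1$AB:AB]
```

Because the query always returns exactly one column, the metadata of the Excel Source stays stable even when the number of columns in the file changes; as far as the steps above go, the column name in the first row still has to match what the sample file defined.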