Import only the last column of an Excel file through SSIS

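I receive an Excel file every day. The number of columns in the file is not fixed. My requirement is to load only the last column into my table through SSIS. How can I dynamically identify the last used column?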

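No, you can't do that. The number of columns and their data types have to be defined in advance and cannot change; otherwise the SSIS package will fail. So there is no way to pick up the last column dynamically. A workaround could be to use a macro inside Excel to extract the last column, and then use that as the source for SSIS.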

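You can use a C# script (a Script Component used as a source):

Make sure to add using System.Data.OleDb; to the namespaces region, and add an output column named LastCol and select its data type.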

    public override void CreateNewOutputRows()
    {
        /*
          Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
          For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
        */
        string fileName = @"C:\test.xlsx";
        string SheetName = "Sheet1";
        // The connection string needs the "Provider=" keyword and the full provider name
        string cstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName
                    + ";Extended Properties=\"Excel 12.0;HDR=No;IMEX=1\"";

        using (OleDbConnection xlConn = new OleDbConnection(cstr))
        {
            xlConn.Open();

            using (OleDbCommand xlCmd = xlConn.CreateCommand())
            {
                // Worksheets are addressed as [SheetName$]
                xlCmd.CommandText = "SELECT * FROM [" + SheetName + "$]";
                xlCmd.CommandType = CommandType.Text;

                using (OleDbDataReader rdr = xlCmd.ExecuteReader())
                {
                    int rowCt = 0; // row counter

                    while (rdr.Read())
                    {
                        // HDR=No returns the header row as data, so skip the first row
                        if (rowCt != 0)
                        {
                            int maxCol = rdr.FieldCount;
                            Output0Buffer.AddRow();
                            // Field indexes are zero-based: the last column is maxCol - 1.
                            // Convert instead of casting, because with IMEX=1 the value may be returned as text.
                            Output0Buffer.LastCol = Convert.ToInt32(rdr[maxCol - 1]);
                        }
                        rowCt++; // increment counter
                    }
                }
            }
        }
    }
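
One detail in the script above is easy to get wrong: a data reader's fields are zero-based, so the last field sits at index FieldCount - 1, not FieldCount. The following is a minimal sketch of just that step. It uses an in-memory DataTable in place of the Excel file so that it runs without the ACE provider; the class name LastFieldDemo, the helper LastField, and the sample values are made up for illustration.

```csharp
using System;
using System.Data;

public static class LastFieldDemo
{
    // Returns the value of the last field of the current record.
    // Field indexes run from 0 to FieldCount - 1.
    public static object LastField(IDataRecord record)
    {
        return record[record.FieldCount - 1];
    }

    // Reads every row of the table and returns the last field of the final row.
    public static object LastFieldOfLastRow(DataTable table)
    {
        object value = null;
        using (IDataReader reader = table.CreateDataReader())
        {
            while (reader.Read())
            {
                value = LastField(reader);
            }
        }
        return value;
    }

    // Builds a small three-column table that stands in for the worksheet.
    public static DataTable SampleTable()
    {
        var table = new DataTable();
        table.Columns.Add("A", typeof(string));
        table.Columns.Add("B", typeof(string));
        table.Columns.Add("C", typeof(int));
        table.Rows.Add("x", "y", 42);
        return table;
    }
}
```

Because OleDbDataReader also implements IDataRecord, the same LastField helper should work unchanged on the reader returned by ExecuteReader.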

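Solution overview

Use a Script Task to:

  • Get the index of the last column
  • Convert that index to a column letter with the following function (e.g. 1 -> A)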

    Private Function GetExcelColumnName(columnNumber As Integer) As String
        Dim dividend As Integer = columnNumber
        Dim columnName As String = String.Empty
        Dim modulo As Integer

        While dividend > 0
            modulo = (dividend - 1) Mod 26
            columnName = Convert.ToChar(65 + modulo).ToString() & columnName
            dividend = CInt((dividend - modulo) / 26)
        End While

        Return columnName
    End Function
    
  • Build a SQL command that reads only the last column

  • Use this query as the Excel source
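
To make the steps above concrete, here is a small C# sketch that turns a 1-based column index into its letter and then into a single-column range query. It is a port of the GetExcelColumnName idea for illustration only; the class name LastColumnQuery, the helper BuildLastColumnQuery, and the sheet name passed in are assumptions, not part of the original answer.

```csharp
using System;

public static class LastColumnQuery
{
    // Converts a 1-based column index to an Excel column name (1 -> A, 27 -> AA).
    public static string GetExcelColumnName(int columnNumber)
    {
        string columnName = string.Empty;
        int dividend = columnNumber;

        while (dividend > 0)
        {
            int modulo = (dividend - 1) % 26;
            columnName = (char)('A' + modulo) + columnName;
            dividend = (dividend - modulo) / 26; // integer division
        }

        return columnName;
    }

    // Builds a query that selects only one column of the sheet.
    // sheetName is expected in the OLE DB form, i.e. with the trailing $.
    public static string BuildLastColumnQuery(string sheetName, int lastColumnIndex)
    {
        string column = GetExcelColumnName(lastColumnIndex);
        return "SELECT * FROM [" + sheetName + column + ":" + column + "]";
    }
}
```

For a sheet named Sheet1 with three columns, BuildLastColumnQuery("Sheet1$", 3) yields SELECT * FROM [Sheet1$C:C], which is the kind of string the script stores in the query variable.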

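Detailed solution

This answer assumes that the sheet name is Sheet1 and that the programming language is VB.Net.

  1. First, create an SSIS variable of type String (i.e. @[User::strQuery])
  2. Add another variable that contains the Excel file path (i.e. @[User::ExcelFilePath])
  3. Add a Script Task, and select @[User::strQuery] as a ReadWrite variable and @[User::ExcelFilePath] as a ReadOnly variable (in the Script Task editor window)
  4. Set the script language to VB.Net and write the following script in the script editor window:

Note: you have to import System.Data.OleDb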

    'Class-level fields used by the script
    Private m_strExcelPath As String
    Private m_strExcelConnectionString As String
    Private m_dtschemaTable As DataTable

    Public Sub Main()

        m_strExcelPath = Dts.Variables.Item("ExcelFilePath").Value.ToString

        Dim strSheetname As String = String.Empty
        Dim intLastColumn As Integer = 0

        m_strExcelConnectionString = Me.BuildConnectionString()

        Try

            Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)

                If OleDBCon.State <> ConnectionState.Open Then
                    OleDBCon.Open()
                End If

                'Get all worksheets
                m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
                                                               New Object() {Nothing, Nothing, Nothing, "TABLE"})

                'Loop over the worksheets to get the first real one
                '(the workbook may contain temporary or deleted sheets)
                For Each schRow As DataRow In m_dtschemaTable.Rows
                    strSheetname = schRow("TABLE_NAME").ToString

                    If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then

                        Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "]", OleDBCon)

                            Dim dtTable As New DataTable("Table1")

                            cmd.CommandType = CommandType.Text

                            Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
                                daGetDataFromSheet.Fill(dtTable)
                            End Using

                            'The column count gives the 1-based index of the last column
                            '(this assumes the data starts in column A)
                            intLastColumn = dtTable.Columns.Count

                        End Using

                        'Once the first valid sheet is found there is no need to check the others
                        Exit For

                    End If
                Next

            End Using

        Catch ex As Exception
            Throw New Exception(ex.Message, ex)
        End Try

        'Build a range query such as SELECT * FROM [Sheet1$C:C]
        Dim strColumnname As String = GetExcelColumnName(intLastColumn)
        Dts.Variables.Item("strQuery").Value = "SELECT * FROM [" & strSheetname & strColumnname & ":" & strColumnname & "]"

        Dts.TaskResult = ScriptResults.Success
    End Sub

    'BuildConnectionString is not shown in the original answer; this body is an
    'assumption that uses the ACE OLE DB provider for .xlsx files
    Private Function BuildConnectionString() As String
        Return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & m_strExcelPath &
               ";Extended Properties=""Excel 12.0;HDR=YES;IMEX=1"""
    End Function


    Private Function GetExcelColumnName(columnNumber As Integer) As String
        Dim dividend As Integer = columnNumber
        Dim columnName As String = String.Empty
        Dim modulo As Integer

        While dividend > 0
            modulo = (dividend - 1) Mod 26
            columnName = Convert.ToChar(65 + modulo).ToString() & columnName
            dividend = CInt((dividend - modulo) / 26)
        End While

        Return columnName
    End Function
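
  5. Then add an Excel connection manager and choose the Excel file to import (just select a sample file so that the metadata can be defined the first time)
  6. Assign the default value Select * from [Sheet1$] to the variable @[User::strQuery]
  7. In the Data Flow Task, add an Excel Source, set the data access mode to SQL command from variable, and select @[User::strQuery]
  8. Set the Delay Validation property of the Data Flow Task to True
  9. Add the other components to the Data Flow Task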
