Reading very large data from an Excel 2013 file with OleDB: range error

I am trying to read an Excel 2013 file (.xlsx, about 100 MB) with Visual Basic .NET and OleDB. The main problem is that a System.OutOfMemoryException is thrown at this line:

da.Fill(dt)

in the code below.

Imports System.Data
Imports System.Data.OleDb

Private Function ReadExcelFile() As DataSet
    Dim ds As New DataSet()

    ' Quoting the Extended Properties value keeps it intact if more options (e.g. HDR=YES) are added.
    Dim connectionString As String =
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\file.xlsx;Extended Properties=""Excel 12.0 Xml"";"

    Using connection As New OleDbConnection(connectionString)
        connection.Open()
        Using cmd As New OleDbCommand()
            cmd.Connection = connection
            ' Worksheets appear in the schema table with names ending in "$".
            Dim dtSheet As DataTable = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)

            For Each dr As DataRow In dtSheet.Rows
                Dim sheetName As String = dr("TABLE_NAME").ToString()
                If Not sheetName.EndsWith("$") Then
                    Continue For
                End If

                cmd.CommandText = "SELECT * FROM [" & sheetName & "];"
                Dim dt As New DataTable()
                dt.TableName = sheetName
                Using da As New OleDbDataAdapter(cmd)
                    ' Fill loads every row of the sheet into memory; this is the call
                    ' that throws OutOfMemoryException on the 100 MB file.
                    da.Fill(dt)
                End Using
                ds.Tables.Add(dt)
            Next
        End Using
    End Using
    Return ds
End Function

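I think the best solution is to read the data in chunks, and I found that I can read a block of cells by appending a cell range to the sheet name in the SQL statement, like this: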

 cmd.CommandText = "SELECT * FROM [" & sheetName & "B1:B10];"
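A loop over such ranges could look like the sketch below. It assumes an unquoted worksheet name such as Sheet1$ and a single column; BuildChunkQueries is a hypothetical helper, not part of OleDB, and it only builds the query strings, one per block of at most chunkSize rows.

```vbnet
Imports System
Imports System.Collections.Generic

Module ChunkQueries
    ' Hypothetical helper: returns one "SELECT * FROM [Sheet$Bn:Bm];" query per
    ' chunk, each covering at most chunkSize rows between firstRow and lastRow.
    Function BuildChunkQueries(sheetName As String, column As String,
                               firstRow As Integer, lastRow As Integer,
                               chunkSize As Integer) As List(Of String)
        If chunkSize <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(chunkSize))
        Dim queries As New List(Of String)()
        Dim startRow As Integer = firstRow
        While startRow <= lastRow
            ' Clamp the final chunk so it stops at lastRow.
            Dim endRow As Integer = Math.Min(startRow + chunkSize - 1, lastRow)
            queries.Add(String.Format("SELECT * FROM [{0}{1}{2}:{1}{3}];",
                                      sheetName, column, startRow, endRow))
            startRow = endRow + 1
        End While
        Return queries
    End Function

    Sub Main()
        ' Prints three queries: B1:B10, B11:B20 and B21:B25.
        For Each q As String In BuildChunkQueries("Sheet1$", "B", 1, 25, 10)
            Console.WriteLine(q)
        Next
    End Sub
End Module
```

Each string would be assigned to cmd.CommandText and executed before moving on to the next chunk.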

I loop by incrementing the range, but I ran into an error. Take this example:

cmd.CommandText = "SELECT * FROM [" & sheetName & "B50000:B51000];"

It still works. But if I do this:

cmd.CommandText = "SELECT * FROM [" & sheetName & "B70000:B70001];"

I get this error.

Note that the Excel file has 475,128 rows, so B70000:B70001 is not even halfway through the sheet.

Can anyone explain this? I think I'm missing something here.
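One detail in these numbers may be relevant, offered as a hypothesis rather than a confirmed cause: the range that works (B50000:B51000) ends below 65,536, the maximum row count of the legacy .xls (Excel 97-2003) worksheet format, while the failing range (B70000:B70001) starts above it. The hypothetical ExceedsLegacyRowLimit helper below only makes that comparison explicit so it can be checked against other failing ranges.

```vbnet
Imports System

Module RowLimitCheck
    ' Maximum number of rows in a legacy .xls (Excel 97-2003) worksheet.
    Const LegacyXlsMaxRows As Integer = 65536

    ' Hypothesis only: flags ranges whose last row lies beyond the legacy limit.
    Function ExceedsLegacyRowLimit(lastRow As Integer) As Boolean
        Return lastRow > LegacyXlsMaxRows
    End Function

    Sub Main()
        Console.WriteLine(ExceedsLegacyRowLimit(51000)) ' range that worked
        Console.WriteLine(ExceedsLegacyRowLimit(70001)) ' range that failed
    End Sub
End Module
```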

I found a working solution: use a DataReader instead of a DataSet. Running the code on a worker thread keeps the application from hanging.

Private Function ReadExcelFile() As DataSet
    ' Left empty in this version: rows are processed as they are read.
    Dim ds As New DataSet()

    Dim connectionString As String = GetConnectionString()

    Using connection As New OleDbConnection(connectionString)
        connection.Open()
        Using cmd As New OleDbCommand()
            cmd.Connection = connection
            Dim dtSheet As DataTable = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)

            For Each dr As DataRow In dtSheet.Rows
                Dim sheetName As String = dr("TABLE_NAME").ToString()
                If Not sheetName.EndsWith("$") Then
                    Continue For
                End If
                cmd.CommandText = "SELECT * FROM [" & sheetName & "];"
                ' Close each reader before the command is executed again for the next sheet.
                Using ddr As OleDbDataReader = cmd.ExecuteReader()
                    While ddr.Read()
                        ' Placeholder processing: show the value in the first column (index 0).
                        MessageBox.Show(ddr.GetValue(0).ToString())
                    End While
                End Using
            Next
        End Using
    End Using
    Return ds
End Function

The lines

Using ddr As OleDbDataReader = cmd.ExecuteReader()
    While ddr.Read()
        MessageBox.Show(ddr.GetValue(0).ToString())
    End While
End Using

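are the core code: they access the rows of the first column (index 0). This works because, as I read, a DataSet is an in-memory object: DataAdapter.Fill loads every row of the sheet into it (which is why a System.OutOfMemoryException can occur), whereas a DataReader returns one row at a time. Check here for reference.

I would still like to know why the range problem described above occurs.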
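The streaming pattern can be sketched without an Excel file. The function below accepts any IDataReader (OleDbDataReader implements this interface) and handles one row per Read() call without keeping any rows itself, whereas OleDbDataAdapter.Fill stores every row in a DataTable. The name CountNonEmptyInFirstColumn and the small in-memory DataTable used as a stand-in data source are illustrative assumptions, not part of the original code.

```vbnet
Imports System
Imports System.Data

Module StreamingRead
    ' Reads rows one at a time from any IDataReader; the function itself keeps
    ' no rows, only a running count.
    Function CountNonEmptyInFirstColumn(reader As IDataReader) As Integer
        Dim count As Integer = 0
        While reader.Read()
            ' Skip DBNull (empty cells) and empty strings in column 0.
            If Not reader.IsDBNull(0) AndAlso reader.GetValue(0).ToString() <> "" Then
                count += 1
            End If
        End While
        Return count
    End Function

    Sub Main()
        ' Stand-in for the worksheet: a DataTable exposed through a DataTableReader.
        Dim table As New DataTable()
        table.Columns.Add("B", GetType(String))
        table.Rows.Add("a")
        table.Rows.Add(DBNull.Value)
        table.Rows.Add("c")
        Using reader As IDataReader = table.CreateDataReader()
            Console.WriteLine(CountNonEmptyInFirstColumn(reader))
        End Using
    End Sub
End Module
```

With a real workbook, the OleDbDataReader returned by cmd.ExecuteReader() would be passed in place of the DataTableReader.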