导入大型 .xlsx 文件非常慢

Importing large .xlsx file very slow

我是 c# 和 WPF 的新手,正在尝试将大型 .xlsx 文件导入数据网格,我可以拥有大约 200 多列和 100,000 多行。使用我目前的方法需要一个多小时(我没有让它完成)。我的 csv 格式格式示例是;

"Time","Dist","V_Front","V_Rear","RPM"
"s","m","km/h","km/h","rpm"
"0.000","0","30.3","30.0","11995"
"0.005","0","30.3","30.0","11965"
"0.010","0","30.3","31.0","11962"

我目前正在使用 Interop,但我想知道是否有另一种方法可以大大减少加载时间。我希望使用 SciCharts(他们有学生许可证)绘制这些数据,并带有用于选择频道的复选框,但这是另一回事。

.CS

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        OpenFileDialog openfile = new OpenFileDialog();
        openfile.DefaultExt = ".xlsx";
        openfile.Filter = "(.xlsx)|*.xlsx";

        var browsefile = openfile.ShowDialog();

        if (browsefile == true)
        {
            txtFilePath.Text = openfile.FileName;

            Microsoft.Office.Interop.Excel.Application excelApp = new Microsoft.Office.Interop.Excel.Application();
            Microsoft.Office.Interop.Excel.Workbook excelBook = excelApp.Workbooks.Open(txtFilePath.Text.ToString(), 0, true, 5, "", "", true, Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
            Microsoft.Office.Interop.Excel.Worksheet excelSheet = (Microsoft.Office.Interop.Excel.Worksheet)excelBook.Worksheets.get_Item(1); ;
            Microsoft.Office.Interop.Excel.Range excelRange = excelSheet.UsedRange;

            string strCellData = "";
            double douCellData;
            int rowCnt = 0;
            int colCnt = 0;

            DataTable dt = new DataTable();
            for (colCnt = 1; colCnt <= excelRange.Columns.Count; colCnt++)
            {
                string strColumn = "";
                strColumn = (string)(excelRange.Cells[1, colCnt] as Microsoft.Office.Interop.Excel.Range).Value2;
                dt.Columns.Add(strColumn, typeof(string));
            }

            for (rowCnt = 2; rowCnt <= excelRange.Rows.Count; rowCnt++)
            {
                string strData = "";
                for (colCnt = 1; colCnt <= excelRange.Columns.Count; colCnt++)
                {
                    try
                    {
                        strCellData = (string)(excelRange.Cells[rowCnt, colCnt] as Microsoft.Office.Interop.Excel.Range).Value2;
                        strData += strCellData + "|";
                    }
                    catch (Exception ex)
                    {
                        douCellData = (excelRange.Cells[rowCnt, colCnt] as Microsoft.Office.Interop.Excel.Range).Value2;
                        strData += douCellData.ToString() + "|";
                    }
                }
                strData = strData.Remove(strData.Length - 1, 1);
                dt.Rows.Add(strData.Split('|'));
            }

            dtGrid.ItemsSource = dt.DefaultView;

            excelBook.Close(true, null, null);
            excelApp.Quit();


        }
    }

非常感谢任何帮助。

问题是单个读取太多,导致 Excel 和您的应用程序之间有大量反射使用和编组。如果您不关心内存使用情况,您可以将整个 Range 读入内存并从内存中工作,而不是单独读取单元格。下面的代码 运行s 在 3880 ms 的测试文件上有 5 列和 103938 行:

OpenFileDialog openfile = new OpenFileDialog();
openfile.DefaultExt = ".xlsx";
openfile.Filter = "(.xlsx)|*.xlsx";

var browsefile = openfile.ShowDialog();

if (browsefile == true)
{
    txtFilePath.Text = openfile.FileName;

    var excelApp = new Microsoft.Office.Interop.Excel.Application();
    var excelBook = excelApp.Workbooks.Open(txtFilePath.Text, 0, true, 5, "", "", true,
        Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
    var excelSheet = (Microsoft.Office.Interop.Excel.Worksheet) excelBook.Worksheets.Item[1];

    Microsoft.Office.Interop.Excel.Range excelRange = excelSheet.UsedRange;

    DataTable dt = new DataTable();

    object[,] value = excelRange.Value;

    int columnsCount = value.GetLength(1);
    for (var colCnt = 1; colCnt <= columnsCount; colCnt++)
    {
        dt.Columns.Add((string)value[1, colCnt], typeof(string));
    }

    int rowsCount = value.GetLength(0);
    for (var rowCnt = 2; rowCnt <= rowsCount; rowCnt++)
    {
        var dataRow = dt.NewRow();
        for (var colCnt = 1; colCnt <= columnsCount; colCnt++)
        {
            dataRow[colCnt - 1] = value[rowCnt, colCnt];
        }
        dt.Rows.Add(dataRow);
    }

    dtGrid.ItemsSource = dt.DefaultView;

    excelBook.Close(true);
    excelApp.Quit();
}

如果您不想阅读整个 Range,那么您应该分批阅读。

另一个优化是 运行 在后台线程上执行此操作,因此它不会在加载时阻塞 UI。

编辑

对于 运行 在后台线程上执行此操作,您可以将按钮单击处理程序修改为异步方法,并将解析逻辑放入另一个方法中,该方法 运行 是线程池上的实际解析线程 Task.Run:

private async void Button_Click(object sender, RoutedEventArgs e)
{
    OpenFileDialog openfile = new OpenFileDialog();
    openfile.DefaultExt = ".xlsx";
    openfile.Filter = "(.xlsx)|*.xlsx";

    var browsefile = openfile.ShowDialog();

    if (browsefile == true)
    {
        txtFilePath.Text = openfile.FileName;

        DataTable dataTable = await ParseExcel(txtFilePath.Text).ConfigureAwait(true);

        dtGrid.ItemsSource = dataTable.DefaultView;
    }
}

private Task<DataTable> ParseExcel(string filePath)
{
    return Task.Run(() =>
    {
        var excelApp = new Microsoft.Office.Interop.Excel.Application();
        var excelBook = excelApp.Workbooks.Open(filePath, 0, true, 5, "", "", true,
            Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
        var excelSheet = (Microsoft.Office.Interop.Excel.Worksheet) excelBook.Worksheets.Item[1];

        Microsoft.Office.Interop.Excel.Range excelRange = excelSheet.UsedRange;

        DataTable dt = new DataTable();

        object[,] value = excelRange.Value;

        int columnsCount = value.GetLength(1);
        for (var colCnt = 1; colCnt <= columnsCount; colCnt++)
        {
            dt.Columns.Add((string) value[1, colCnt], typeof(string));
        }

        int rowsCount = value.GetLength(0);
        for (var rowCnt = 2; rowCnt <= rowsCount; rowCnt++)
        {
            var dataRow = dt.NewRow();
            for (var colCnt = 1; colCnt <= columnsCount; colCnt++)
            {
                dataRow[colCnt - 1] = value[rowCnt, colCnt];
            }
            dt.Rows.Add(dataRow);
        }

        excelBook.Close(true);
        excelApp.Quit();

        return dt;
    });
}

处理程序只是在后台线程上调用解析函数,解析函数 运行s,当它完成时,处理程序可以通过将结果 DataTable 分配给 ItemsSource 来继续.