Stripping data from an EPPlus output by date range

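Quick overview: the main goal is to read the rows from a set date onward and get the reference numbers that belong to those dates (the "Start date" column).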

For example, I only want the data dated from the 1st of last month onward.

I currently have to extract some data from an Excel spreadsheet like the sample below:

Start date  Ref number
29/07/2015  2342326
01/07/2016  5697455
02/08/2016  3453787
02/08/2016  5345355
02/08/2015  8364456
03/08/2016  1479789
04/07/2015  9334578

Output using EPPlus:

29/07/2015
2342326
29/07/2016
5697455
02/08/2016
3453787
02/08/2016
5345355
02/08/2015
8364456
03/08/2016
1479789
04/07/2015
9334578

That part is fine, but when I try to filter the output by a date range, for example with LINQ, I get the following error:

An unhandled exception of type 'System.InvalidCastException' occurred in System.Data.DataSetExtensions.dll

Additional information: Specified cast is not valid.

LINQ code:

var rowsOfInterest = tbl.AsEnumerable()
.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
.ToList();

I also tried filtering from the start date with DataTable.Select:

DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");

But that gives the following error:

An unhandled exception of type 'System.Data.EvaluateException' occurred in System.Data.dll

Additional information: Cannot perform '>=' operation on System.String and System.Double.

My last attempt was to see whether I could filter the dates inside the loop.

Code used:

DateTime dDate;
row[cell.Start.Column - 1] = cell.Text;
string dt = cell.Text.ToString();

if (DateTime.TryParse(dt, out dDate))
{
    DateTime dts = Convert.ToDateTime(dt);
}

DateTime date1 = new DateTime(2016, 7, 1);

if (dDate >= date1)
{
    Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}

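This approach works, but it only lists the matching dates and not the values that go with them, which is understandable. If I go down this route, how would I get the dates together with their values?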

Output:

29/07/2016
02/08/2016
02/08/2016
03/08/2016

Full code sample used:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.OleDb;
using System.Text.RegularExpressions;
using Microsoft.Office.Interop.Excel;
using System.Data;
using System.IO;

namespace Number_Cleaner
{
    public class NumbersReport
    {

        //ToDo: Look into fixing the code so it filters by date correctly and outputs the right data.
        public System.Data.DataTable GetDataTableFromExcel(string path, bool hasHeader = true)
        {
            using (var pck = new OfficeOpenXml.ExcelPackage())
            {
                using (var stream = File.OpenRead(path))
                {
                    pck.Load(stream);
                }
                var ws = pck.Workbook.Worksheets.First();
                System.Data.DataTable tbl = new System.Data.DataTable();
                foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
                {
                    tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
                }
                var startRow = hasHeader ? 2 : 1;
                for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
                {
                    var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
                    DataRow row = tbl.Rows.Add();
                    foreach (var cell in wsRow)
                    {

                        DateTime dDate;
                        row[cell.Start.Column - 1] = cell.Text;
                        string dt = cell.Text.ToString();
                        //Console.WriteLine(dt);

                        if (DateTime.TryParse(dt, out dDate))
                        {
                            DateTime dts = Convert.ToDateTime(dt);
                        }

                        DateTime date1 = new DateTime(2016, 7, 1);

                        if (dDate >= date1)
                        {
                            Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
                        }

                        //Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
                    }
                }
                //var rowsOfInterest = tbl.AsEnumerable()
                 // .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
                 //.ToList();
                //Console.WriteLine(tbl);
                //DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");

                return tbl;
            }
        }
    }
}

Modified from:

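Based on your code, you are storing everything in the DataTable as strings by calling cell.Text. By doing that you lose valuable information: the cell's data type. You are better off using cell.Value, which will be either a string or a double. As far as Excel is concerned, dates, integers, and decimal values are all stored as doubles.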
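To make that concrete, here is a minimal sketch of normalising a boxed cell value to a DateTime. The helper name CellValues.ToDate and the dd/MM/yyyy text format are assumptions for illustration (the format is taken from the sample sheet in the question). It accepts a value that is already a DateTime, a double holding a serial date, or date text. Any double is interpreted as a date, so it should only be called on cells from the date column.

```csharp
using System;
using System.Globalization;

public static class CellValues
{
    // Hypothetical helper: turns a boxed cell value into a DateTime,
    // or returns null when the value cannot be read as a date.
    public static DateTime? ToDate(object value)
    {
        if (value is DateTime)
            return (DateTime)value;

        // Numeric cell: treat the double as an OLE Automation / Excel serial date.
        if (value is double)
            return DateTime.FromOADate((double)value);

        // Text cell: parse with an explicit format so the result
        // does not depend on the culture of the machine running the code.
        DateTime parsed;
        var text = value as string;
        if (text != null && DateTime.TryParseExact(text, "dd/MM/yyyy",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed))
            return parsed;

        return null;
    }
}
```

With this, CellValues.ToDate("29/07/2015") and CellValues.ToDate(42552d) both come back as real DateTime values that can be compared with >=, instead of strings.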

The errors you are seeing come from the fact that you store the values as strings but then query them as DateTime, here:

.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))

and here:

"'Start date' >= #1/7/2016#"
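Two details of the Select expression are worth noting apart from the type issue. In a DataTable filter expression, text in single quotes is a string literal, so 'Start date' is compared as text instead of naming the column; a column name containing a space is written in square brackets. Date literals between # signs are, as far as I know, read in the invariant month/day/year format, so #1/7/2016# would mean 7 January 2016. The sketch below shows both query styles against a typed column, using a few rows from the question's sample; the class name FilterDemo is made up for the example.

```csharp
using System;
using System.Data;
using System.Linq;

public static class FilterDemo
{
    // Builds a small table whose date column is typed as DateTime, not string.
    public static DataTable BuildTable()
    {
        var tbl = new DataTable();
        tbl.Columns.Add("Start date", typeof(DateTime));
        tbl.Columns.Add("Ref number", typeof(int));
        tbl.Rows.Add(new DateTime(2015, 7, 29), 2342326);
        tbl.Rows.Add(new DateTime(2016, 7, 1), 5697455);
        tbl.Rows.Add(new DateTime(2016, 8, 2), 3453787);
        return tbl;
    }

    public static void Main()
    {
        var tbl = BuildTable();

        // Square brackets name the column; the # literal is month/day/year.
        DataRow[] viaSelect = tbl.Select("[Start date] >= #07/01/2016#");

        // The LINQ form also works once the column really holds DateTime values.
        var viaLinq = tbl.AsEnumerable()
            .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
            .ToList();

        Console.WriteLine(viaSelect.Length);
        Console.WriteLine(viaLinq.Count);
    }
}
```

Both queries should return the same two rows (1 July 2016 and 2 August 2016), and neither throws, because the comparison is now DateTime against DateTime.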

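If you look at my other post, you will see a helper function, ConvertSheetToObjects, which handles most of what you are trying to do. With a small modification we can turn it into something that accepts a worksheet and converts it to a DataTable. As with the object-conversion method, you should still give it the expected structure, in the form of the DataTable you pass in, rather than having it try to guess the structure from the cell values: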

public static void ConvertSheetToDataTable(this ExcelWorksheet worksheet, ref DataTable dataTable)
{
    //DateTime Conversion
    var convertDateTime = new Func<double, DateTime>(excelDate =>
    {
        if (excelDate < 1)
            throw new ArgumentException("Excel dates cannot be smaller than 1.");

        var dateOfReference = new DateTime(1900, 1, 1);

        if (excelDate > 60d)
            excelDate = excelDate - 2;
        else
            excelDate = excelDate - 1;
        return dateOfReference.AddDays(excelDate);
    });

    //Get the names in the destination TABLE
    var tblcolnames = dataTable
        .Columns
        .Cast<DataColumn>()
        .Select(dcol => new {Name = dcol.ColumnName, Type = dcol.DataType})
        .ToList();

    //Cells only contains references to cells with actual data
    var cellGroups = worksheet.Cells
        .GroupBy(cell => cell.Start.Row)
        .ToList();

    //Assume first row has the column names and get the names of the columns in the sheet that have a match in the table
    var colnames = cellGroups
        .First()
        .Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx })
        .Where(o => tblcolnames.Select(tcol => tcol.Name).Contains(o.Name))
        .ToList();


    //Add the rows - skip the first cell row
    for (var i = 1; i < cellGroups.Count(); i++)
    {
        var cellrow = cellGroups[i].ToList();
        var tblrow = dataTable.NewRow();
        dataTable.Rows.Add(tblrow);

        colnames.ForEach(colname =>
        {
            //Excel stores either strings or doubles
            var cell = cellrow[colname.index];
            var val = cell.Value;
            var celltype = val.GetType();
            var coltype = tblcolnames.First(tcol => tcol.Name ==  colname.Name).Type;

            //If it is numeric it is a double since that is how excel stores all numbers
            if (celltype == typeof(double))
            {
                //Unbox it
                var unboxedVal = (double)val;

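                //FAR FROM A COMPLETE LIST!!!
                if (coltype == typeof (int))
                    tblrow[colname.Name] = (int) unboxedVal;
                else if (coltype == typeof (double))
                    tblrow[colname.Name] = unboxedVal;
                else if (coltype == typeof (DateTime))
                    //A date that arrives as a serial number goes through the conversion above
                    tblrow[colname.Name] = convertDateTime(unboxedVal);
                else
                    throw new NotImplementedException($"Type '{coltype}' not implemented yet!");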
            }
            else if (coltype == typeof (DateTime))
            {
                //Its a date time
                tblrow[colname.Name] = val;
            }
            else if (coltype == typeof (string))
            {
                //Its a string
                tblrow[colname.Name] = val;
            }
            else
            {
                throw new DataException($"Cell '{cell.Address}' contains data of type {celltype} but should be of type {coltype}!");
            }
        });

    }

}
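The arithmetic in convertDateTime accounts for Excel counting a 29 February 1900 that never existed. For serial numbers above 60, the framework's DateTime.FromOADate should give the same result. The sketch below checks this for one value, 42552, which corresponds to 1 July 2016. The class and method names are only for illustration, and the conversion is copied out of the lambda so it can be called directly.

```csharp
using System;
using System.Globalization;

public static class ExcelDates
{
    // Same arithmetic as convertDateTime, extracted into a method.
    public static DateTime FromSerial(double excelDate)
    {
        if (excelDate < 1)
            throw new ArgumentException("Excel dates cannot be smaller than 1.");

        var dateOfReference = new DateTime(1900, 1, 1);

        // Excel treats 1900 as a leap year, so serials past day 60 are shifted by one more day.
        var offset = excelDate > 60d ? 2 : 1;
        return dateOfReference.AddDays(excelDate - offset);
    }

    public static void Main()
    {
        const double serial = 42552; // 1 July 2016 as an Excel serial number

        // Compare the hand-written conversion with the built-in one.
        Console.WriteLine(FromSerial(serial) == DateTime.FromOADate(serial));

        // Print the date in a culture-independent form.
        Console.WriteLine(FromSerial(serial).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

The two conversions differ for serials from 1 to 59, where FromOADate lands one day earlier, so the hand-written version only matters for dates in early 1900.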

To use it on a worksheet with the columns Col1 through Col4, you would run this:

[TestMethod]
public void Sheet_To_Table_Test()
{
    //Open the existing test file
    var fi = new FileInfo(@"c:\temp\Sheet_To_Table.xlsx");

    using (var package = new ExcelPackage(fi))
    {
        var workbook = package.Workbook;
        var worksheet = workbook.Worksheets.First();

        var datatable = new DataTable();
        datatable.Columns.Add("Col1", typeof(int));
        datatable.Columns.Add("Col2", typeof(string));
        datatable.Columns.Add("Col3", typeof(double));
        datatable.Columns.Add("Col4", typeof(DateTime));

        worksheet.ConvertSheetToDataTable(ref datatable);

        foreach (DataRow row in datatable.Rows)
            Console.WriteLine(
                $"row: {{Col1({row["Col1"].GetType()}): {row["Col1"]}" +
                $", Col2({row["Col2"].GetType()}): {row["Col2"]}" +
                $", Col3({row["Col3"].GetType()}): {row["Col3"]}" +
                $", Col4({row["Col4"].GetType()}):{row["Col4"]}}}");

        //To Answer OP's questions
        datatable
            .Select("Col4 >= #01/03/2016#")
            .Select(row => row["Col1"])
            .ToList()
            .ForEach(num => Console.WriteLine($"{{{num}}}"));
    }
}

This gives the following output:

row: {Col1(System.Int32): 12345, Col2(System.String): sf, Col3(System.Double): 456.549, Col4(System.DateTime):1/1/2016 12:00:00 AM}
row: {Col1(System.Int32): 456, Col2(System.String): asg, Col3(System.Double): 165.55, Col4(System.DateTime):1/2/2016 12:00:00 AM}
row: {Col1(System.Int32): 8, Col2(System.String): we, Col3(System.Double): 148.5, Col4(System.DateTime):1/3/2016 12:00:00 AM}
row: {Col1(System.Int32): 978, Col2(System.String): wer, Col3(System.Double): 668.456, Col4(System.DateTime):1/4/2016 12:00:00 AM}
{8}
{978}
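Applied to the sheet in the question, the same idea would look roughly like the sketch below. The table is filled by hand to keep the example self-contained; in practice it would come from ConvertSheetToDataTable. Typing "Ref number" as int and the names RefNumberFilter, FirstOfPreviousMonth, and RefNumbersSince are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RefNumberFilter
{
    // First day of the month before the given date, e.g. 15 Aug 2016 -> 1 Jul 2016.
    public static DateTime FirstOfPreviousMonth(DateTime today)
    {
        return new DateTime(today.Year, today.Month, 1).AddMonths(-1);
    }

    // Reference numbers of all rows dated on or after the cutoff, in table order.
    public static List<int> RefNumbersSince(DataTable tbl, DateTime cutoff)
    {
        return tbl.AsEnumerable()
            .Where(row => row.Field<DateTime>("Start date") >= cutoff)
            .Select(row => row.Field<int>("Ref number"))
            .ToList();
    }

    // Sample rows copied from the spreadsheet in the question.
    public static DataTable SampleTable()
    {
        var tbl = new DataTable();
        tbl.Columns.Add("Start date", typeof(DateTime));
        tbl.Columns.Add("Ref number", typeof(int));
        tbl.Rows.Add(new DateTime(2015, 7, 29), 2342326);
        tbl.Rows.Add(new DateTime(2016, 7, 1), 5697455);
        tbl.Rows.Add(new DateTime(2016, 8, 2), 3453787);
        tbl.Rows.Add(new DateTime(2016, 8, 2), 5345355);
        tbl.Rows.Add(new DateTime(2015, 8, 2), 8364456);
        tbl.Rows.Add(new DateTime(2016, 8, 3), 1479789);
        tbl.Rows.Add(new DateTime(2015, 7, 4), 9334578);
        return tbl;
    }

    public static void Main()
    {
        // A fixed "today" keeps the example repeatable; DateTime.Today would be used in practice.
        var cutoff = FirstOfPreviousMonth(new DateTime(2016, 8, 15));

        foreach (var refNumber in RefNumbersSince(SampleTable(), cutoff))
            Console.WriteLine(refNumber);
    }
}
```

With the cutoff at 1 July 2016 this prints 5697455, 3453787, 5345355, and 1479789, one per line, which are the reference numbers for the four rows dated in July and August 2016.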