Stripping data from an EPPlus output by date range
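Quick overview: the main goal is to read the rows from a set date onward and get the reference numbers for those rows, using the Start date column as the date.
For example, I only want the data dated on or after the 1st of last month.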
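Rather than hard-coding `new DateTime(2016, 7, 1)`, the cutoff described above (the 1st of last month) can be computed from the current date. A minimal sketch; the class and method names are hypothetical:

```csharp
using System;

public static class CutoffDates
{
    // Hypothetical helper: midnight on the 1st of the month before 'today'.
    // Starting from the 1st of the current month, AddMonths(-1) always lands on
    // the 1st, and January rolls back to December of the previous year.
    public static DateTime FirstDayOfPreviousMonth(DateTime today)
    {
        var firstOfThisMonth = new DateTime(today.Year, today.Month, 1);
        return firstOfThisMonth.AddMonths(-1);
    }
}
```

For example, `FirstDayOfPreviousMonth(new DateTime(2016, 8, 15))` gives 1 July 2016, and a January date gives 1 December of the year before. In the question's code the argument would normally be `DateTime.Today`.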
I currently need to extract some data from the example Excel spreadsheet below:
Start date   Ref number
29/07/2015   2342326
01/07/2016   5697455
02/08/2016   3453787
02/08/2016   5345355
02/08/2015   8364456
03/08/2016   1479789
04/07/2015   9334578
Output using EPPlus:
29/07/2015
2342326
29/07/2016
5697455
02/08/2016
3453787
02/08/2016
5345355
02/08/2015
8364456
03/08/2016
1479789
04/07/2015
9334578
This part works fine, but when I try to filter the output by date range I get errors. For example, with LINQ I get the following:
An unhandled exception of type 'System.InvalidCastException' occurred in System.Data.DataSetExtensions.dll
Additional information: Specified cast is not valid.
LINQ code:
var rowsOfInterest = tbl.AsEnumerable()
.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
.ToList();
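The InvalidCastException is consistent with how the table is built in the full code sample further down: `tbl.Columns.Add(name)` creates string columns, so `Field<DateTime>` cannot cast the stored text. If the column has to stay a string, one workaround is to parse the text inside the query. A sketch, assuming the cell text is always in dd/MM/yyyy form; `StringColumnFilter` and `RefsOnOrAfter` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;

public static class StringColumnFilter
{
    // Hypothetical helper. Columns created with tbl.Columns.Add(name) default to
    // typeof(string), which is why row.Field<DateTime>("Start date") throws.
    // Here the value is read as a string and parsed with the day/month/year
    // layout shown in the sheet (an assumption about the cell text).
    public static List<string> RefsOnOrAfter(DataTable tbl, DateTime cutoff)
    {
        return tbl.AsEnumerable()
            .Where(row => DateTime.TryParseExact(
                              row.Field<string>("Start date"),
                              "dd/MM/yyyy",
                              CultureInfo.InvariantCulture,
                              DateTimeStyles.None,
                              out var start)
                          && start >= cutoff)
            .Select(row => row.Field<string>("Ref number"))
            .ToList();
    }
}
```

With the sample rows above and a cutoff of 1 July 2016, this returns 5697455, 3453787, 5345355 and 1479789. Storing typed values in the first place, as the answer below suggests, avoids the parsing altogether.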
I also tried filtering on the start date range with DataTable.Select:
DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");
but got the following error:
An unhandled exception of type 'System.Data.EvaluateException' occurred in System.Data.dll
Additional information: Cannot perform '>=' operation on System.String and System.Double.
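Two things seem to go wrong in that Select call. `'Start date'` in single quotes is a string literal rather than a column reference (a column name containing a space goes in square brackets), and the column itself holds strings, so there is no date-typed value to compare. The `#...#` date literal also appears to be read month first (the answer's test output below, where `#01/03/2016#` selects 3 and 4 January, is consistent with that), so `#1/7/2016#` would mean 7 January rather than 1 July. A sketch with a DateTime-typed column; the helper names are hypothetical:

```csharp
using System;
using System.Data;

public static class SelectExpressionDemo
{
    // Hypothetical helper: the question's two columns, but typed instead of the
    // default typeof(string) that tbl.Columns.Add(name) produces.
    public static DataTable CreateTypedTable()
    {
        var tbl = new DataTable();
        tbl.Columns.Add("Start date", typeof(DateTime));
        tbl.Columns.Add("Ref number", typeof(int));
        return tbl;
    }

    // [Start date] in square brackets refers to the column; 'Start date' in
    // single quotes would be a string literal. #07/01/2016# is written month
    // first, i.e. 1 July 2016.
    public static DataRow[] OnOrAfterFirstJuly2016(DataTable tbl)
    {
        return tbl.Select("[Start date] >= #07/01/2016#");
    }
}
```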
My final attempt was to see whether I could filter out the dates inside the loop.
Code used:
DateTime dDate;
row[cell.Start.Column - 1] = cell.Text;
string dt = cell.Text.ToString();
if (DateTime.TryParse(dt, out dDate))
{
    DateTime dts = Convert.ToDateTime(dt);
}
DateTime date1 = new DateTime(2016, 7, 1);
if (dDate >= date1)
{
    Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}
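This approach works, but it only lists the matching dates and not the values that go with them, which is understandable. If I go this route, how would I get the dates together with their values?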
Output:
29/07/2016
02/08/2016
02/08/2016
03/08/2016
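In that loop the date check runs on every cell independently. A ref-number cell normally fails `TryParse`, which leaves `dDate` at `DateTime.MinValue`, so, as the output shows, only date cells are printed. Checking the row's date cell once and then keeping the whole row returns each date with its value. A sketch in which each worksheet row is stood in for by a `string[]` of cell texts (`[0]` = Start date, `[1]` = Ref number), assuming the dd/MM/yyyy layout; the names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RowFilter
{
    // Hypothetical helper. Each string[] is the text of one worksheet row.
    // The date test runs once per row, and when it passes the whole row is
    // kept, so the ref number comes along with its date.
    public static List<string> RowsOnOrAfter(IEnumerable<string[]> rows, DateTime cutoff)
    {
        var kept = new List<string>();
        foreach (var cells in rows)
        {
            if (cells.Length == 0)
                continue;
            // Assumes the date text is always dd/MM/yyyy, as in the sample sheet.
            bool isDate = DateTime.TryParseExact(cells[0], "dd/MM/yyyy",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime start);
            if (isDate && start >= cutoff)
                kept.Add(string.Join(" ", cells));
        }
        return kept;
    }
}
```

In the question's code the equivalent change is to read the Start date cell of `wsRow` before the inner `foreach`, and only copy or print the row's cells when that date passes.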
Full code sample used:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.OleDb;
using System.Text.RegularExpressions;
using Microsoft.Office.Interop.Excel;
using System.Data;
using System.IO;
namespace Number_Cleaner
{
    public class NumbersReport
    {
        //ToDo: Look into fixing the code so it filters the date correctly with the right output data.
        public System.Data.DataTable GetDataTableFromExcel(string path, bool hasHeader = true)
        {
            using (var pck = new OfficeOpenXml.ExcelPackage())
            {
                using (var stream = File.OpenRead(path))
                {
                    pck.Load(stream);
                }
                var ws = pck.Workbook.Worksheets.First();
                System.Data.DataTable tbl = new System.Data.DataTable();
                foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
                {
                    tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
                }
                var startRow = hasHeader ? 2 : 1;
                for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
                {
                    var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
                    DataRow row = tbl.Rows.Add();
                    foreach (var cell in wsRow)
                    {
                        DateTime dDate;
                        row[cell.Start.Column - 1] = cell.Text;
                        string dt = cell.Text.ToString();
                        //Console.WriteLine(dt);
                        if (DateTime.TryParse(dt, out dDate))
                        {
                            DateTime dts = Convert.ToDateTime(dt);
                        }
                        DateTime date1 = new DateTime(2016, 7, 1);
                        if (dDate >= date1)
                        {
                            Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
                        }
                        //Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
                    }
                }
                //var rowsOfInterest = tbl.AsEnumerable()
                //    .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
                //    .ToList();
                //Console.WriteLine(tbl);
                //DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");
                return tbl;
            }
        }
    }
}
Adapted from:
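Based on your code, you are storing everything in the DataTable as strings by calling cell.Text. Doing that loses valuable information: the cell's data type. You are better off using cell.Value, which will be either a string or a double. In Excel, dates, integers and decimal values are all stored as doubles.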
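This is also why the helper further down unboxes before converting: cell.Value returns `object`, and a boxed double can only be unboxed as `double`, so casting it straight to `int` throws InvalidCastException (the same exception type as in the question). A small sketch without EPPlus, where a plain `object` stands in for cell.Value; the names are hypothetical:

```csharp
using System;

public static class CellValueDemo
{
    // Hypothetical helper. 'value' stands in for an EPPlus cell.Value; per the
    // answer, numbers such as the ref numbers arrive as boxed doubles.
    // Unbox as double first, then convert to int.
    public static int RefNumberFromValue(object value)
    {
        if (value is double d)
            return (int)d;
        throw new ArgumentException($"Expected a numeric cell value, got {value?.GetType().Name ?? "null"}.");
    }

    // Returns true when unboxing straight to int throws InvalidCastException,
    // which is what happens for a boxed double.
    public static bool DirectUnboxThrows(object value)
    {
        try
        {
            _ = (int)value;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```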
The errors you are seeing come from the fact that you store the values as strings but then query them as DateTime, here:
.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
and here:
"'Start date' >= #1/7/2016#"
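If you look at my post here: you will see the helper function ConvertSheetToObjects, which handles most of what you are trying to do. With a small modification we can turn it into something that takes a WorkSheet and converts it to a DataTable. As with the object-conversion method, you should still give it the expected structure, in the form of the DataTable you pass in, rather than have it try to guess the types by converting cell values: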
using System;
using System.Data;
using System.Linq;
using OfficeOpenXml;
//Extension methods must live in a static class; the class name here is hypothetical
public static class WorksheetExtensions
{
    public static void ConvertSheetToDataTable(this ExcelWorksheet worksheet, ref DataTable dataTable)
    {
        //DateTime Conversion
        var convertDateTime = new Func<double, DateTime>(excelDate =>
        {
            if (excelDate < 1)
                throw new ArgumentException("Excel dates cannot be smaller than 1.");
            var dateOfReference = new DateTime(1900, 1, 1);
            //Excel counts a non-existent 29 Feb 1900, so serials after 60 need one extra day removed
            if (excelDate > 60d)
                excelDate = excelDate - 2;
            else
                excelDate = excelDate - 1;
            return dateOfReference.AddDays(excelDate);
        });
        //Get the names and types of the columns in the destination TABLE
        var tblcolnames = dataTable
            .Columns
            .Cast<DataColumn>()
            .Select(dcol => new { Name = dcol.ColumnName, Type = dcol.DataType })
            .ToList();
        //Cells only contains references to cells with actual data
        var cellGroups = worksheet.Cells
            .GroupBy(cell => cell.Start.Row)
            .ToList();
        //Assume first row has the column names and get the names of the columns in the sheet that have a match in the table
        //(the index lookup below assumes data rows have no blank cells, since blank cells are not returned)
        var colnames = cellGroups
            .First()
            .Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx })
            .Where(o => tblcolnames.Select(tcol => tcol.Name).Contains(o.Name))
            .ToList();
        //Add the rows - skip the first cell row
        for (var i = 1; i < cellGroups.Count; i++)
        {
            var cellrow = cellGroups[i].ToList();
            var tblrow = dataTable.NewRow();
            dataTable.Rows.Add(tblrow);
            colnames.ForEach(colname =>
            {
                //Excel stores either strings or doubles
                var cell = cellrow[colname.index];
                var val = cell.Value;
                var celltype = val.GetType();
                var coltype = tblcolnames.First(tcol => tcol.Name == colname.Name).Type;
                //If it is numeric it is a double since that is how excel stores all numbers
                if (celltype == typeof(double))
                {
                    //Unbox it
                    var unboxedVal = (double)val;
                    //FAR FROM A COMPLETE LIST!!!
                    if (coltype == typeof(int))
                        tblrow[colname.Name] = (int)unboxedVal;
                    else if (coltype == typeof(double))
                        tblrow[colname.Name] = unboxedVal;
                    else if (coltype == typeof(DateTime))
                        //A serial date number without a date format
                        tblrow[colname.Name] = convertDateTime(unboxedVal);
                    else
                        throw new NotImplementedException($"Type '{coltype}' not implemented yet!");
                }
                else if (coltype == typeof(DateTime))
                {
                    //It's a DateTime
                    tblrow[colname.Name] = val;
                }
                else if (coltype == typeof(string))
                {
                    //It's a string
                    tblrow[colname.Name] = val;
                }
                else
                {
                    throw new DataException($"Cell '{cell.Address}' contains data of type {celltype} but should be of type {coltype}!");
                }
            });
        }
    }
}
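The convertDateTime helper reimplements Excel's serial-date arithmetic by hand. .NET already provides `DateTime.FromOADate`, whose day 0 is 30 December 1899. The two agree from serial 61 (1 March 1900) onwards and differ for serials 1 to 60, around Excel's fictitious 29 February 1900. A sketch that checks this, copying the helper's arithmetic into a standalone method (class and method names are hypothetical):

```csharp
using System;

public static class ExcelSerialDates
{
    // Same arithmetic as the answer's convertDateTime lambda: serial 1 is
    // 1900-01-01, and serials above 60 lose an extra day for 29 Feb 1900.
    public static DateTime FromSerialLikeAnswer(double excelDate)
    {
        if (excelDate < 1)
            throw new ArgumentException("Excel dates cannot be smaller than 1.");
        var dateOfReference = new DateTime(1900, 1, 1);
        return dateOfReference.AddDays(excelDate > 60d ? excelDate - 2 : excelDate - 1);
    }

    // Built-in alternative: OLE Automation dates count days from 1899-12-30,
    // which lines up with Excel serials from 61 (1 March 1900) onwards.
    public static DateTime FromSerialBuiltIn(double excelDate) => DateTime.FromOADate(excelDate);
}
```

For example, serial 42552 converts to 1 July 2016 either way. For any date a real spreadsheet is likely to hold, `FromOADate` gives the same result without maintaining the leap-year offset by hand.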
To use it on a sheet like this one, you would run:
[TestMethod]
public void Sheet_To_Table_Test()
{
    //Open an existing test workbook (ExcelPackage loads the file when it already exists)
    var fi = new FileInfo(@"c:\temp\Sheet_To_Table.xlsx");
    using (var package = new ExcelPackage(fi))
    {
        var workbook = package.Workbook;
        var worksheet = workbook.Worksheets.First();
        var datatable = new DataTable();
        datatable.Columns.Add("Col1", typeof(int));
        datatable.Columns.Add("Col2", typeof(string));
        datatable.Columns.Add("Col3", typeof(double));
        datatable.Columns.Add("Col4", typeof(DateTime));
        worksheet.ConvertSheetToDataTable(ref datatable);
        foreach (DataRow row in datatable.Rows)
            Console.WriteLine(
                $"row: {{Col1({row["Col1"].GetType()}): {row["Col1"]}" +
                $", Col2({row["Col2"].GetType()}): {row["Col2"]}" +
                $", Col3({row["Col3"].GetType()}): {row["Col3"]}" +
                $", Col4({row["Col4"].GetType()}):{row["Col4"]}}}");
        //To answer the OP's question: Col1 values where Col4 is on or after #01/03/2016# (3 January 2016)
        datatable
            .Select("Col4 >= #01/03/2016#")
            .Select(row => row["Col1"])
            .ToList()
            .ForEach(num => Console.WriteLine($"{{{num}}}"));
    }
}
This gives the following output:
row: {Col1(System.Int32): 12345, Col2(System.String): sf, Col3(System.Double): 456.549, Col4(System.DateTime):1/1/2016 12:00:00 AM}
row: {Col1(System.Int32): 456, Col2(System.String): asg, Col3(System.Double): 165.55, Col4(System.DateTime):1/2/2016 12:00:00 AM}
row: {Col1(System.Int32): 8, Col2(System.String): we, Col3(System.Double): 148.5, Col4(System.DateTime):1/3/2016 12:00:00 AM}
row: {Col1(System.Int32): 978, Col2(System.String): wer, Col3(System.Double): 668.456, Col4(System.DateTime):1/4/2016 12:00:00 AM}
{8}
{978}