XLSX file via OpenXml SDK Both Valid and Invalid
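I have a program that exports a System.Data.DataTable to an XLSX / OpenXml spreadsheet, and I finally have it mostly working. However, when the spreadsheet is opened in Excel, Excel complains that the file is invalid and needs repair, giving this message...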
We found a problem with some content in . Do you want us to try to recover as much as we can? If you trust the source of the workbook, click Yes.
If I click Yes, it returns this message...
Clicking the log file and opening it shows only this...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error268360_01.xml</logFileName>
<summary>Errors were detected in file 'C:\Users\aabdi\AppData\Local\Temp\data.20190814.152538.xlsx'</summary>
<repairedRecords>
<repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
</repairedRecords>
</recoveryLog>
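Obviously, we don't want to deploy it to production like this, so I've been trying to figure out how to fix it. I put together a quick little sample that validates the XML and displays the errors, based on this link from MSDN. But when I run that program against the exact same XLSX document Excel complains about, the validator reports that the file is perfectly valid. So I'm not sure where to go from there.
Is there a better tool for validating my XLSX XML? Below is the full code I use to generate the XLSX file. (Yes, it's VB.NET; it's a legacy app.)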
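For reference, a validation pass like the one described above can be sketched with the SDK's `OpenXmlValidator` class (namespace `DocumentFormat.OpenXml.Validation`). This is a minimal sketch, not the author's sample: the module `XlsxValidation`, the function `ValidateFile`, and the choice of `FileFormatVersions.Office2007` as the target version are assumptions. The validator checks the XML against the schema (plus some semantic rules), and at the schema level a cell's `<v>` content is essentially free text, which may be why a file can pass validation while Excel still wants to repair it; treat that as a plausible explanation, not a confirmed one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Validation

' Hypothetical helper: runs the Open XML SDK validator over an .xlsx file
' and returns one human-readable line per reported error.
Public Module XlsxValidation
    Public Function ValidateFile(path As String) As List(Of String)
        Dim messages As New List(Of String)()
        ' Open read-only so validation never modifies the file.
        Using doc = SpreadsheetDocument.Open(path, False)
            ' Assumption: validate against the oldest Excel version you need to support.
            Dim validator As New OpenXmlValidator(FileFormatVersions.Office2007)
            For Each err As ValidationErrorInfo In validator.Validate(doc)
                ' Part and Path may be missing for package-level errors, hence ?.
                messages.Add($"{err.Part?.Uri} {err.Path?.XPath}: {err.Description}")
            Next
        End Using
        Return messages
    End Function
End Module
```

Calling `XlsxValidation.ValidateFile("data.xlsx")` and printing each returned line gives the part URI, the XPath of the offending element, and the SDK's description of the problem.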
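If I comment out the line inside the For Each dr As DataRow loop, the XLSX file opens fine in Excel (just without any data). So the problem has something to do with the individual cells, but I'm not really doing much with them: I set the value and the data type, and that's it.
I also tried replacing the For Each loop in ConstructDataRow with the following, but it still outputs the same "bad" XML...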
rv.Append(
    (From dc In dr.Table.Columns
     Select ConstructCell(
         NVL(dr(dc.Ordinal), String.Empty),
         MapSystemTypeToCellType(dc.DataType)
     )
    ).ToArray()
)
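I also tried replacing the call to Append with a call to AppendChild for each cell, but that didn't help either.
The zipped XLSX file (with the error, using dummy data) is available here: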
https://drive.google.com/open?id=1KVVWEqH7VHMxwbRA-Pn807SXHZ32oJWR
Full DataTable-to-Excel XLSX code
#Region " ToExcel "
<Extension>
Public Function ToExcel(ByVal target As DataTable) As Attachment
    Dim filename = Path.GetTempFileName()
    Using doc As SpreadsheetDocument = SpreadsheetDocument.Create(filename, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook)
        Dim data = New SheetData()
        Dim wbp = doc.AddWorkbookPart()
        wbp.Workbook = New Workbook()
        Dim wsp = wbp.AddNewPart(Of WorksheetPart)()
        wsp.Worksheet = New Worksheet(data)
        Dim sheets = wbp.Workbook.AppendChild(New Sheets())
        Dim sheet = New Sheet() With {.Id = wbp.GetIdOfPart(wsp), .SheetId = 1, .Name = "Data"}
        sheets.Append(sheet)
        data.AppendChild(ConstructHeaderRow(target))
        For Each dr As DataRow In target.Rows
            data.AppendChild(ConstructDataRow(dr)) '// THIS LINE YIELDS THE BAD PARTS
        Next
        wbp.Workbook.Save()
    End Using
    Dim attachmentname As String = Path.Combine(Path.GetDirectoryName(filename), $"data.{Now.ToString("yyyyMMdd.HHmmss")}.xlsx")
    File.Move(filename, attachmentname)
    Return New Attachment(attachmentname, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
End Function

Private Function ConstructHeaderRow(dt As DataTable) As Row
    Dim rv = New Row()
    For Each dc As DataColumn In dt.Columns
        rv.Append(ConstructCell(dc.ColumnName, CellValues.String))
    Next
    Return rv
End Function

Private Function ConstructDataRow(dr As DataRow) As Row
    Dim rv = New Row()
    For Each dc As DataColumn In dr.Table.Columns
        rv.Append(ConstructCell(NVL(dr(dc.Ordinal), String.Empty), MapSystemTypeToCellType(dc.DataType)))
    Next
    Return rv
End Function

Private Function ConstructCell(value As String, datatype As CellValues) As Cell
    Return New Cell() With {
        .CellValue = New CellValue(value),
        .DataType = datatype
    }
End Function

Private Function MapSystemTypeToCellType(t As System.Type) As CellValues
    Dim rv As CellValues
    Select Case True
        Case t Is GetType(String)
            rv = CellValues.String
        Case t Is GetType(Date)
            rv = CellValues.Date
        Case t Is GetType(Boolean)
            rv = CellValues.Boolean
        Case IsNumericType(t)
            rv = CellValues.Number
        Case Else
            rv = CellValues.String
    End Select
    Return rv
End Function
#End Region
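For anyone who comes along and finds this: I eventually tracked it down to Cell.DataType. Setting it to CellValues.Date causes Excel to want to "fix" the document.
(Apparently, for dates the DataType should be NULL, and Date is only for Office 2010.)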
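With no DataType, a date cell simply holds a number: the OLE Automation serial that `Date.ToOADate()` returns. Excel only displays that number as a date if the cell's `StyleIndex` points at a cell format with a date number format. The sketch below assumes the workbook has no stylesheet yet and uses built-in number format 14 (short date); the module `DateStyles` and the functions `AddDateStyle` and `ConstructDateCell` are hypothetical names, not part of the original code.

```vbnet
Imports System
Imports System.Globalization
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Spreadsheet

Public Module DateStyles
    ' Hypothetical helper: adds a minimal stylesheet and returns the index of a
    ' cell format that uses built-in number format 14 (short date).
    ' Assumes the workbook does not already have a WorkbookStylesPart.
    Public Function AddDateStyle(wbp As WorkbookPart) As UInteger
        Dim stylesPart = wbp.AddNewPart(Of WorkbookStylesPart)()
        ' Child order follows the schema: fonts, fills, borders, cellXfs.
        ' Excel expects the two default fills (None and Gray125).
        ' CellFormats index 0 is the default format, index 1 is the date format.
        stylesPart.Stylesheet = New Stylesheet(
            New Fonts(New Font()) With {.Count = 1UI},
            New Fills(
                New Fill(New PatternFill() With {.PatternType = PatternValues.None}),
                New Fill(New PatternFill() With {.PatternType = PatternValues.Gray125})
            ) With {.Count = 2UI},
            New Borders(New Border()) With {.Count = 1UI},
            New CellFormats(
                New CellFormat(),
                New CellFormat() With {.NumberFormatId = 14UI, .ApplyNumberFormat = True}
            ) With {.Count = 2UI})
        stylesPart.Stylesheet.Save()
        Return 1UI
    End Function

    ' Hypothetical helper: a date cell with NO DataType, holding the OADate serial.
    Public Function ConstructDateCell(value As Date, dateStyleIndex As UInteger) As Cell
        Return New Cell() With {
            .CellValue = New CellValue(value.ToOADate().ToString(CultureInfo.InvariantCulture)),
            .StyleIndex = dateStyleIndex
        }
    End Function
End Module
```

In `ToExcel`, `AddDateStyle(wbp)` would be called once after `AddWorkbookPart`, and the returned index passed to `ConstructDateCell` for every date column.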
Also, if you specify a DataType of CellValues.Boolean, the CellValue needs to be 0 or 1. "true" / "false" will also cause Excel to want to "fix" your spreadsheet.
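Putting both findings together, one way to avoid generating bad cells is to build each cell from the raw DataRow value instead of a pre-stringified one. The sketch below is hypothetical (the `TypedCells` module and `ConstructTypedCell` are not from the original code): booleans become "1"/"0" with `CellValues.Boolean`, dates are written as `Date.ToOADate()` serials with no DataType, numbers are formatted with the invariant culture, and everything else falls back to `CellValues.String` as the original does. Without a date number format on the cell's `StyleIndex`, Excel will show the date serials as plain numbers.

```vbnet
Imports System
Imports System.Globalization
Imports DocumentFormat.OpenXml.Spreadsheet

' Hypothetical replacement for ConstructCell/MapSystemTypeToCellType that
' works from the raw DataRow value rather than a string.
Public Module TypedCells
    Public Function ConstructTypedCell(value As Object) As Cell
        ' NULL / DBNull: an empty cell with no value and no data type.
        If value Is Nothing OrElse TypeOf value Is DBNull Then
            Return New Cell()
        End If

        Select Case True
            Case TypeOf value Is Boolean
                ' Boolean cells must hold 1 or 0, not "True"/"False".
                Return New Cell() With {
                    .CellValue = New CellValue(If(DirectCast(value, Boolean), "1", "0")),
                    .DataType = CellValues.Boolean
                }
            Case TypeOf value Is Date
                ' Dates: no DataType, just the OLE Automation serial number.
                Return New Cell() With {
                    .CellValue = New CellValue(DirectCast(value, Date).ToOADate().ToString(CultureInfo.InvariantCulture))
                }
            Case IsNumber(value)
                ' Invariant culture so decimals are always written with ".".
                Return New Cell() With {
                    .CellValue = New CellValue(Convert.ToString(value, CultureInfo.InvariantCulture)),
                    .DataType = CellValues.Number
                }
            Case Else
                Return New Cell() With {
                    .CellValue = New CellValue(Convert.ToString(value, CultureInfo.InvariantCulture)),
                    .DataType = CellValues.String
                }
        End Select
    End Function

    ' True for the built-in numeric primitive types and Decimal.
    Private Function IsNumber(value As Object) As Boolean
        Select Case Type.GetTypeCode(value.GetType())
            Case TypeCode.Byte, TypeCode.SByte, TypeCode.Int16, TypeCode.UInt16,
                 TypeCode.Int32, TypeCode.UInt32, TypeCode.Int64, TypeCode.UInt64,
                 TypeCode.Single, TypeCode.Double, TypeCode.Decimal
                Return True
            Case Else
                Return False
        End Select
    End Function
End Module
```

With this in place, the loop body in ConstructDataRow would become `rv.Append(ConstructTypedCell(dr(dc.Ordinal)))`, with no NVL or type mapping needed.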
Also, Microsoft has built a better validator tool, available for download:
https://www.microsoft.com/en-us/download/details.aspx?id=30425