XLSX file via OpenXml SDK Both Valid and Invalid

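I have a program that exports a System.Data.DataTable to an XLSX / OpenXml spreadsheet. I finally got it mostly working. However, when I open the spreadsheet in Excel, Excel complains that the file is invalid and needs to be repaired, giving this message...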

We found a problem with some content in . Do you want us to try to recover as much as we can? If you trust the source of the workbook, click Yes.

If I click Yes, it comes back with this message...

Clicking the log file and opening it shows only this...

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <logFileName>error268360_01.xml</logFileName>
        <summary>Errors were detected in file 'C:\Users\aabdi\AppData\Local\Temp\data.20190814.152538.xlsx'</summary>
        <repairedRecords>
            <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
        </repairedRecords>
    </recoveryLog> 

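Obviously, we don't want to deploy it to production like this, so I've been trying to figure out how to fix it. I threw together a quick little sample to validate the XML and display the errors, based on this link from MSDN. But when I run the program and load the exact same XLSX document that Excel complains about, the validator reports that the file is perfectly valid. So I'm not sure where to go from there.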
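For reference, a validation sample of the kind described above might look roughly like the following. This is a minimal sketch, not the exact program used here: the module name `ValidateXlsx` and the command-line argument are assumptions, and the target version `FileFormatVersions.Office2007` is passed explicitly only to make the choice visible. Whether a given target version surfaces the problem Excel complains about is not guaranteed.

    Imports System
    Imports DocumentFormat.OpenXml
    Imports DocumentFormat.OpenXml.Packaging
    Imports DocumentFormat.OpenXml.Validation

    Module ValidateXlsx
        Sub Main(args As String())
            ' args(0) is assumed to be the path to the .xlsx file to check.
            Using doc As SpreadsheetDocument = SpreadsheetDocument.Open(args(0), False)
                ' Assumption: validate against the Office 2007 schema.
                Dim validator As New OpenXmlValidator(FileFormatVersions.Office2007)
                Dim count As Integer = 0
                For Each info As ValidationErrorInfo In validator.Validate(doc)
                    count += 1
                    Console.WriteLine($"{count}. {info.Description}")
                    Console.WriteLine($"   Part: {info.Part?.Uri}")
                    Console.WriteLine($"   Path: {info.Path?.XPath}")
                Next
                Console.WriteLine($"{count} validation error(s) found.")
            End Using
        End Sub
    End Module

A file can pass this kind of schema validation and still be rejected by Excel, which is the situation described here.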

Is there a better tool for validating my XLSX XML? Below is the complete code I'm using to generate the XLSX file. (Yes, it's in VB.NET; it's a legacy application.)

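If I comment out the line inside the For Each dr As DataRow loop, the XLSX file opens fine in Excel (just without any data). So the problem has something to do with the individual cells, but I'm not really doing much with them: I set a value and a data type, and that's it.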

I also tried replacing the For Each loop in ConstructDataRow with the following, but it still outputs the same "bad" XML...

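        ' Typing the range variable casts each item of the non-generic
        ' DataColumnCollection, so dc.Ordinal and dc.DataType are early-bound.
        rv.Append(
            (From dc As DataColumn In dr.Table.Columns
             Select ConstructCell(
                 NVL(dr(dc.Ordinal), String.Empty),
                 MapSystemTypeToCellType(dc.DataType)
             )
            ).ToArray()
        )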

I also tried replacing the call to Append with an AppendChild call for each cell, but that didn't help either.

The zipped XLSX file (the one with the error, containing dummy data) is available here:
https://drive.google.com/open?id=1KVVWEqH7VHMxwbRA-Pn807SXHZ32oJWR

Full DataTable to Excel XLSX code


    #Region " ToExcel "
    <Extension>
    Public Function ToExcel(ByVal target As DataTable) As Attachment
        Dim filename = Path.GetTempFileName()
        Using doc As SpreadsheetDocument = SpreadsheetDocument.Create(filename, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook)
            Dim data = New SheetData()

            Dim wbp = doc.AddWorkbookPart()
            wbp.Workbook = New Workbook()
            Dim wsp = wbp.AddNewPart(Of WorksheetPart)()
            wsp.Worksheet = New Worksheet(data)

            Dim sheets = wbp.Workbook.AppendChild(New Sheets())
            Dim sheet = New Sheet() With {.Id = wbp.GetIdOfPart(wsp), .SheetId = 1, .Name = "Data"}
            sheets.Append(sheet)

            data.AppendChild(ConstructHeaderRow(target))
            For Each dr As DataRow In target.Rows
                data.AppendChild(ConstructDataRow(dr)) '// THIS LINE YIELDS THE BAD PARTS
            Next

            wbp.Workbook.Save()
        End Using

        Dim attachmentname As String = Path.Combine(Path.GetDirectoryName(filename), $"data.{Now.ToString("yyyyMMdd.HHmmss")}.xlsx")
        File.Move(filename, attachmentname)
        Return New Attachment(attachmentname, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
    End Function

    Private Function ConstructHeaderRow(dt As DataTable) As Row
        Dim rv = New Row()
        For Each dc As DataColumn In dt.Columns
            rv.Append(ConstructCell(dc.ColumnName, CellValues.String))
        Next
        Return rv
    End Function

    Private Function ConstructDataRow(dr As DataRow) As Row
        Dim rv = New Row()
        For Each dc As DataColumn In dr.Table.Columns
            rv.Append(ConstructCell(NVL(dr(dc.Ordinal), String.Empty), MapSystemTypeToCellType(dc.DataType)))
        Next
        Return rv
    End Function

    Private Function ConstructCell(value As String, datatype As CellValues) As Cell
        Return New Cell() With {
        .CellValue = New CellValue(value),
        .DataType = datatype
        }
    End Function

    Private Function MapSystemTypeToCellType(t As System.Type) As CellValues
        Dim rv As CellValues
        Select Case True
            Case t Is GetType(String)
                rv = CellValues.String
            Case t Is GetType(Date)
                rv = CellValues.Date
            Case t Is GetType(Boolean)
                rv = CellValues.Boolean
            Case IsNumericType(t)
                rv = CellValues.Number
            Case Else
                rv = CellValues.String
        End Select

        Return rv
    End Function
    #End Region

For anyone who comes across this, I eventually tracked it down to Cell.DataType.

Setting a value of CellValues.Date causes Excel to want to "fix" the document. (Apparently, for dates, the DataType should be NULL, and Date is only used in Office 2010.)

Also, if you specify a DataType of CellValues.Boolean, then the CellValue needs to be either 0 or 1. "true" / "false" will also cause Excel to want to "fix" your spreadsheet.
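The two points above could be sketched as follows. This is a rough illustration rather than the code that actually shipped: the helper names `ConstructBooleanCell` and `ConstructDateCell` are hypothetical, and writing the date as an OLE Automation serial number via `ToOADate()` is an assumption about how the value is stored when the DataType is left unset. Without a date number format applied through a stylesheet (not shown), such a cell will display as a plain number in Excel.

    Imports System.Globalization
    Imports DocumentFormat.OpenXml.Spreadsheet

    Module CellSketch
        ' Hypothetical helper: Boolean cells carry "1" or "0", never "true"/"false".
        Public Function ConstructBooleanCell(value As Boolean) As Cell
            Return New Cell() With {
                .DataType = CellValues.Boolean,
                .CellValue = New CellValue(If(value, "1", "0"))
            }
        End Function

        ' Hypothetical helper: date cells leave DataType unset (Nothing) and
        ' store the date as a culture-invariant serial number (assumption).
        Public Function ConstructDateCell(value As Date) As Cell
            Dim serial As String = value.ToOADate().ToString(CultureInfo.InvariantCulture)
            Return New Cell() With {.CellValue = New CellValue(serial)}
        End Function
    End Module

In the code from the question, ConstructDataRow would need to pass the raw value rather than a pre-converted string so that these helpers can be chosen per column type.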

Also, Microsoft has already built a better validator tool that is available for download:
https://www.microsoft.com/en-us/download/details.aspx?id=30425