How to maintain character set while exporting Excel table to .csv UTF8 without BOM using VBA?

I have read several other answers on how to export a table to .csv with UTF-8 encoding (without BOM). I found code that almost works for me; see below.

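My problem is that the table contains Swedish characters (ÅÄÖ), and when the .csv file is opened these characters are lost; it looks like the wrong character set is being used. I found a workaround: open the .csv file in Notepad, save it, and then open it in Excel. With the workaround Excel displays the letters correctly, but I would rather not have the extra step. Can the code below be modified so that the characters are not lost?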

Option Explicit

Sub CSVFileAsUTF8WithoutBOM()
Dim SrcRange As Range
Dim CurrRow As Range
Dim CurrCell As Range
Dim CurrTextStr As String
Dim ListSep As String
Dim FName As Variant
Dim UTFStream As Object
Dim BinaryStream As Object

' ADO Constants
Const adTypeBinary = 1 ' The stream contains binary data
Const adTypeText = 2 ' The stream contains text data (default)
Const adWriteLine = 1 ' write text string and a line separator (as defined by the LineSeparator property) to the stream.
Const adModeReadWrite = 3 ' Read/write
Const adLF = 10 ' Line feed only - default is carriage return line feed (adCRLF)
Const adSaveCreateOverWrite = 2 ' Overwrites the file with the data from the currently open Stream object, if the file already exists

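' Open this workbook location (skipped if the workbook is unsaved or on a UNC path, where ChDrive fails)
If Len(ThisWorkbook.Path) > 0 And Left$(ThisWorkbook.Path, 2) <> "\\" Then
  ChDrive Left(ThisWorkbook.Path, 1)
  ChDir ThisWorkbook.Path
End If

' ask for file name and path
  FName = Application.GetSaveAsFilename("", "CSV File (*.csv), *.csv")
  ' GetSaveAsFilename returns False if the user cancels the dialog
  If VarType(FName) = vbBoolean Then Exit Sub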

' prepare UTF-8 stream
  Set UTFStream = CreateObject("adodb.stream")
  UTFStream.Type = adTypeText
  UTFStream.Mode = adModeReadWrite
  UTFStream.Charset = "UTF-8"
  UTFStream.LineSeparator = adLF
  UTFStream.Open

  'set field separator
  ListSep = ";"
  'set source range with data for csv file
  If Selection.Cells.Count > 1 Then
    Set SrcRange = Selection
  Else
    Set SrcRange = ActiveSheet.UsedRange
  End If

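  Dim CellText As String
  For Each CurrRow In SrcRange.Rows
    CurrTextStr = ""
    For Each CurrCell In CurrRow.Cells
      CellText = CStr(CurrCell.Value)
      'quote fields that contain the separator, a double quote or a line break; embedded quotes are doubled
      If InStr(CellText, ListSep) > 0 Or InStr(CellText, """") > 0 Or _
         InStr(CellText, vbCr) > 0 Or InStr(CellText, vbLf) > 0 Then
        CellText = """" & Replace(CellText, """", """""") & """"
      End If
      CurrTextStr = CurrTextStr & CellText & ListSep
    Next
    'remove the single ListSep after the last value in line (empty trailing cells are kept, so every row has the same column count)
    If Len(CurrTextStr) > 0 Then CurrTextStr = Left(CurrTextStr, Len(CurrTextStr) - Len(ListSep))
    'add line to UTFStream
    UTFStream.WriteText CurrTextStr, adWriteLine ' Writes character data to a text Stream object
  Next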

  'skip BOM
  UTFStream.Position = 3 ' sets or returns a long value that indicates the current position (in bytes) from the beginning of a Stream object

  'copy UTFStream to BinaryStream
  Set BinaryStream = CreateObject("adodb.stream")
  BinaryStream.Type = adTypeBinary
  BinaryStream.Mode = adModeReadWrite
  BinaryStream.Open ' Opens a Stream object

  'Strips BOM (first 3 bytes)
  UTFStream.CopyTo BinaryStream ' Copies a specified number of characters/bytes from one Stream object into another Stream object

  UTFStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
  UTFStream.Close ' Closes a Stream object

  'save to file
  BinaryStream.SaveToFile FName, adSaveCreateOverWrite
  BinaryStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
  BinaryStream.Close ' Closes a Stream object

End Sub

Edit:

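Based on your comments, I understand that what you originally wanted was to keep the information about the character encoding in the file, but without a BOM.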

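The problem with that (as you have realized) is that the BOM is usually exactly what carries the information about the character encoding, so it doesn't really make sense to put that information anywhere else in the file.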
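To make this concrete: the UTF-8 BOM is nothing more than the three bytes EF BB BF at the very start of the file, which are the same three bytes the question's code skips with `UTFStream.Position = 3`. Below is a minimal sketch for checking whether a file starts with them. `StartsWithUtf8Bom` and `FileHasUtf8Bom` are hypothetical helper names, not part of any library; the first accepts either a `Byte()` array or a Variant array of byte values.

```vba
Option Explicit

' True when the array (Byte() or Variant array of byte values) starts with EF BB BF
Public Function StartsWithUtf8Bom(ByVal bytes As Variant) As Boolean
    Dim lo As Long
    lo = LBound(bytes)
    ' Fewer than three bytes cannot contain a BOM
    If UBound(bytes) - lo + 1 < 3 Then Exit Function
    StartsWithUtf8Bom = (bytes(lo) = &HEF And bytes(lo + 1) = &HBB And bytes(lo + 2) = &HBF)
End Function

' Reads only the first three bytes of the file and checks them for a UTF-8 BOM
Public Function FileHasUtf8Bom(ByVal path As String) As Boolean
    Dim f As Integer
    Dim head() As Byte
    f = FreeFile
    Open path For Binary Access Read As #f
    If LOF(f) >= 3 Then
        ReDim head(0 To 2)
        Get #f, 1, head          ' fills the 3-element array with raw bytes
        FileHasUtf8Bom = StartsWithUtf8Bom(head)
    End If
    Close #f
End Function
```

Running `FileHasUtf8Bom` on a file produced by the export macro should return False, since the BOM was deliberately stripped.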

So your code is actually well suited to the task at hand. What needs to change is how the CSV file is imported/opened by the software you want to use it in.

When the file has no BOM, the program reading it has to guess the character encoding.
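One common guessing strategy (an assumption here; the answer does not say what any particular program does) is to check whether the bytes form well-formed UTF-8 and fall back to a legacy code page otherwise. It works reasonably well for Swedish text: in UTF-8 each of å, ä, ö is two bytes (å = C3 A5), while in Windows-1252 each is a single byte (å = E5) that usually does not form a valid UTF-8 sequence. `LooksLikeUtf8` is a hypothetical, simplified validator: it checks lead and continuation bytes but does not reject overlong 3/4-byte forms or encoded surrogates.

```vba
Option Explicit

' Simplified check: True if the array (Byte() or Variant array of byte values)
' consists only of well-formed UTF-8 lead/continuation byte patterns
Public Function LooksLikeUtf8(ByVal bytes As Variant) As Boolean
    Dim i As Long, lastIdx As Long, b As Long, k As Long, need As Long
    i = LBound(bytes)
    lastIdx = UBound(bytes)
    Do While i <= lastIdx
        b = bytes(i)
        If b < &H80 Then
            need = 0                      ' plain ASCII byte
        ElseIf b >= &HC2 And b <= &HDF Then
            need = 1                      ' lead byte of a 2-byte sequence (e.g. C3 for å)
        ElseIf b >= &HE0 And b <= &HEF Then
            need = 2                      ' lead byte of a 3-byte sequence
        ElseIf b >= &HF0 And b <= &HF4 Then
            need = 3                      ' lead byte of a 4-byte sequence
        Else
            Exit Function                 ' 80-C1 and F5-FF never start a sequence
        End If
        If i + need > lastIdx Then Exit Function   ' sequence cut off at the end
        For k = 1 To need
            ' every continuation byte must look like 10xxxxxx
            If (bytes(i + k) And &HC0) <> &H80 Then Exit Function
        Next k
        i = i + need + 1
    Loop
    LooksLikeUtf8 = True
End Function
```

Note that short ASCII-only content is valid in both encodings, which is one reason guessing can never be fully reliable.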

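In general, if the software you use has no BOM to go by and guesses wrong, there should at least be a way to customize the behavior of its import/open command so that you can specify the character encoding (which it seems you have indeed found).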
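Stating the encoding explicitly removes the guess entirely. From VBA, for example, a file can be read with the same `ADODB.Stream` object the question uses for writing, with `Charset` set before loading. `ReadUtf8File` is a hypothetical helper name; this is a sketch, not the only way to do it.

```vba
Option Explicit

' Reads a whole text file, telling ADODB.Stream up front that it is UTF-8
Public Function ReadUtf8File(ByVal path As String) As String
    Const adTypeText As Long = 2
    Dim stm As Object
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = "UTF-8"        ' the encoding is stated, not guessed
    stm.Open
    stm.LoadFromFile path
    ReadUtf8File = stm.ReadText  ' no argument: read all remaining characters
    stm.Close
End Function
```

Decoding this way works whether or not the file has a BOM.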

Original answer:

For some reason, Excel has a hard time guessing the character encoding when you simply double-click a UTF-8 encoded CSV file to open it. You have to help it a little...

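If Excel can't figure it out on its own

If you record a macro while importing the file with the Text Import Wizard (choosing 65001 : Unicode (UTF-8) as the file origin) and turn it into a subroutine, you might end up with something like this: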

Sub OpenCSV(FullFileName As String)

    Dim wb As Workbook
    Set wb = Workbooks.Add
    
    Dim ws As Worksheet
    Set ws = wb.Sheets(1)

    With ws.QueryTables.Add(Connection:= _
        "TEXT;" & FullFileName, Destination:=ws.Range( _
        "$A$1"))
        .Name = "CSV_Open"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .TextFilePromptOnRefresh = False
        .TextFilePlatform = 65001 ' code page 65001 = UTF-8, so Excel does not have to guess the encoding
        .TextFileStartRow = 1
        .TextFileParseType = xlDelimited
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .TextFileConsecutiveDelimiter = False
        .TextFileTabDelimiter = False
        .TextFileSemicolonDelimiter = True ' matches ListSep = ";" in the export macro
        .TextFileCommaDelimiter = False
        .TextFileSpaceDelimiter = False
        .TextFileColumnDataTypes = Array(1)
        .TextFileTrailingMinusNumbers = True
        .Refresh BackgroundQuery:=False
    End With
    
End Sub

Other suggestions

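If you really want to be able to double-click the file instead of using the Text Import Wizard or running a macro, you could always put a macro in an add-in or in PERSONAL.XLSB that runs every time a workbook is opened.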

If it detects that the file that was just opened is a CSV file, it can close it and "reopen" it with the code above.
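A sketch of that idea, assuming `OpenCSV` from above sits in a standard module of PERSONAL.XLSB. VBA needs a class module to receive application-level events through `WithEvents`, so the code is split across three modules, marked by comments. `CsvReopener`, `mReopener` and `IsCsvFile` are hypothetical names.

```vba
' ===== Class module, assumed to be named CsvReopener =====
Option Explicit

' Receives application-level events once Application is assigned to it
Public WithEvents App As Application

Private Sub App_WorkbookOpen(ByVal Wb As Workbook)
    Dim fullPath As String
    If Not IsCsvFile(Wb.Name) Then Exit Sub
    fullPath = Wb.FullName
    ' Discard the copy Excel opened with its own encoding guess
    Wb.Close SaveChanges:=False
    ' Re-import the file as UTF-8 with the OpenCSV routine from the answer
    OpenCSV fullPath
End Sub

' ===== ThisWorkbook module of PERSONAL.XLSB =====
Option Explicit

Private mReopener As CsvReopener

' Runs when PERSONAL.XLSB is loaded at Excel startup and hooks the events
Private Sub Workbook_Open()
    Set mReopener = New CsvReopener
    Set mReopener.App = Application
End Sub

' ===== Standard module of PERSONAL.XLSB (same one as OpenCSV) =====
Option Explicit

' True for file names ending in .csv, ignoring case
Public Function IsCsvFile(ByVal fileName As String) As Boolean
    IsCsvFile = (LCase$(Right$(fileName, 4)) = ".csv")
End Function
```

`OpenCSV` creates its target with `Workbooks.Add`, which raises `NewWorkbook` rather than `WorkbookOpen`, so the reimported data does not trigger the handler again. Closing a workbook from inside its own open event may be fragile in some Excel versions; if so, deferring the close and reopen with `Application.OnTime` is a common workaround. Resetting the VBA project also clears `mReopener`, after which the handler stops firing until PERSONAL.XLSB is loaded again.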

Bonus: interestingly, there is a separate question about how to change the default character encoding that Excel uses.