Why do unknown characters still appear in XML even after converting from ANSI to UTF-8?

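I have a problem. I looked into how to convert the encoding of an .xml file from ANSI to UTF-8, and fortunately I found a solution. But there is one issue. My .xml file contains a lot of Spanish text, including many inverted question marks. For Eclipse to display every character in the .xml file correctly, I need to change the file's encoding from ANSI to UTF-8. I managed to change the encoding, but oddly, the content itself still shows unknown characters after the conversion. Here is my code: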

Dim objFso, objF As Object
Set objFso = CreateObject("Scripting.FileSystemObject")
xmlFile = NewFolderName & "\" & Application.Cells(5, j + 1).Value
Set objF = objFso.CreateTextFile(xmlFile, True, False)
objF.Write "<resources>"
objF.WriteBlankLines (1)
i = 11
Var = Application.Cells(8, j + 1).Value
Do Until Application.Cells(i, 2).Value = 0
    objF.Write "     <string name=" & Chr(34) & Application.Cells(i, 2).Value & Var & Chr(34) & ">" & Application.Cells(i, j + 1).Value & "</string>"
    objF.WriteBlankLines (1)
    i = i + 1
Loop
objF.WriteBlankLines (1)
objF.Write ("</resources>")
objF.Close
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = 2
stream.Charset = "utf-8"
stream.LoadFromFile xmlFile
stream.SaveToFile xmlFile, 2
stream.Close
Set stream = Nothing

The output of the code above looks like this:

<string name="BroadcastFailed">No se recibi� emisi�n [E202]</string>
<string name="NoInputSelect">No hay selecci�n de entrada disponible</string>
<string name="ThreeDModeQ">�Ver en Modo 3D?</string>

The .xml output above is encoded in UTF-8, but unknown characters still appear. What I want is this:

<string name="BroadcastFailed">No se recibió emisión [E202]</string>
<string name="NoInputSelect">No hay selección de entrada disponible</string>
<string name="ThreeDModeQ">¿Ver en Modo 3D?</string>

If anyone knows what is wrong with my code, please drop your answer. Thank you very much for your answers :)

The problem is that you are saving the initial file as ASCII (you are setting the Unicode parameter of CreateTextFile() to False). According to the documentation:

object.CreateTextFile(filename[, overwrite[, unicode]])

The CreateTextFile method has these parts:

Part Description

object Required. Always the name of a FileSystemObject or Folder object.

filename Required. String expression that identifies the file to create.

overwrite Optional. Boolean value that indicates if an existing file can be overwritten. The value is True if the file can be overwritten; False if it can't be overwritten. If omitted, existing files are not overwritten.

unicode Optional. Boolean value that indicates whether the file is created as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed.

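You are then loading that ASCII file as UTF-8. That is fine for ASCII characters (since ASCII is a subset of UTF-8), but you are losing the non-ASCII characters, such as ó and ¿. In practice the non-Unicode file is written in the system ANSI code page, so on a Windows-1252 system ó is the single byte F3 and ¿ is the single byte BF, and neither is a valid UTF-8 sequence on its own. That is why you end up with � (Unicode code point U+FFFD REPLACEMENT CHARACTER) characters in the final file.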
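To check which encoding a file really has on disk, it can help to look at its raw bytes. The following is only a minimal sketch: DumpFileBytes is a hypothetical helper name, not part of the original code, and it assumes the file is small enough to inspect by eye. It reads the file through ADODB.Stream in binary mode, so no charset decoding takes place, and prints the first bytes in hex to the Immediate window.

```vba
' Hypothetical diagnostic helper (not part of the original code):
' prints the first bytes of a file as hex so its real encoding can be inspected.
Sub DumpFileBytes(ByVal filePath As String, Optional ByVal maxBytes As Long = 64)
    Dim stream As Object
    Dim bytes() As Byte
    Dim hexText As String
    Dim i As Long

    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                  ' adTypeBinary: raw bytes, no charset decoding
    stream.Open
    stream.LoadFromFile filePath

    If stream.Size > 0 Then
        ' Never ask for more bytes than the file contains
        If stream.Size < maxBytes Then maxBytes = stream.Size
        bytes = stream.Read(maxBytes)
        For i = LBound(bytes) To UBound(bytes)
            ' Two hex digits per byte, separated by spaces
            hexText = hexText & Right$("0" & Hex$(bytes(i)), 2) & " "
        Next i
        Debug.Print hexText
    End If

    stream.Close
    Set stream = Nothing
End Sub
```

As a rough guide for reading the output: a file starting with FF FE is UTF-16 little-endian, one starting with EF BB BF is UTF-8 with a byte order mark, ó appears as C3 B3 in UTF-8 but as a lone F3 in Windows-1252, and a literal EF BF BD is an already-encoded U+FFFD, meaning the original character was lost before the file was saved.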

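You need to save the initial file as Unicode and then load it into the ADODB.Stream as Unicode, so that you don't lose any characters. You can then save the text in whatever charset you want: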

Set objF = objFso.CreateTextFile(xmlFile, True, True) ' Unicode parameter is True
' ...
Set stream = CreateObject("ADODB.Stream")
stream.Type = 2
stream.Charset = "utf-16"
stream.Open
stream.LoadFromFile xmlFile ' load as Unicode
stream.Charset = "utf-8"
stream.SaveToFile xmlFile, 2 ' save as UTF-8
stream.Close

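After some research, I finally found the solution. I need to load my Unicode file with LoadFromFile, read its content with stream.ReadText, and then close the stream. After that I need to reopen the stream, write the content back as UTF-8 with stream.WriteText, save it with SaveToFile, and close it for good. The code is below. I took the idea from the question "Use "ADODB.Stream" to convert ANSI to UTF-8, miss 1-2 character in the first row".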

Set stream = CreateObject("ADODB.Stream")
stream.Type = 2
stream.Charset = "unicode"
stream.Open
stream.LoadFromFile xmlFile
strText = stream.ReadText
stream.Close

stream.Type = 2
stream.Charset = "utf-8"
stream.Open
stream.WriteText strText
stream.SaveToFile xmlFile, 2
stream.Close
Set stream = Nothing
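A possible alternative, which neither answer above uses, is to skip the intermediate FileSystemObject file and write the text straight into a UTF-8 ADODB.Stream. VBA strings are Unicode in memory, so the only encoding step is the one the stream performs on save. This is a sketch under assumptions: WriteUtf8File, ExampleUsage, and the output file name are hypothetical, and the sample string stands in for the values read from the worksheet.

```vba
' Hypothetical helper: saves a VBA string to disk as UTF-8 in one step.
Sub WriteUtf8File(ByVal filePath As String, ByVal content As String)
    Dim stream As Object
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 2                   ' adTypeText
    stream.Charset = "utf-8"          ' encoding applied when the text is saved
    stream.Open
    stream.WriteText content
    stream.SaveToFile filePath, 2     ' adSaveCreateOverWrite
    stream.Close
    Set stream = Nothing
End Sub

' Hypothetical usage: builds a tiny resources file in memory and saves it.
Sub ExampleUsage()
    Dim xmlText As String
    ' ChrW(191) is the inverted question mark (U+00BF)
    xmlText = "<resources>" & vbCrLf & _
              "     <string name=""ThreeDModeQ"">" & ChrW(191) & "Ver en Modo 3D?</string>" & vbCrLf & _
              "</resources>"
    WriteUtf8File Environ$("TEMP") & "\strings_example.xml", xmlText
End Sub
```

Note that ADODB.Stream normally writes a UTF-8 byte order mark (EF BB BF) at the start of the file. Most XML tools accept it, but if a consumer does not, the mark has to be stripped in a separate step.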