Support for utf16 with ms xml 6.0

I'm scraping data from a French site. I'm using MS XML 6.0, and some letters are not recognized properly (for example é).

Code:

Dim xml_obj As XMLHTTP
Set xml_obj = New XMLHTTP
xml_obj.Open "GET", "http://www.emploi.nat.tn/fo/Fr/global.php?page=146&menu1=&FormLinks_Sorting=1&FormLinks_Sorted=&num_page=0&limit=500&numpage=1", False
xml_obj.send
Dim htmldoc As New HTMLDocument
htmldoc.body.innerHTML = xml_obj.responseText

The responseText is encoded as UTF-8. Is there a workaround?

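Since the page's encoding is windows-1256, you first need to decode the raw response bytes (responseBody) yourself. Then write the HTML directly into the document rather than into the body: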

Sub UsageExample()

    Dim req As New MSXML2.ServerXMLHTTP60  ' Microsoft XML, v6.0 '
    req.Open "GET", "http://www.emploi.nat.tn/fo/Fr/global.php?page=146&menu1=&FormLinks_Sorting=1&FormLinks_Sorted=&num_page=0&limit=500&numpage=1", False
    req.Send

    Dim doc As New MSHTML.HTMLDocument     ' Microsoft HTML Object Library '
    WriteDocument doc, req.responseBody, "windows-1256"

End Sub

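' Decodes the raw response bytes with the given charset and writes the
' resulting HTML into the document.
Private Sub WriteDocument(document As Object, data, charset As String)
    Dim stream As New ADODB.stream   ' Microsoft ActiveX Data Objects 6.1 Library '
    stream.Open
    stream.Type = 1                  ' adTypeBinary: accept the raw bytes '
    stream.Write data
    stream.Position = 0              ' rewind before switching the stream type '
    stream.Type = 2                  ' adTypeText: read the bytes back as text '
    stream.charset = charset

    ' Write the decoded HTML into the document itself, not into body.innerHTML '
    document.Open
    document.Write stream.ReadText
    document.Close

    stream.Close
End Sub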
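
If you would rather not hard-code the charset, one option is to read it from the Content-Type response header and fall back to windows-1256 only when the server does not declare one. This is just a sketch: CharsetFromContentType is a hypothetical helper name of mine, and it assumes the server actually sends a charset in that header (the page may declare it only in a meta tag, in which case the fallback is used).

```vba
' Hypothetical helper (not part of the answer above): returns the charset named
' in a Content-Type header value, or the fallback when none is declared.
Private Function CharsetFromContentType(contentType As String, fallback As String) As String
    Dim pos As Long
    Dim value As String

    CharsetFromContentType = fallback

    ' Look for "charset=" anywhere in the header value, ignoring case
    pos = InStr(1, contentType, "charset=", vbTextCompare)
    If pos = 0 Then Exit Function

    value = Mid$(contentType, pos + Len("charset="))

    ' Drop any parameters that follow the charset
    pos = InStr(value, ";")
    If pos > 0 Then value = Left$(value, pos - 1)

    ' Strip optional quotes and surrounding spaces
    value = Trim$(Replace(value, """", ""))
    If Len(value) > 0 Then CharsetFromContentType = value
End Function
```

In UsageExample the call would then look something like this:

```vba
WriteDocument doc, req.responseBody, _
    CharsetFromContentType(req.getResponseHeader("Content-Type"), "windows-1256")
```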