Title <H1> returned in different encoding than needed when scraped through VBA
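I am using the following function to scrape a website's title through VBA. However, I am struggling with the encoding of some characters, such as dashes, because they come back as HTML entities, e.g. Communiqués du Conseil des Ministres &#8211; Présidence de la République de Côte d&#039;Ivoire. The title I am fetching can be found here: https://www.presidence.ci/communiques-du-conseil-des-ministres/. Is there a way to fix this by changing the existing function?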
Function fgetMetaTitle(ByVal strURL) As String
    Dim stPnt As Long, x As String
    Dim oXH As Object
    'Get URL's HTML Source
    Set oXH = CreateObject("msxml2.xmlhttp")
    With oXH
        .Open "get", strURL, False
        .send
        x = .responseText
    End With
    Set oXH = Nothing
    'Parse HTML Source for Title
    If InStr(1, UCase(x), "<TITLE>") Then
        stPnt = InStr(1, UCase(x), "<TITLE>") + Len("<TITLE>")
        fgetMetaTitle = Mid(x, stPnt, InStr(stPnt, UCase(x), "</TITLE>") - stPnt)
    Else
        fgetMetaTitle = ""
    End If
End Function
How about something like this (I added handlers for the dash and the apostrophe):
Function fgetMetaTitle(ByVal strURL) As String
    Dim stPnt As Long, x As String
    Dim oXH As Object
    'Get URL's HTML Source
    Set oXH = CreateObject("msxml2.xmlhttp")
    With oXH
        .Open "get", strURL, False
        .send
        x = .responseText
    End With
    Set oXH = Nothing
    'Parse HTML Source for Title
    If InStr(1, UCase(x), "<TITLE>") Then
        stPnt = InStr(1, UCase(x), "<TITLE>") + Len("<TITLE>")
        fgetMetaTitle = Mid(x, stPnt, InStr(stPnt, UCase(x), "</TITLE>") - stPnt)
        'This will handle the apostrophe (HTML entity &#039;):
        fgetMetaTitle = Replace(fgetMetaTitle, "&#039;", "'")
        'This will handle the en dash (HTML entity &#8211;):
        fgetMetaTitle = Replace(fgetMetaTitle, "&#8211;", "-")
    Else
        fgetMetaTitle = ""
    End If
End Function
Also take a look at: HTML Entities
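Replacing entities one at a time only covers the characters you have already run into. If other titles contain different entities, a more general decoder may be easier to maintain. Below is a minimal sketch, not a complete HTML decoder: DecodeHtmlEntities is a hypothetical helper name, it assumes the VBScript.RegExp object is available (Windows), it handles numeric references in decimal or hex plus a handful of named ones, and it leaves anything above code point 65535 untouched because ChrW cannot produce it.

```vba
' Hypothetical helper, not part of the original function.
' Decodes &#NNN; / &#xHHH; and a few named entities in a single pass.
Function DecodeHtmlEntities(ByVal s As String) As String
    Dim re As Object, m As Object
    Dim token As String, repl As String, result As String
    Dim code As Long, pos As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "&(#x[0-9a-f]{1,6}|#[0-9]{1,7}|amp|lt|gt|quot|apos|nbsp);"

    pos = 1
    For Each m In re.Execute(s)
        ' Copy the text between the previous match and this one
        result = result & Mid$(s, pos, m.FirstIndex + 1 - pos)
        token = LCase$(m.SubMatches(0))
        repl = m.Value                  ' default: leave the entity as-is
        Select Case token
            Case "amp": repl = "&"
            Case "lt": repl = "<"
            Case "gt": repl = ">"
            Case "quot": repl = """"
            Case "apos": repl = "'"
            Case "nbsp": repl = ChrW(160)
            Case Else
                If Mid$(token, 2, 1) = "x" Then
                    ' Trailing & forces Val to read the hex value as a Long
                    code = Val("&H" & Mid$(token, 3) & "&")
                Else
                    code = CLng(Mid$(token, 2))
                End If
                ' ChrW only covers code points up to 65535
                If code > 0 And code <= 65535 Then repl = ChrW(code)
        End Select
        result = result & repl
        pos = m.FirstIndex + m.Length + 1   ' FirstIndex is 0-based
    Next m

    ' Append whatever follows the last match
    DecodeHtmlEntities = result & Mid$(s, pos)
End Function
```

To use it, wrap the extracted title instead of chaining Replace calls, for example fgetMetaTitle = DecodeHtmlEntities(Mid(x, stPnt, InStr(stPnt, UCase(x), "</TITLE>") - stPnt)). Note that this keeps the en dash as a real en dash rather than converting it to a hyphen; add a Replace afterwards if you want the plain "-".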