Extracting text from an HTML table

I am trying to extract various elements from this page:

http://partsurfer.hp.com/Search.aspx?searchText=4CE0460D0G

I'd like to start with ctl00_BodyContentPlaceHolder_lblSerialNumber.

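If you know the ID, surely there is a simple way to extract the element you want from an HTML page? I thought something like getElementsByName, getElementById, or even getElementsByTagName would work, but try as I might, I can't get it to extract what I want!

This does not work: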

Function GetHPModelName()

    Dim ie As Object
    Dim Oelement As Object
    Dim Ohtml As New MSHTML.HTMLDocument
    Dim lrow As Integer

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://partsurfer.hp.com/Search.aspx?searchText=" & Worksheets("HP_Lookup").Range("A2").Value, False
        .send
        Ohtml.body.innerHTML = .responseText
    End With

    FetchHPInfo "ctl00_BodyContentPlaceHolder_lblSerialNumber", "A", Oelement, Ohtml
End Function

It calls:

Public Function FetchHPInfo(tablename As String, thiscolumn As String, Oelement As Object, Ohtml As MSHTML.HTMLDocument)
    lrow = 1
    For Each Oelement In Ohtml.getElementsById(tablename)
        Worksheets("HP_main").Range(thiscolumn & lrow).Value = Oelement.innerText
        lrow = lrow + 1
    Next Oelement
    Worksheets("HP_main").Columns(thiscolumn).cells.HorizontalAlignment = xlHAlignLeft
    Worksheets("HP_main").Columns(thiscolumn).AutoFit
End Function

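getElementById() should be all you need, since the node has an ID attribute. You are probably running into trouble because you are trying to assign responseText to the document body, but the document doesn't have a <body> node yet. Just use write() to write the entire response into the empty document. Here's an example I threw together that returns the correct value: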

Dim objHttp
Set objHttp = CreateObject("MSXML2.XMLHTTP")
objHttp.Open "GET", "http://partsurfer.hp.com/Search.aspx?searchText=4CE0460D0G", False
objHttp.Send

Dim doc
Set doc = CreateObject("htmlfile")
doc.write objHttp.responseText

MsgBox doc.getElementById("ctl00_BodyContentPlaceHolder_lblSerialNumber").innerText

Output:

4CE0460D0G
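
For completeness, here is a rough sketch of how the same approach might be folded back into the workbook from the question. It assumes the sheet names "HP_Lookup" and "HP_main" and the cell A2 from the question's code; the procedure name GetHPSerialNumber and the target cell A1 are made up for illustration. Note that the DOM method is getElementById (singular), not getElementsById, and it returns a single element rather than a collection, so the For Each loop from the question is not needed.

```vba
' Sketch only: adapts the write() approach to the worksheet layout in the question.
' Assumes sheets "HP_Lookup" and "HP_main" exist; GetHPSerialNumber is a hypothetical name.
Public Sub GetHPSerialNumber()
    Dim http As Object
    Dim doc As Object
    Dim el As Object
    Dim url As String

    ' Build the search URL from the value in HP_Lookup!A2, as in the question.
    url = "http://partsurfer.hp.com/Search.aspx?searchText=" & _
          Worksheets("HP_Lookup").Range("A2").Value

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False   ' False = synchronous request
    http.send

    ' Write the whole response into an empty late-bound document
    ' instead of assigning it to body.innerHTML.
    Set doc = CreateObject("htmlfile")
    doc.write http.responseText
    doc.Close

    ' getElementById returns one element (or Nothing), not a collection.
    Set el = doc.getElementById("ctl00_BodyContentPlaceHolder_lblSerialNumber")
    If el Is Nothing Then
        MsgBox "Element not found; the page layout may have changed."
    Else
        Worksheets("HP_main").Range("A1").Value = el.innerText
    End If
End Sub
```

Checking for Nothing before reading innerText avoids a run-time error if the page does not contain that ID for a given search term.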