Can't retrieve element from page

Question

I'm having trouble retrieving the IUPAC name of the chemical on the following page:

https://echa.europa.eu/brief-profile/-/briefprofile/100.000.685

I just want the printed result to return Benzene in this case.

The code below pulls the first element with the class name `col-sm-8`:

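' Requires references (Tools > References): Microsoft XML, v6.0 and Microsoft HTML Object Library
Public Sub GetContents()

    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim IUPACName As Object

    ' Fetch the page synchronously (third argument False = not async)
    XMLReq.Open "GET", "https://echa.europa.eu/brief-profile/-/briefprofile/100.000.685", False
    XMLReq.send

    ' Load the response into an HTML document so it can be queried
    HTMLDoc.body.innerHTML = XMLReq.responseText

    ' First element with class "col-sm-8"; this is the whole identifier block, not just the name
    Set IUPACName = HTMLDoc.getElementsByClassName("col-sm-8")(0)

    Debug.Print IUPACName.innerText

End Sub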

This returns:

EC / List name: IUPAC name: benzene Substance names and other identifiers

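Inspecting the page, there doesn't seem to be any obvious identifier that would return just benzene. I'm wondering how people would go about this.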

Here is an image of the text I'm trying to extract.
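Since the text printed above already contains the name right after the `IUPAC name:` label, one possible workaround is plain string parsing on that `innerText` rather than a DOM selector. This is a minimal sketch, assuming the label `IUPAC name:` and the following heading `Substance names` keep the wording shown in the output above; `ExtractBetween` is a hypothetical helper name, and the approach breaks if the page wording changes.

```vba
' Hypothetical helper: returns the text between startLabel and endLabel,
' or "" if startLabel is not found. Assumes the labels match the page text.
Public Function ExtractBetween(ByVal text As String, _
                               ByVal startLabel As String, _
                               ByVal endLabel As String) As String
    Dim s As String
    Dim startPos As Long
    Dim endPos As Long

    ' innerText may contain line breaks or tabs; normalise them to spaces
    s = Replace(Replace(Replace(text, vbCr, " "), vbLf, " "), vbTab, " ")

    startPos = InStr(1, s, startLabel, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(startLabel)

    ' If the end label is missing, take everything after the start label
    endPos = InStr(startPos, s, endLabel, vbTextCompare)
    If endPos = 0 Then endPos = Len(s) + 1

    ExtractBetween = Trim$(Mid$(s, startPos, endPos - startPos))
End Function
```

Inside `GetContents`, the `Debug.Print` line could then become `Debug.Print ExtractBetween(IUPACName.innerText, "IUPAC name:", "Substance names")`.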

Answer 1

I can't test on other Office versions, but with 2019, at least, you can use an attribute selector like this:

Set IUPACName = HTMLDoc.querySelector("[title*=IUPAC]")
    
Debug.Print IUPACName.innerText

I was expecting to have to use:

Debug.Print IUPACName.NextSibling.NodeValue

So the latter may be what your version of Office needs.
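Because the behaviour apparently differs between Office versions, one way to cover both cases is to try `innerText` first and fall back to the following text node. This is a minimal sketch, not tested across Office versions; it assumes, as the two snippets above imply, that the element whose `title` contains `IUPAC` either holds the name itself or is a label immediately followed by a text node containing the name. `IupacNameFromHtml` is a hypothetical function name.

```vba
' Requires a reference to Microsoft HTML Object Library (MSHTML).
' Hypothetical helper: returns the IUPAC name from the page HTML, or "" if not found.
Public Function IupacNameFromHtml(ByVal html As String) As String
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim target As Object
    Dim result As String

    HTMLDoc.body.innerHTML = html

    ' Attribute selector from the answer: any element whose title contains "IUPAC"
    Set target = HTMLDoc.querySelector("[title*=IUPAC]")
    If target Is Nothing Then Exit Function

    ' Case 1 (Office 2019 per the answer): innerText already holds the name
    result = Trim$(target.innerText)

    ' Case 2 (assumption): innerText is just the label, so read the next text node
    If result = "" Or InStr(1, result, "IUPAC", vbTextCompare) > 0 Then
        result = ""
        If Not target.NextSibling Is Nothing Then
            ' NodeValue is Null for element nodes; & "" turns Null into ""
            result = Trim$(target.NextSibling.NodeValue & "")
        End If
    End If

    IupacNameFromHtml = result
End Function
```

It would be called with the raw response, for example `Debug.Print IupacNameFromHtml(XMLReq.responseText)` after `XMLReq.send`.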

The world of mshtml.dll has suddenly been turned upside down.


Tags: excel, vba, web-scraping, getelementsbyclassname