How to grab particular data from a website into Excel with querySelectorAll

I am trying to transfer data from a website to Excel:

With my code I get all the values of COL1:
10117 Berlin
Feydelstr
Entfernung: Nur für Kunden sichtbar

But how do I access the following?

My code:


Public Sub GrGHTML()

    Const url = "https://www...."
    Dim Html As New HTMLDocument, HTMLDoc As New HTMLDocument
    Dim elm As Object
    Dim x As Long
   
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        Html.body.innerHTML = .responseText
    End With

    Set elm = Html.querySelectorAll("div.col1[class]")
    For x = 0 To elm.Length - 1
        ActiveSheet.Cells(x + 2, 2) = elm.Item(x).innerText
    Next
End Sub

I think you need to send a POST with objectid, plz-ort, vw_ab, objektart, vw_bis, umreis, not a GET.

Question:

How do I get just the first two values of COL1, without the third value (Entfernung: Nur für Kunden sichtbar)?

Answer: select the child div elements so that you don't capture that additional line: .col1 div


Question:

How do I get the picture in COL2: <img src="/immobilien...musterfoto.jpg">?

Answer: in the same way that you selected col1 by its class name, select col2, then take the child img element (.col2 img) and extract its src attribute.


Question:

And how do I get the link in <a class="entry clearfix" href="/home/fuer_priv...?

Answer: select the a tag elements with class entry: a.entry


General:

These selectors return node lists of matching length, so you only need to loop over one list and index into the others during the loop.
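If one list ever turns out shorter (for example, a listing without a picture), looping to the length of the first list reads an item that doesn't exist, and calling .src on it fails at runtime. A minimal defensive sketch, assuming you are happy to stop at the shortest list; SmallestCount is a hypothetical helper, not part of VBA or MSHTML:

```vba
Option Explicit

' Hypothetical helper: returns the smallest of the supplied list lengths, so a loop
' that indexes several parallel node lists never reads past the end of the shortest.
' Expects at least one argument.
Public Function SmallestCount(ParamArray counts() As Variant) As Long
    Dim i As Long
    SmallestCount = counts(LBound(counts))
    For i = LBound(counts) + 1 To UBound(counts)
        If counts(i) < SmallestCount Then SmallestCount = counts(i)
    Next
End Function
```

Usage inside the scraping routine: For x = 0 To SmallestCount(locations.Length, images.Length, links.Length) - 1. This only prevents the out-of-range read. If an element is missing in the middle of the page, the parallel lists are still misaligned, and querying inside each listing's own container element would be the more robust design.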

Use more meaningful variable names.

Complete the URIs by replacing about: with the protocol + domain.
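Replace$ swaps every occurrence of about: anywhere in the string. A slightly stricter sketch rewrites only a leading about: prefix, which is how the relative src and href values come back when the document is built from innerHTML instead of being loaded from the site. CompleteUri is a hypothetical helper name, and the site root is passed in by the caller:

```vba
Option Explicit

' Hypothetical helper: if rawUri starts with "about:" (a relative link as reported by
' an HTMLDocument filled via innerHTML), replace that prefix with siteRoot.
' Any other value, e.g. an already absolute URL, is returned unchanged.
Public Function CompleteUri(ByVal rawUri As String, ByVal siteRoot As String) As String
    Const PREFIX As String = "about:"
    If StrComp(Left$(rawUri, Len(PREFIX)), PREFIX, vbTextCompare) = 0 Then
        CompleteUri = siteRoot & Mid$(rawUri, Len(PREFIX) + 1)
    Else
        CompleteUri = rawUri
    End If
End Function
```

Usage: .Cells(x + 2, 3) = CompleteUri(images.Item(x).src, "https://www.argetra.de")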

Correct your URL in the post.

VBA:

Option Explicit

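Public Sub GrGHTML()
    ' Early binding to MSHTML.HTMLDocument needs a reference to
    ' "Microsoft HTML Object Library" (VBE: Tools > References)
    Const URL = "https://www.argetra.de/home/fuer_privat/immobilien-suche~ae23f6bb38cb10bf01399d6fef892037.de.html?plz_ort=Berlin"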
    Dim html As MSHTML.HTMLDocument
   
    Set html = New MSHTML.HTMLDocument
    
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With
       
    Dim locations As Object, images As Object, links As Object
    
    Set locations = html.querySelectorAll(".col1 div")
    Set images = html.querySelectorAll(".col2 img")
    Set links = html.querySelectorAll("a.entry")
    
    With ActiveSheet
    
        Dim x As Long
        
        For x = 0 To locations.Length - 1
            .Cells(x + 2, 2) = locations.Item(x).innerText
            .Cells(x + 2, 3) = Replace$(images.Item(x).src, "about:", "https://www.argetra.de") 'Image
            .Cells(x + 2, 4) = Replace$(links.Item(x).href, "about:", "https://www.argetra.de") 'Links
        Next
    End With
End Sub