Send HTTP Requests after browsing to a page as separate requests for Web Crawling (Angular site)

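My company recently upgraded to a new version of iManage (a document filing system) that no longer exposes a library to VBA. Due to company policy, I can run VBA but cannot create VSTO/.NET add-ins.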

I am trying to repair an add-in tool that inventories every item in a folder and its subfolders.

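The solution I am currently exploring is to navigate to the folder in the new web portal and inventory it from there. I could probably do classic web scraping and click links in the browser, but that would be slow and very ugly. Since it is an Angular application, I think I should be able to fire off the REST requests and parse the responses without waiting for pages to load.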

The problem is that the request fails and returns InvalidToken:

{
  "error": {
    "code": "InvalidToken",
    "message": "X-Auth-Token is invalid or missing"
  }
}

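My current approach is to create a WebBrowser object in a UserForm in Excel VBA. The UserForm navigates to our iManage portal. I can then browse the site and click a button to fire the request.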

Private Sub CommandButton1_Click()
    Debug.Print WebBrowser1.Busy
    
    Dim Doc As HTMLDocument
    Set Doc = WebBrowser1.Document
    Debug.Print Doc.cookie

    Dim Request As New WinHttpRequest
    Request.Open "GET", Url:="https://imanage.xxxx.com/work/web/api/v2/customers/1/libraries/CLIENT-JOB/tabs/CLIENT-JOB!9975487/children?limit=500&offset=0&total=true", ASync:=False
    Request.setRequestHeader "Content-Type", "application/json"
    Request.setRequestHeader "Accept-Encoding", "gzip, deflate, br"
    Request.setRequestHeader "Accept-Language", "en-GB,en-US;q=0.9,en;q=0.8"
    Request.setRequestHeader "Connection", "keep-alive"
    Request.setRequestHeader "Host", Doc.Location.host
    Request.setRequestHeader "Referer", Doc.Location.href
    'Request.setRequestHeader "Cookie", WebBrowser1.Document.cookie
    Request.setRequestHeader "Set-Cookie", WebBrowser1.Document.cookie
    Request.setRequestHeader "X-XSRF-TOKEN", Split(Split(WebBrowser1.Document.cookie, ";")(2), "=")(1)
    Request.send

    Dim Result As String
    Result = Request.responseText
    Debug.Print Result
    
End Sub

Private Sub UserForm_Initialize()
    WebBrowser1.Navigate2 "https://imanage.XXXXX.com/work/web/r/custom2/recent-custom2?exclude_emails=true&scope=Admin,AdminArchive,Client-Job,JobArchive&p=1"
End Sub

As far as I can tell, this replicates the request I see in Chrome.

I think a big part of the problem is that the HTMLDocument I see in the WebBrowser never lists all of the cookies I see in Chrome.
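Because the cookie list exposed by the WebBrowser control can differ from the one Chrome shows, the positional lookup `Split(Split(..., ";")(2), "=")(1)` in the click handler may pick up a different cookie than intended, and it also cuts off any value that itself contains `=`. A minimal sketch of looking the token up by name instead is below. `CookieValue` is a hypothetical helper, and the cookie name `XSRF-TOKEN` is an assumption based on Angular's default; the post does not confirm what the portal actually calls it.

```vba
Option Explicit

' Hypothetical helper: returns the value of the cookie called Name from a
' "a=1; b=2" style string, or "" when no such cookie is present.
Public Function CookieValue(ByVal CookieHeader As String, ByVal Name As String) As String
    Dim Part As Variant
    Dim Pair As String
    Dim EqPos As Long

    For Each Part In Split(CookieHeader, ";")
        Pair = Trim$(Part)
        ' Split only on the first "=" so values containing "=" stay intact
        EqPos = InStr(Pair, "=")
        If EqPos > 1 Then
            If StrComp(Left$(Pair, EqPos - 1), Name, vbBinaryCompare) = 0 Then
                CookieValue = Mid$(Pair, EqPos + 1)
                Exit Function
            End If
        End If
    Next Part
End Function
```

In the handler this could be used as `Request.setRequestHeader "X-XSRF-TOKEN", CookieValue(WebBrowser1.Document.cookie, "XSRF-TOKEN")`. An empty result would indicate that the cookie is not visible to `document.cookie` at all.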

Cookies marked "HttpOnly" in the last screenshot cannot be retrieved with document.cookie.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies

A cookie with the HttpOnly attribute is inaccessible to the JavaScript Document.cookie API; it is sent only to the server. For example, cookies that persist server-side sessions don't need to be available to JavaScript, and should have the HttpOnly attribute. This precaution helps mitigate cross-site scripting (XSS) attacks.

Maybe you can try:
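One possible approach, sketched under assumptions rather than confirmed for iManage: the WebBrowser control stores its cookies through WinINet, and the WinINet function `InternetGetCookieEx` can return HttpOnly cookies when called with the `INTERNET_COOKIE_HTTPONLY` flag. `GetAllCookies` is a hypothetical helper name, the 8192-character buffer is an assumed upper bound, and the declaration assumes VBA7 (Office 2010 or later). Whether the server accepts the resulting cookies, or whether the token it wants is among them, is not something the post establishes.

```vba
Option Explicit

' WinINet flag asking for HttpOnly cookies to be included in the result
Private Const INTERNET_COOKIE_HTTPONLY As Long = &H2000&

Private Declare PtrSafe Function InternetGetCookieExW Lib "wininet.dll" ( _
    ByVal lpszUrl As LongPtr, _
    ByVal lpszCookieName As LongPtr, _
    ByVal lpszCookieData As LongPtr, _
    ByRef lpdwSize As Long, _
    ByVal dwFlags As Long, _
    ByVal lpReserved As LongPtr) As Long

' Hypothetical helper: returns "name=value; name=value" for Url,
' including HttpOnly cookies, or "" if the call fails or finds nothing.
Public Function GetAllCookies(ByVal Url As String) As String
    Dim Buffer As String
    Dim Size As Long
    Dim EndPos As Long

    Size = 8192                               ' assumed upper bound
    Buffer = String$(Size, vbNullChar)

    ' A null cookie name asks for every cookie that applies to the URL
    If InternetGetCookieExW(StrPtr(Url), 0, StrPtr(Buffer), Size, _
                            INTERNET_COOKIE_HTTPONLY, 0) <> 0 Then
        ' Cut at the first null terminator instead of relying on Size
        EndPos = InStr(Buffer, vbNullChar)
        If EndPos > 1 Then GetAllCookies = Left$(Buffer, EndPos - 1)
    End If
End Function
```

The result could then be sent with `Request.setRequestHeader "Cookie", GetAllCookies(WebBrowser1.LocationURL)`. Note that cookies travel to the server in a `Cookie` request header; `Set-Cookie` is the header a server uses in its response.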