Web scraping with requests_html, but it says a Chromium file is missing

Question

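I'm trying to scrape HTML with requests_html, but it returns an error saying a file is missing. I ran pip install requests-html, and it says all requirements are already satisfied. How can I fix this?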

from requests_html import HTMLSession
import time

url = 'https://soundcloud.com/jujubucks'

s = HTMLSession()
r = s.get(url)

r.html.render()

songs = r.html.xpath('//*[@id="content"]/div/div[4]/div[1]/div/div[2]/div/div[2]', first=True)

print(songs)

This produces a side-by-side configuration (sxstrace) error:

OSError: [WinError 14001] The application has failed to start because its side-by-side 
configuration is incorrect. Please see the application event log or use the command-line 
sxstrace.exe tool for more detail

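According to the event log, this is apparently the missing file, but I don't know where to get it.

Activation context generation failed for "C:\Users\houst\AppData\Local\pyppeteer\pyppeteer\local-chromium8429\chrome-win32\chrome.exe". Dependent Assembly 71.0.3542.0,language="*",type="win32",version="71.0.3542.0" could not be found. Please use sxstrace.exe for detailed diagnosis.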

Answer 1

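requests_html depends on pyppeteer, but your pyppeteer installation does not appear to have fully installed Chromium. Try installing Chromium manually: activate the environment that contains pyppeteer, then run pyppeteer-install.exe.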
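
The same download can also be triggered from Python, which makes sure it runs under the interpreter that actually has pyppeteer installed. The following is a minimal sketch, not a definitive fix: the helper names pyppeteer_available and ensure_chromium are made up for illustration, and it assumes a pyppeteer version that ships the pyppeteer.chromium_downloader module.

```python
import importlib.util
import sys


def pyppeteer_available() -> bool:
    """Return True if pyppeteer is importable by the interpreter running this code."""
    return importlib.util.find_spec("pyppeteer") is not None


def ensure_chromium() -> str:
    """Download pyppeteer's Chromium if it is missing and return the chrome path."""
    if not pyppeteer_available():
        raise RuntimeError(f"pyppeteer is not installed for {sys.executable}")

    # Imported lazily so the availability check above works without pyppeteer.
    from pyppeteer.chromium_downloader import (
        check_chromium,
        chromium_executable,
        download_chromium,
    )

    # check_chromium() only tests whether the expected executable file exists.
    if not check_chromium():
        download_chromium()
    return str(chromium_executable())
```

Calling ensure_chromium() prints nothing by itself; it returns the path where pyppeteer expects chrome.exe. Because the check only looks for the executable file, a partially extracted download may have to be deleted by hand before the download runs again.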

Answer 2

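I came here with the same problem, but the only answer didn't work for me. My Win10 x64 PC has 5 versions of Python: 4 installed through Anaconda, and Python 3.10 installed from the Microsoft Store. I was debugging the process in VS Code using the MS Store version, and had run pip install requests-html only for that Python version.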
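
With several Python installations on one machine, it can help to confirm which interpreter is actually running the script and whether requests_html is installed for it. A small sketch using only the standard library; the helper name describe_interpreter is hypothetical:

```python
import importlib.util
import sys


def describe_interpreter(package: str = "requests_html") -> dict:
    """Report which Python is running and whether it can import the given package."""
    return {
        "executable": sys.executable,           # full path of the running interpreter
        "version": sys.version.split()[0],      # e.g. the "3.10.x" part only
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this from the same VS Code debug configuration as the scraper shows whether the Microsoft Store interpreter or one of the Anaconda ones is in use.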

The VS Code stack trace showed that subprocess.py failed to start a child process. Windows Event Viewer showed that the attempt to launch chrome.exe had failed: C:\Users\username\AppData\Local\pyppeteer\pyppeteer\local-chromium8429\chrome-win32

Windows Search showed that chrome.exe, which is downloaded and extracted automatically on the first call to response.html.render(), was actually located at: C:\Users\username\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\Local\pyppeteer\pyppeteer\local-chromium8429\chrome-win32

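As a workaround, although I don't know why the problem occurs, I moved the chrome-win32 directory to the expected location, and found that Chrome then ran the JavaScript on the page and returned the HTML correctly.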
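
The manual move described above could be scripted roughly as follows. This is only a sketch under the assumption that the download is complete and merely sits in the wrong folder; the function names find_chrome_dirs and relocate_chrome_dir are hypothetical, and the paths must be supplied by the caller.

```python
import shutil
from pathlib import Path
from typing import List


def find_chrome_dirs(search_root: Path) -> List[Path]:
    """Return every directory under search_root that directly contains chrome.exe."""
    return sorted(p.parent for p in search_root.rglob("chrome.exe") if p.is_file())


def relocate_chrome_dir(actual_dir: Path, expected_dir: Path) -> Path:
    """Move actual_dir to expected_dir, refusing to overwrite an existing directory."""
    if expected_dir.exists():
        raise FileExistsError(f"{expected_dir} already exists")
    # Create the parent folders of the target, then move the whole directory.
    expected_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(actual_dir), str(expected_dir))
    return expected_dir
```

For the case in this answer, search_root would be the Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0 folder found by Windows Search, and expected_dir would be the chrome-win32 path reported by Event Viewer.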


python

chromium

web-scraping

pyppeteer

python-requests-html