webbrowser module searching url with absolute path
I want to open a website and download a resume from it, but the code below ends up trying to open an absolute file path instead of just the URL:
import webbrowser
soup = BeautifulSoup(webbrowser.open('www.indeed.com/r/Prabhanshu-Pandit/dee64d1418e20069?sp=0'),"lxml")
It produces the following error:
gvfs-open: /home/utkarsh/Documents/Extract_Resume/www.indeed.com/r/Prabhanshu-
Pandit/dee64d1418e20069?sp=0:
error opening location: Error when getting information for file
'/home/utkarsh/Documents/Extract_Resume/www.indeed.com/r/Prabhanshu-
Pandit/dee64d1418e20069?sp=0': No such file or directory
Clearly it is prepending my home directory and treating the URL as a local file path, which does not exist. What am I doing wrong here? Thanks in advance.
I think you are confusing the roles of Beautiful Soup and webbrowser. A web browser is not needed to fetch the page.
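As a quick illustration of the mix-up (a minimal sketch; the https:// scheme is my addition, your original call omitted it): webbrowser.open only launches the page in your default browser and returns a boolean, so there is no HTML to hand to BeautifulSoup.

import webbrowser

# webbrowser.open launches the URL in the default browser and returns
# True/False; it does not return the page's HTML. Without a scheme,
# the string can be treated as a local path relative to the current
# directory, which is why gvfs-open looked under your home directory.
opened = webbrowser.open("https://www.indeed.com/r/Prabhanshu-Pandit/dee64d1418e20069?sp=0")
print(opened)  # a boolean, not markup, so BeautifulSoup has nothing to parse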
Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn't take much code to write an application.
Adapting the tutorial example to your task so it prints the resume in the output:
from bs4 import BeautifulSoup
import requests

# Fetch the page over HTTP; requests needs an explicit scheme prefix
url = "www.indeed.com/r/Prabhanshu-Pandit/dee64d1418e20069?sp=0"
r = requests.get("http://" + url)
data = r.text

# Parse the HTML and print the resume container
soup = BeautifulSoup(data, "html.parser")
print(soup.find("div", {"id": "resume"}))
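If you want just the plain text of the resume rather than the raw markup, a small follow-up sketch (assuming the page really contains a div with id "resume"; adjust the selector if not):

resume_div = soup.find("div", {"id": "resume"})
if resume_div is not None:
    # get_text collapses the tag's contents into plain text
    print(resume_div.get_text(strip=True))
else:
    print("No div with id 'resume' found on the page")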