webbrowser module searching url with absolute path
I want to open a website and download a resume from it, but the code below ends up trying to open an absolute file path instead of just the URL:
import webbrowser
soup = BeautifulSoup(webbrowser.open('www.indeed.com/r/Prabhanshu-Pandit/dee64d1418e20069?sp=0'),"lxml")
It produces the following error:
gvfs-open: /home/utkarsh/Documents/Extract_Resume/www.indeed.com/r/Prabhanshu-
Pandit/dee64d1418e20069?sp=0:
error opening location: Error when getting information for file
'/home/utkarsh/Documents/Extract_Resume/www.indeed.com/r/Prabhanshu-
Pandit/dee64d1418e20069?sp=0': No such file or directory
Clearly it is prepending my home directory and treating the URL as a local file path, which does not exist. What am I doing wrong here? Thanks in advance.
I think you are confusing the roles of Beautiful Soup and webbrowser. A web browser is not needed to fetch the page.
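As a quick illustration of the mix-up (a minimal sketch; the https:// scheme is my addition, your original call omitted it): webbrowser.open only launches the page in your default browser and returns a boolean, so there is no HTML to hand to BeautifulSoup.

import webbrowser

# webbrowser.open launches the URL in the default browser and returns
# True/False; it does not return the page's HTML. Without a scheme,
# the string can be treated as a local path relative to the current
# directory, which is why gvfs-open looked under your home directory.
opened = webbrowser.open("https://www.indeed.com/r/Prabhanshu-Pandit/dee64d1418e20069?sp=0")
print(opened)  # a boolean, not markup, so BeautifulSoup has nothing to parse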
Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn't take much code to write an application.
Adapting the tutorial example to your task so it prints the resume in the output:
from bs4 import BeautifulSoup
import requests

# Fetch the page over HTTP; requests needs an explicit scheme prefix
url = "www.indeed.com/r/Prabhanshu-Pandit/dee64d1418e20069?sp=0"
r = requests.get("http://" + url)
data = r.text

# Parse the HTML and print the resume container
soup = BeautifulSoup(data, "html.parser")
print(soup.find("div", {"id": "resume"}))
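If you want just the plain text of the resume rather than the raw markup, a small follow-up sketch (assuming the page really contains a div with id "resume"; adjust the selector if not):

resume_div = soup.find("div", {"id": "resume"})
if resume_div is not None:
    # get_text collapses the tag's contents into plain text
    print(resume_div.get_text(strip=True))
else:
    print("No div with id 'resume' found on the page")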