Getting an error: name 'html' is not defined while trying to implement simple program for HTTP request response cycle using urllib library in python

Question

I'm learning the BeautifulSoup library in Python and came across the urllib library while trying to learn more about the HTTP request-response cycle.

In the code below, I'm trying to scrape all the anchor tags on an HTML page, but I get an error: NameError: name 'html' is not defined

I searched Google for a fix and found the following related Stack Overflow question: .

I tried the solution given there, but it didn't work.

import urllib
from bs4 import BeautifulSoup
url=input('Enter- ')
req_file=urllib.request.urlopen(url).read()
soup=BeautifulSoup(html,"html.parser")
tags=soup('a')
for tag in tags:
    print(tag.get('href',None))

Answer 1

You store the result of the read in the variable req_file:

req_file=urllib.request.urlopen(url).read()

But when you pass it to BeautifulSoup, your code refers to a variable named html, which was never assigned anything, hence the 'html' is not defined error:

soup=BeautifulSoup(html,"html.parser")
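A minimal offline sketch of the same failure: assigning the downloaded page to req_file does nothing for the name html, so the lookup fails as soon as the line that uses html runs. The byte string below is a hard-coded stand-in for urlopen(url).read() (an assumption, so the snippet runs without a network connection).

```python
# Hard-coded stand-in for urllib.request.urlopen(url).read(), so this runs offline
req_file = b"<a href='https://example.com'>Example</a>"

try:
    # Only 'req_file' was assigned above; 'html' was never assigned
    page = html
except NameError as err:
    message = str(err)
    print(message)  # name 'html' is not defined
```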

So one option is to store the result of .read() in a variable named html:

html=urllib.request.urlopen(url).read()
soup=BeautifulSoup(html,"html.parser")

Or pass the variable you originally stored it in, req_file, to BeautifulSoup:

req_file=urllib.request.urlopen(url).read()
soup=BeautifulSoup(req_file,"html.parser")
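Either way, BeautifulSoup receives the same object: the raw bytes returned by .read(). The sketch below makes that visible. It assumes a data: URL as an offline stand-in for the page typed at the prompt (urllib.request can open data: URLs), and fetch_page is a hypothetical helper name, not part of urllib. The with block closes the response once the body has been read.

```python
import urllib.parse
import urllib.request


def fetch_page(url):
    """Hypothetical helper: download url and return the response body as bytes."""
    # The with block closes the response after .read()
    with urllib.request.urlopen(url) as response:
        return response.read()


# Offline stand-in for the URL typed at the 'Enter- ' prompt
sample = "<p><a href='https://example.com'>Example</a></p>"
url = "data:text/html;charset=utf-8," + urllib.parse.quote(sample)

html = fetch_page(url)
print(type(html))  # <class 'bytes'>
print(html.decode("utf-8"))
```

Passing these bytes straight to BeautifulSoup(html, "html.parser") works because BeautifulSoup accepts bytes and detects the encoding itself.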

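Hope the explanation helps. I'm still learning BeautifulSoup myself, but I remember all the struggles from when I started. It gets fun once you get the hang of it.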

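import urllib.request  # "import urllib" alone does not reliably load the urllib.request submodule
from bs4 import BeautifulSoup

url = input('Enter- ')
# Read the response body (bytes) into req_file...
req_file = urllib.request.urlopen(url).read()
# ...and pass that same variable to BeautifulSoup
soup = BeautifulSoup(req_file, "html.parser")
tags = soup('a')  # calling the soup is the same as soup.find_all('a')
for tag in tags:
    print(tag.get('href', None))  # None for <a> tags without an href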
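To show what the soup('a') loop is doing, here is a dependency-free sketch that swaps BeautifulSoup for the standard library's html.parser.HTMLParser (a different technique from the answer's, named plainly). AnchorHrefCollector and collect_hrefs are hypothetical names, and the UTF-8 default is an assumption; BeautifulSoup would detect the encoding for you.

```python
from html.parser import HTMLParser


class AnchorHrefCollector(HTMLParser):
    """Hypothetical parser: collect the href of every <a> tag, like soup('a') + tag.get('href', None)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value) pairs
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href", None))


def collect_hrefs(page_bytes, encoding="utf-8"):
    # HTMLParser works on str, so decode the bytes from .read() first (assumes UTF-8 by default)
    parser = AnchorHrefCollector()
    parser.feed(page_bytes.decode(encoding, errors="replace"))
    parser.close()
    return parser.hrefs


# Hard-coded stand-in for urllib.request.urlopen(url).read()
page = b"<a href='/one'>1</a><a name='no-link'>2</a><A HREF='/three'>3</A>"
for href in collect_hrefs(page):
    print(href)  # /one, None, /three (one per line)
```

With the "html.parser" backend, BeautifulSoup builds on this same standard-library parser and adds encoding detection and a searchable tree, so the answer's version remains the simpler choice.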
