Why is the str parameter I pass to my function suddenly bytes?
I have a function that receives a str value, but when I run it, the error says it is a bytes value:
Traceback (most recent call last):
  File "C:\Users\sdand\Documents\Python\Engine\engine.py", line 4, in <module>
    print(find.crawl_web('https://google.com', 4))
  File "C:\Users\sdand\Documents\Python\Engine\finder.py", line 68, in crawl_web
    links = self.get_all_links(content)
  File "C:\Users\sdand\Documents\Python\Engine\finder.py", line 20, in get_all_links
    url, endpos = self.get_next_target(page)
  File "C:\Users\sdand\Documents\Python\Engine\finder.py", line 7, in get_next_target
    start_link = s.find('<a href=')
TypeError: a bytes-like object is required, not 'str'
Here is crawl_web, the function from which I call get_all_links:
def crawl_web(self, seed, max_depth):
    tocrawl = [seed]
    crawled = []
    next_depth = []
    depth = 0
    index = []
    while tocrawl and depth <= max_depth:
        page = tocrawl.pop()
        if page not in crawled:
            # here content is a str
            content = self.get_page(page)
            self.add_page_to_index(index, page, content)
            links = self.get_all_links(content)
            self.union(next_depth, links)
            crawled.append(page)
        if not tocrawl:
            tocrawl, next_depth = next_depth, []
            depth = depth + 1
    return index
Here is get_page:
def get_page(self, url):
    try:
        import urllib.request
        return urllib.request.urlopen(url).read()
    except:
        return ""
Here is get_all_links:
def get_all_links(self, page):
    # but here it is bytes, I don't know why
    links = []
    while True:
        url, endpos = self.get_next_target(page)
        print(url)
        if url != None:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links
I don't understand why my str variable content turns into bytes inside get_all_links. Can someone explain this to me, and how can I fix it?
You may not realize that .read() returns a bytes object, not a str. Although working with bytes directly is often preferable for web scraping, the simplest fix here is to decode it into a str:
return urllib.request.urlopen(url).read().decode('utf-8')
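To see why the error happens and how the fix works, here is a minimal sketch (using a made-up HTML snippet, not your actual page): str methods on a bytes object require bytes arguments, so calling .find('<a href=') on undecoded response data raises exactly the TypeError from your traceback.

```python
# What urllib's .read() gives you: a bytes object, not a str.
html = b'<p>see <a href="https://example.com">link</a></p>'

# This is your crash: a bytes object needs a bytes pattern, not a str one.
try:
    html.find('<a href=')
except TypeError as e:
    print(e)  # a bytes-like object is required, not 'str'

# Fix 1: decode the bytes into a str first (assumes the page is UTF-8).
text = html.decode('utf-8')
print(text.find('<a href='))  # 7

# Fix 2 (alternative): keep the bytes and search with a bytes pattern.
print(html.find(b'<a href='))  # 7
```

Note that decode('utf-8') assumes the page really is UTF-8; a page served in another encoding would raise a UnicodeDecodeError, so you may want decode('utf-8', errors='replace') if you just need it to not crash.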