Python: HTTPConnectionPool(host='%s', port=80):
import requests
import urllib3
from time import sleep
from sys import argv

script, filename = argv
http = urllib3.PoolManager()
datafile = open('datafile.txt','w')
crawl = ""

with open(filename) as f:
    mylist = f.read().splitlines()

def crawlling(x):
    for i in mylist:
        domain = ("http://" + "%s") % i
        crawl = http.request('GET','%s',preload_content=False) % domain
        for crawl in crawl.stream(32):
            print crawl
            sleep(10)
            crawl.release_conn()
        datafile.write(crawl.status)
        datafile.write('>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n')
        datafile.write(crawl.data)
    datafile.close()
    return x

crawlling(crawl)
_______________________________________________________________________
Extract of domain.txt file:
fjarorojo.info
buscadordeproductos.com
I'm new to Python, so please bear with me: I'm trying to fetch content from a URL but I get an error, even though the URL works fine in a browser.
The goal of the script is to read domains from the domain.txt file, iterate over them, fetch each page's content, and save it to a file.
Getting this error:
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='%s',
port=80): Max retries exceeded with url: / (Caused by
NewConnectionError('<urllib3.connection.HTTPConnection object at
0x7ff45e4f9cd0>: Failed to establish a new connection: [Errno -2] Name or
service not known',))
This line is the problem:
crawl = http.request('GET','%s',preload_content=False) % domain
Right now you are making a request to the literal domain %s, which is not a valid domain, hence the error "Name or service not known".
It should be:
crawl = http.request('GET', '%s' % domain, preload_content=False)
Or, more simply:
crawl = http.request('GET', domain, preload_content=False)
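To see why the original form fails, note that the `%` formatting operator binds to the string it follows, not to the surrounding function call: `http.request` is invoked with the literal string `'%s'` as its URL before the `%` is ever applied. A minimal sketch, using a hypothetical `fetch` function as a stand-in for `http.request`:

```python
def fetch(url):
    # Hypothetical stand-in for http.request: echoes the URL it received.
    return "requested " + url

domain = "fjarorojo.info"

# Wrong order: fetch() has already run with the literal '%s' as its URL,
# and only afterwards is % applied to the return value.
print(fetch('%s'))           # requested %s

# Right order: format the string first, then pass the result in.
print(fetch('%s' % domain))  # requested fjarorojo.info
```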
Also, unrelated to the error you posted, these lines are likely to cause problems too:
for crawl in crawl.stream(32):
    print crawl
    sleep(10)
    crawl.release_conn() # <--
You release the connection inside the loop, so from the second iteration on the loop can no longer stream data as expected. Instead, release the connection only once you are done with the request. More details here.
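Putting both fixes together, one possible corrected version of the loop might look like the sketch below (Python 3; `build_url` and `crawl_domains` are names introduced here for illustration, the per-chunk `sleep` is dropped, and the non-string `status`/bytes `data` writes from the original are converted before writing):

```python
import urllib3

def build_url(domain):
    # The domain list stores bare hostnames, so prepend the scheme.
    return "http://" + domain

def crawl_domains(domains, outfile):
    # Hypothetical corrected rewrite of the asker's crawlling() loop.
    http = urllib3.PoolManager()
    with open(outfile, 'w') as datafile:
        for domain in domains:
            # Pass the real URL, not the literal string '%s'.
            resp = http.request('GET', build_url(domain),
                                preload_content=False)
            body = b""
            for chunk in resp.stream(32):
                body += chunk
            # Release the connection only after the stream is exhausted.
            resp.release_conn()
            # status is an int and the body is bytes: convert both.
            datafile.write(str(resp.status) + '\n')
            datafile.write('>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n')
            datafile.write(body.decode('utf-8', errors='replace'))
```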