Request multiple pages on a site using one connection with Python requests

Question

I want to parse multiple (about 180) pages from one site. In Python I do it like this:

import requests
from bs4 import BeautifulSoup as Soup  # assuming "Soup" is BeautifulSoup

def myFunc(pages):
    forreturn = []
    session = requests.Session()  # one session for all requests
    for page in pages:  # list containing page addresses
        url = 'http://example.com/' + page
        # we get something like 'http://example.com/sub1/page.html'
        # and the "sub1" part is different each time.
        answer = session.get(url)
        soup = Soup(answer.text, 'html.parser')
        # Here we parse the needed string and append it to the "forreturn" list
    return forreturn

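As I understand it, this opens a new connection to the server every time a new page is requested. Is there a way to fetch all of these pages over a single connection?

(I expect this would reduce response times and put less load on the server.)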

Answer 1

Under HTTP 1.0 you can ask for a persistent connection explicitly:

session.get(url, headers={'Connection': 'Keep-Alive'})
In HTTP 1.1, all connections are treated as persistent unless declared otherwise.
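To see HTTP 1.1 persistence without any third-party library, here is a minimal sketch using only the standard library. It assumes a throwaway local server on `127.0.0.1` with an OS-assigned port; `EchoHandler` and `fetch_over_one_connection` are hypothetical names. The server counts how many TCP connections it accepts while one `http.client.HTTPConnection` sends several requests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: echoes the request path and counts connections."""

    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request
        self.server.connections_accepted += 1
        super().setup()

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        # Content-Length lets the client find the end of the body without closing
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_over_one_connection(paths):
    """Send every GET over a single HTTPConnection; return (bodies, connections accepted)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    server.connections_accepted = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)  # http.client sends HTTP/1.1 requests
            response = conn.getresponse()
            bodies.append(response.read().decode())  # read fully before the next request
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies, server.connections_accepted


if __name__ == "__main__":
    print(fetch_over_one_connection(["/sub1/page.html", "/sub2/page.html", "/sub3/page.html"]))
```

If `protocol_version` is left at its default `"HTTP/1.0"`, the server should close the socket after each response, `http.client` reconnects automatically, and the count should match the number of requests instead of staying at 1.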
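As @FlorianLudwig pointed out, the requests documentation (http://docs.python-requests.org/en/latest/user/advanced/#keep-alive) states that "keep-alive is 100% automatic within a session". Since the code in the question already sends every request through one session, connections to the same host are already being reused.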
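A minimal sketch to check this with requests itself, under stated assumptions: a local stand-in server replaces example.com (hypothetical `EchoHandler`), and `fetch_with_session` is a hypothetical helper that mirrors the loop from the question. The server counts accepted TCP connections while one `requests.Session` fetches several pages.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for example.com: echoes the requested path."""

    protocol_version = "HTTP/1.1"  # allow persistent connections

    def setup(self):
        # Called once per TCP connection the server accepts
        self.server.connections_accepted += 1
        super().setup()

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the output quiet


def fetch_with_session(pages):
    """Fetch each page through one requests.Session; return (texts, connections accepted)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    server.connections_accepted = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d/" % server.server_port
    texts = []
    try:
        with requests.Session() as session:
            session.trust_env = False  # ignore proxy environment variables for this local test
            for page in pages:
                answer = session.get(base + page)
                answer.raise_for_status()
                # The body is fully read, so the connection goes back to the pool for reuse
                texts.append(answer.text)
    finally:
        server.shutdown()
        server.server_close()
    return texts, server.connections_accepted


if __name__ == "__main__":
    print(fetch_with_session(["sub1/page.html", "sub2/page.html", "sub3/page.html"]))
```

No `Connection` header is set by hand here; the session's connection pool is what keeps the single socket alive across requests.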


Tags: python, keep-alive, python-2.7, python-requests