Why do I get a bad HTTP status with ProtocolError('Connection aborted.', BadStatusLine("''",)) when getting data from the Zendesk API?
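I'm trying to get the user identities for a few hundred thousand user ids from the Zendesk API, using Python 3.4.3 and the requests library. It works for many user ids, and then my program receives a bad response from the Zendesk API.
Below is the relevant Python function: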
def get_user_identities(user_id):
    url = config.zendesk_api_url + '/api/v2/users/' + user_id + '/identities.json'
    session = requests.Session()
    session.auth = config.credentials
    response = ''

    while True:
        try:
            response = session.get(url)
        except requests.ConnectionError as error:
            logger.error("ConnectionError: {0}".format(error))
            num_seconds = 30
            logger.info("Sleeping for {} seconds...".format(num_seconds))
            time.sleep(num_seconds)
        else:
            break

    while True:
        response = session.get(url)
        if response.status_code == 429:
            logger.info('Rate limited! Waiting for {} seconds'.format(response.headers['retry-after']))
            time.sleep(int(response.headers['retry-after']))
        else:
            break

    if response.status_code != 200:
        logger.error('Error with status code {}'.format(response.status_code))
        exit()

    data = response.json()
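This function is called in a loop and retrieves the user identities for thousands of users without any problem, but then it exits because of a bad HTTP response status: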
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 393, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.4/http/client.py", line 1171, in getresponse
response.begin()
File "/usr/lib/python3.4/http/client.py", line 351, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 330, in send
timeout=timeout
File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 640, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.4/dist-packages/urllib3/util/retry.py", line 287, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='companyname.zendesk.com', port=443): Max retries exceeded with url: /api/v2/users/1608220001/identities.json (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/emre.sevinc/code/company-zendesk/get_user_identities.py", line 72, in <module>
get_user_identities(user_id)
File "/home/emre.sevinc/code/company-zendesk/get_user_identities.py", line 42, in get_user_identities
response = session.get(url)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 467, in get
return self.request('GET', url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 455, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 558, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 378, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='companyname.zendesk.com', port=443): Max retries exceeded with url: /api/v2/users/1608220001/identities.json (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))
But when I test the same URL with HTTPie to get the user identities, it works fine:
$ http -a user@company.com:password https://companyname.zendesk.com/api/v2/users/1608220001/identities.json
HTTP/1.1 200 OK
Cache-Control: must-revalidate, private, max-age=0
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json; charset=UTF-8
Date: Tue, 12 Sep 2017 15:11:39 GMT
ETag: W/"8135d41f9068e1c2b45d0f307c6431d4"
Last-Modified: Mon, 09 Nov 2015 20:55:44 GMT
Server: nginx
Strict-Transport-Security: max-age=31536000;
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Rack-Cache: miss
X-Rate-Limit: 700
X-Rate-Limit-Remaining: 416
X-Request-Id: f1320883-caf0-4d33-cd94-a0369f4368f8
X-Runtime: 0.381444
X-UA-Compatible: IE=Edge,chrome=1
X-Zendesk-API-Version: v2
X-Zendesk-Application-Version: v40.20
X-Zendesk-Origin-Server: app15.pod3.dub1.zdsys.com
X-Zendesk-Request-Id: a0606a3ae1d043968f53
{
"count": 1,
"identities": [
{
"created_at": "2015-11-09T20:55:44Z",
"deliverable_state": "deliverable",
"id": 1020870341,
"primary": true,
"type": "email",
...
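Could it be that the Zendesk REST API endpoint 'thinks' I'm trying to "scrape" it and deliberately cuts the connection, as has been suggested elsewhere?
Or is it something else, and do you have any suggestions for making it work (apart from faking the user agent)?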
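For context on what the traceback reports: BadStatusLine with an empty string means the client sent its request and the peer closed the connection before sending any status line. This says nothing about why the peer hung up. The sketch below reproduces that condition locally with only the standard library; the helper names and the throwaway local server are hypothetical and have nothing to do with Zendesk itself. On Python 3.5 and later the exception raised is http.client.RemoteDisconnected, which is a subclass of BadStatusLine.

```python
import http.client
import socket
import threading


def serve_one_silent_close(server_sock):
    """Accept one connection, read the request headers, close without replying."""
    conn, _ = server_sock.accept()
    data = b""
    while b"\r\n\r\n" not in data:      # read until the end of the headers
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    conn.close()                        # no status line is ever sent


def request_against_silent_server():
    """Return the exception http.client raises when the peer hangs up early."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    worker = threading.Thread(target=serve_one_silent_close, args=(server_sock,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/api/v2/users/1/identities.json")
        client.getresponse()            # fails: there is no status line to read
    except http.client.BadStatusLine as error:
        return error
    finally:
        client.close()
        worker.join()
        server_sock.close()
    return None


if __name__ == "__main__":
    print(repr(request_against_silent_server()))
```

Because the failure happens at the connection level, requests surfaces it as requests.ConnectionError rather than as a response with a status code, so checking response.status_code alone cannot catch it.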
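Apparently the code has to catch one more exception, urllib3.exceptions.MaxRetryError, and handle one more HTTP status code (BAD_GATEWAY_ERROR = 502) in order to cope with the problems that the Zendesk REST API endpoint throws at it: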
import time

import requests
import urllib3

# config, logger and get_script_path() are defined elsewhere in the script.

BAD_GATEWAY_ERROR = 502
RATE_LIMITED_ERROR = 429
MAX_NUM_SECONDS_TO_SLEEP = 30
MAX_NUM_OF_ALLOWED_RETRIES = 10


def get_user_identities(user_id):
    url = config.zendesk_api_url + '/api/v2/users/' + user_id + '/identities.json'
    session = requests.Session()
    session.auth = config.credentials
    script_path = get_script_path()
    num_retries = 0
    response = ''

    while True:
        # Give up on this user once the retry budget is spent.
        if num_retries > MAX_NUM_OF_ALLOWED_RETRIES:
            logger.error('Tried more than {} times without success. Skipping the user id {} .'
                         .format(MAX_NUM_OF_ALLOWED_RETRIES, user_id))
            return

        try:
            response = session.get(url)

            # 429: wait for as long as Zendesk asks, then retry.
            if response.status_code == RATE_LIMITED_ERROR:
                logger.info('Rate limited! Waiting for {} seconds and will try again.'
                            .format(response.headers['retry-after']))
                time.sleep(int(response.headers['retry-after']))
                num_retries += 1
                continue

            # 502: wait a fixed interval, then retry.
            if response.status_code == BAD_GATEWAY_ERROR:
                logger.info('Bad Gateway Error. Waiting for {} seconds and will try again.'
                            .format(str(MAX_NUM_SECONDS_TO_SLEEP)))
                time.sleep(MAX_NUM_SECONDS_TO_SLEEP)
                num_retries += 1
                continue

            # Any other non-200 status is not retried.
            if response.status_code != 200:
                logger.error('Error with status code {}. Skipping the user id {}'
                             .format(response.status_code, user_id))
                return
        except (requests.ConnectionError, urllib3.exceptions.MaxRetryError) as error:
            # The connection was dropped (e.g. BadStatusLine): wait and retry.
            logger.error("ConnectionError: {0}".format(error))
            logger.info("Sleeping for {} seconds...".format(MAX_NUM_SECONDS_TO_SLEEP))
            time.sleep(MAX_NUM_SECONDS_TO_SLEEP)
            num_retries += 1
        else:
            break

    data = response.json()
After the changes above, it was able to retrieve more than 700,000 records from the Zendesk REST API endpoint successfully.
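The retry policy described above (wait and retry on a dropped connection, on 429, and on 502, up to a fixed number of attempts) can also be factored into a small helper that is testable without network access. This is only a sketch under stated assumptions, not the code that produced the result above: get_with_retries, seconds_from_retry_after and RETRYABLE_STATUS_CODES are hypothetical names, and the fallback to a default wait when Retry-After is missing or unparsable is an added assumption.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

RETRYABLE_STATUS_CODES = {429, 502}  # rate limited, bad gateway


def seconds_from_retry_after(value, default=30):
    """Turn a Retry-After header value into a non-negative number of seconds."""
    if value is None:
        return default
    try:
        return max(0, int(value))             # delta-seconds form, e.g. "59"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return default                        # unparsable: fall back
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0, int((when - datetime.now(timezone.utc)).total_seconds()))


def get_with_retries(fetch, url, max_retries=10, default_wait=30,
                     sleep=time.sleep, transient_errors=(ConnectionError,)):
    """Call fetch(url) until it returns a response that should not be retried.

    Returns that response, or None once max_retries retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            response = fetch(url)
        except transient_errors:
            wait = default_wait               # connection dropped: fixed wait
        else:
            if response.status_code not in RETRYABLE_STATUS_CODES:
                return response
            wait = seconds_from_retry_after(
                response.headers.get("retry-after"), default_wait)
        if attempt < max_retries:
            sleep(wait)
    return None
```

With requests, this would be called as get_with_retries(session.get, url, transient_errors=(requests.ConnectionError,)). The exception type has to be passed explicitly because requests.ConnectionError does not derive from Python's built-in ConnectionError. The caller still has to check for None and for a status code other than 200.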
The problem I ran into is similar to how the Zendesk server behaves in this case.