处理请求速率限制,MusicBrainz API
Dealing with request rate limits, MusicBrainz API
问题:时间延迟是处理请求速率限制的好方法吗?
我对请求、APIs 和 Web 服务还很陌生。我正在尝试创建一个网络服务,给定一个 ID,向 MusicBrainz API 发出请求并检索一些信息。但是,显然我提出的要求太多,或者要求太快。在代码的最后一行,如果delay
参数设置为0
,就会出现这个错误:
{'error': 'Your requests are exceeding the allowable rate limit. Please see http://wiki.musicbrainz.org/XMLWebService for more information.'}
调查 link,我发现:
The rate at which your IP address is making requests is measured. If that rate is too high, all your requests will be declined (http 503) until the rate drops again. Currently that rate is (on average) 1 request per second.
所以我想,好吧,我插入一个1秒的延时,就可以了。它奏效了,但我想有更好、更简洁、更聪明的方法来处理这样的问题。你认识一个吗?
代码:
####################################################
################### INSTRUCTIONS ###################
####################################################
'''
This script runs locally and returns a JSON formatted file, containing
information about the release-groups of an artist whose MBID must be provided.
'''
#########################################
############ CODE STARTS ################
#########################################
#IMPORT PACKAGES
#All of them come with Anaconda3 installation, otherwise they can be installed with pip
import requests
import json
import math
import time
#Base URL for looking-up release-groups on musicbrainz.org
root_URL = 'http://musicbrainz.org/ws/2/'
#Parameters to run an example
offset = 10
limit = 1
MBID = '65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab'
def collect_data(MBID, root_URL):
'''
Description: Auxiliar function to collect data from the MusicBrainz API
Arguments:
MBID - MusicBrainz Identity of some artist.
root_URL - MusicBrainz root_URL for requests
Returns:
decoded_output - dictionary file containing all the information about the release-groups
of type album of the requested artist
'''
#Joins paths. Note: Release-groups can be filtered by type.
URL_complete = root_URL + 'release-group?artist=' + MBID + '&type=album' + '&fmt=json'
#Creates a requests object and sends a GET request
request = requests.get(URL_complete)
assert request.status_code == 200
output = request.content #bits file
decoded_output = json.loads(output) #dict file
return decoded_output
def collect_releases(release_group_id, root_URL, delay = 1):
'''
Description: Auxiliar function to collect data from the MusicBrainz API
Arguments:
release_group_id - ID of the release-group whose number of releases is to be extracted
root_URL - MusicBrainz root_URL for requests
Returns:
releases_count - integer containing the number of releases of the release-group
'''
URL_complete = root_URL + 'release-group/' + release_group_id + '?inc=releases' + '&fmt=json'
#Creates a requests object and sends a GET request
request = requests.get(URL_complete)
#Parses the content of the request to a dictionary
output = request.content
decoded_output = json.loads(output)
#Time delay to not exceed MusicBrainz request rate limit
time.sleep(delay)
releases_count = 0
if 'releases' in decoded_output:
releases_count = len(decoded_output['releases'])
else:
print(decoded_output)
#raise ValueError(decoded_output)
return releases_count
def paginate(store_albums, offset, limit = 50):
'''
Description: Auxiliar function to paginate results
Arguments:
store_albums - Dictionary containing information about each release-group
offset - Integer. Corresponds to starting album to show.
limit - Integer. Default to 50. Maximum number of albums to show per page
Returns:
albums_paginated - Paginated albums according to specified limit and offset
'''
#Restricts limit to 150
if limit > 150:
limit = 150
if offset > len(store_albums['albums']):
raise ValueError('Offset is greater than number of albums')
#Apply offset
albums_offset = store_albums['albums'][offset:]
#Count pages
pages = math.ceil(len(albums_offset) / limit)
albums_limited = []
if len(albums_offset) > limit:
for i in range(pages):
albums_limited.append(albums_offset[i * limit : (i+1) * limit])
else:
albums_limited = albums_offset
albums_paginated = {'albums' : None}
albums_paginated['albums'] = albums_limited
return albums_paginated
def post(MBID, offset, limit, delay = 1):
#Calls the auxiliar function 'collect_data' that retrieves the JSON file from MusicBrainz API
json_file = collect_data(MBID, root_URL)
#Creates list and dictionary for storing the information about each release-group
album_details_list = []
album_details = {"id": None, "title": None, "year": None, "release_count": None}
#Loops through all release-groups in the JSON file
for item in json_file['release-groups']:
album_details["id"] = item["id"]
album_details["title"] = item["title"]
album_details["year"] = item["first-release-date"].split("-")[0]
album_details["release_count"] = collect_releases(item["id"], root_URL, delay)
album_details_list.append(album_details.copy())
#Creates dictionary with all the albums of the artist
store_albums = {"albums": None}
store_albums["albums"] = album_details_list
#Paginates the dictionary
stored_paginated_albums = paginate(store_albums, offset , limit)
#Returns JSON typed file containing the different albums arranged according to offset&limit
return json.dumps(stored_paginated_albums)
#Runs the program and prints the JSON output as specified in the wording of the exercise
print(post(MBID, offset, limit, delay = 1))
除了请求 API 所有者提高您的速率限制之外,没有更好的方法来处理此问题。避免速率限制问题的唯一方法是一次不要发出太多请求,除了以绕过其请求计数器的方式破解 API 之外,您还需要在每个请求之间等待一秒钟请求。
问题:时间延迟是处理请求速率限制的好方法吗?
我对请求、APIs 和 Web 服务还很陌生。我正在尝试创建一个网络服务,给定一个 ID,向 MusicBrainz API 发出请求并检索一些信息。但是,显然我提出的要求太多,或者要求太快。在代码的最后一行,如果delay
参数设置为0
,就会出现这个错误:
{'error': 'Your requests are exceeding the allowable rate limit. Please see http://wiki.musicbrainz.org/XMLWebService for more information.'}
调查 link,我发现:
The rate at which your IP address is making requests is measured. If that rate is too high, all your requests will be declined (http 503) until the rate drops again. Currently that rate is (on average) 1 request per second.
所以我想,好吧,我插入一个1秒的延时,就可以了。它奏效了,但我想有更好、更简洁、更聪明的方法来处理这样的问题。你认识一个吗?
代码:
####################################################
################### INSTRUCTIONS ###################
####################################################
'''
This script runs locally and returns a JSON formatted file, containing
information about the release-groups of an artist whose MBID must be provided.
'''
#########################################
############ CODE STARTS ################
#########################################
#IMPORT PACKAGES
#All of them come with Anaconda3 installation, otherwise they can be installed with pip
import requests
import json
import math
import time
#Base URL for looking-up release-groups on musicbrainz.org
root_URL = 'http://musicbrainz.org/ws/2/'
#Parameters to run an example
offset = 10
limit = 1
MBID = '65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab'
def collect_data(MBID, root_URL):
'''
Description: Auxiliar function to collect data from the MusicBrainz API
Arguments:
MBID - MusicBrainz Identity of some artist.
root_URL - MusicBrainz root_URL for requests
Returns:
decoded_output - dictionary file containing all the information about the release-groups
of type album of the requested artist
'''
#Joins paths. Note: Release-groups can be filtered by type.
URL_complete = root_URL + 'release-group?artist=' + MBID + '&type=album' + '&fmt=json'
#Creates a requests object and sends a GET request
request = requests.get(URL_complete)
assert request.status_code == 200
output = request.content #bits file
decoded_output = json.loads(output) #dict file
return decoded_output
def collect_releases(release_group_id, root_URL, delay = 1):
'''
Description: Auxiliar function to collect data from the MusicBrainz API
Arguments:
release_group_id - ID of the release-group whose number of releases is to be extracted
root_URL - MusicBrainz root_URL for requests
Returns:
releases_count - integer containing the number of releases of the release-group
'''
URL_complete = root_URL + 'release-group/' + release_group_id + '?inc=releases' + '&fmt=json'
#Creates a requests object and sends a GET request
request = requests.get(URL_complete)
#Parses the content of the request to a dictionary
output = request.content
decoded_output = json.loads(output)
#Time delay to not exceed MusicBrainz request rate limit
time.sleep(delay)
releases_count = 0
if 'releases' in decoded_output:
releases_count = len(decoded_output['releases'])
else:
print(decoded_output)
#raise ValueError(decoded_output)
return releases_count
def paginate(store_albums, offset, limit = 50):
'''
Description: Auxiliar function to paginate results
Arguments:
store_albums - Dictionary containing information about each release-group
offset - Integer. Corresponds to starting album to show.
limit - Integer. Default to 50. Maximum number of albums to show per page
Returns:
albums_paginated - Paginated albums according to specified limit and offset
'''
#Restricts limit to 150
if limit > 150:
limit = 150
if offset > len(store_albums['albums']):
raise ValueError('Offset is greater than number of albums')
#Apply offset
albums_offset = store_albums['albums'][offset:]
#Count pages
pages = math.ceil(len(albums_offset) / limit)
albums_limited = []
if len(albums_offset) > limit:
for i in range(pages):
albums_limited.append(albums_offset[i * limit : (i+1) * limit])
else:
albums_limited = albums_offset
albums_paginated = {'albums' : None}
albums_paginated['albums'] = albums_limited
return albums_paginated
def post(MBID, offset, limit, delay = 1):
#Calls the auxiliar function 'collect_data' that retrieves the JSON file from MusicBrainz API
json_file = collect_data(MBID, root_URL)
#Creates list and dictionary for storing the information about each release-group
album_details_list = []
album_details = {"id": None, "title": None, "year": None, "release_count": None}
#Loops through all release-groups in the JSON file
for item in json_file['release-groups']:
album_details["id"] = item["id"]
album_details["title"] = item["title"]
album_details["year"] = item["first-release-date"].split("-")[0]
album_details["release_count"] = collect_releases(item["id"], root_URL, delay)
album_details_list.append(album_details.copy())
#Creates dictionary with all the albums of the artist
store_albums = {"albums": None}
store_albums["albums"] = album_details_list
#Paginates the dictionary
stored_paginated_albums = paginate(store_albums, offset , limit)
#Returns JSON typed file containing the different albums arranged according to offset&limit
return json.dumps(stored_paginated_albums)
#Runs the program and prints the JSON output as specified in the wording of the exercise
print(post(MBID, offset, limit, delay = 1))
除了请求 API 所有者提高您的速率限制之外,没有更好的方法来处理此问题。避免速率限制问题的唯一方法是一次不要发出太多请求,除了以绕过其请求计数器的方式破解 API 之外,您还需要在每个请求之间等待一秒钟请求。