如何使用 python 循环分页 API

Question

我需要从 REST API 中检索 500 部最受欢迎的电影，但结果限制为每页 20 个，而且我每 10 秒只能调用 40 个（https://developers.themoviedb.org/3/getting-started/request-rate-limiting ).我无法动态循环分页结果，因此 500 个最受欢迎的结果都在一个列表中。

我可以成功 return 前 20 部最受欢迎的电影（见下文）并枚举电影的编号，但我在循环中遇到困难，该循环允许我在前 500 部中分页，而无需由于 API 速率限制而超时。

import requests #to make TMDB API calls

#Discover API url filtered to movies >= 2004 and containing Drama genre_ID: 18
discover_api = 'https://api.themoviedb.org/3/discover/movie? 
api_key=['my api key']&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&primary_release_year=>%3D2004&with_genres=18'

#Returning all drama films >= 2004 in popularity desc
discover_api = requests.get(discover_api).json()

most_popular_films = discover_api['results']

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])


Sample response:

{
  "page": 1,
  "total_results": 101685,
  "total_pages": 5085,
  "results": [
    {
      "vote_count": 13,
      "id": 280960,
      "video": false,
      "vote_average": 5.2,
      "title": "Catarina and the others",
      "popularity": 130.491,
      "poster_path": "/kZMCbp0o46Tsg43omSHNHJKNTx9.jpg",
      "original_language": "pt",
      "original_title": "Catarina e os Outros",
      "genre_ids": [
        18,
        9648
      ],
      "backdrop_path": "/9nDiMhvL3FtaWMsvvvzQIuq276X.jpg",
      "adult": false,
      "overview": "Outside, the first sun rays break the dawn.  Sixteen years old Catarina can't fall asleep.  Inconsequently, in the big city adults are moved by desire...  Catarina found she is HIV positive. She wants to drag everyone else along.",
      "release_date": "2011-03-01"
    },
    {
      "vote_count": 9,
      "id": 531309,
      "video": false,
      "vote_average": 4.6,
      "title": "Brightburn",
      "popularity": 127.582,
      "poster_path": "/roslEbKdY0WSgYaB5KXvPKY0bXS.jpg",
      "original_language": "en",
      "original_title": "Brightburn",
      "genre_ids": [
        27,
        878,
        18,
        53
      ],

我需要 python 循环将分页结果附加到一个列表中，直到我捕获了 500 部最受欢迎的电影。


Desired Output:

Movie_ID  Movie_Title
280960    Catarina and the others
531309    Brightburn
438650    Cold Pursuit
537915    After
50465     Glass
457799    Extremely Wicked, Shockingly Evil and Vile

Answer 1

大多数 API 都包含一个 next_url 字段来帮助您遍历所有结果。让我们检查一些案例。

1。没有 `next_url` 字段

您可以循环遍历所有页面，直到 results 字段为空：

import requests #to make TMDB API calls

#Discover API url filtered to movies >= 2004 and containing Drama genre_ID: 18
discover_api_url = 'https://api.themoviedb.org/3/discover/movie? 
api_key=['my api key']&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&primary_release_year=>%3D2004&with_genres=18'

most_popular_films = []
new_results = True
page = 1
while new_results:
    discover_api = requests.get(discover_api_url + f"&page={page}").json()
    new_results = discover_api.get("results", [])
    most_popular_films.extend(new_results)
    page += 1

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])

2。取决于 `total_pages` 领域

import requests #to make TMDB API calls

#Discover API url filtered to movies >= 2004 and containing Drama genre_ID: 18
discover_api_url = 'https://api.themoviedb.org/3/discover/movie? 
api_key=['my api key']&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&primary_release_year=>%3D2004&with_genres=18'

discover_api = requests.get(discover_api_url).json()
most_popular_films = discover_api["results"]
for page in range(2, discover_api["total_pages"]+1):
    discover_api = requests.get(discover_api_url + f"&page={page}").json()
    most_popular_films.extend(discover_api["results"])

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])

3。 `next_url` 字段存在！耶！

相同的想法，只是现在我们检查 next_url 字段是否为空 - 如果为空，则为最后一页。

import requests #to make TMDB API calls

#Discover API url filtered to movies >= 2004 and containing Drama genre_ID: 18
discover_api = 'https://api.themoviedb.org/3/discover/movie? 
api_key=['my api key']&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&primary_release_year=>%3D2004&with_genres=18'

discover_api = requests.get(discover_api).json()
most_popular_films = discover_api["results"]
while discover_api["next_url"]:
    discover_api = requests.get(discover_api["next_url"]).json()
    most_popular_films.extend(discover_api["results"])

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])

如何使用 python 循环分页 API

How to loop through paginated API using python

python

restful-url

1。没有 `next_url` 字段

2。取决于 `total_pages` 领域

3。 `next_url` 字段存在！耶！

如何使用 python 循环分页 API

How to loop through paginated API using python

python

restful-url

1。没有 next_url 字段

2。取决于 total_pages 领域

3。 next_url 字段存在！耶！

1。没有 `next_url` 字段

2。取决于 `total_pages` 领域

3。 `next_url` 字段存在！耶！