IMDbPY - Get Genres from the Top 20 movies
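I am trying to extract a dataset containing the top 20 movies along with the genres and actors of each one. To do that, I am trying the following code: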
top250 = ia.get_top250_movies()
limit = 20;
index = 0;
output = []
for item in top250:
    for genre in top250['genres']:
        index += 1;
        if index <= limit:
            print(item['long imdb canonical title'], ": ", genre);
        else:
            break;
I get the following error:
Traceback (most recent call last):
  File "C:/Users/avilares/PycharmProjects/IMDB/IMDB.py", line 21, in <module>
    for genre in top250['genres']:
TypeError: list indices must be integers or slices, not str
I think the top250 object does not contain the genres...
Does anyone know how to get each genre of each movie?
Many thanks!
From the IMDbPY docs:
"It’s possible to retrieve the list of top 250 and bottom 100 movies:"
>>> top = ia.get_top250_movies()
>>> top[0]
<Movie id:0111161[http] title:_The Shawshank Redemption (1994)_>
>>> bottom = ia.get_bottom100_movies()
>>> bottom[0]
<Movie id:4458206[http] title:_Code Name: K.O.Z. (2015)_>
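get_top250_movies() returns a list, so indexing it with a string such as 'genres' raises the TypeError above. You need to work with the individual movies in the list, and the entries in that list do not carry genres, so you cannot read a movie's genres from it directly.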
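To make the error concrete, here is a minimal sketch that reproduces it without any network access. The `top250` list below is a stand-in built from plain dictionaries (an assumption for illustration, not real IMDbPY objects), holding only basic keys like the ones the top 250 list provides.

```python
# Stand-in for the list returned by get_top250_movies(): a plain list of
# dict-like items that carry only basic keys (no 'genres').
top250 = [
    {"title": "The Shawshank Redemption", "year": 1994},
    {"title": "The Godfather", "year": 1972},
]

try:
    top250["genres"]  # indexing the list itself with a string
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Indexing the list with an integer works, and string keys work on the item
first_title = top250[0]["title"]
print(first_title)

# The basic items do not have a 'genres' key at all
has_genres = "genres" in top250[0]
print(has_genres)
```

So there are two separate problems in the question's loop: the string index is applied to the list instead of to `item`, and even `item` would not have the genres yet.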
Here is a solution:
# Iterate through the movies in the top 250
for topmovie in top250:
    # First, retrieve the movie object using its ID
    movie = ia.get_movie(topmovie.movieID)
    # Print the movie's genres
    for genre in movie['genres']:
        print(genre)
Full working code:
import imdb

ia = imdb.IMDb()
top250 = ia.get_top250_movies()

# Iterate through the first 20 movies in the top 250
for movie_count in range(0, 20):
    # First, retrieve the movie object using its ID
    movie = ia.get_movie(top250[movie_count].movieID)
    # Print movie title and genres
    print(movie['title'])
    print(*movie['genres'], sep=", ")
Output:
The Shawshank Redemption
Drama
The Godfather
Crime, Drama
The Godfather: Part II
Crime, Drama
The Dark Knight
Action, Crime, Drama, Thriller
12 Angry Men
Crime, Drama
Schindler's List
Biography, Drama, History
The Lord of the Rings: The Return of the King
Action, Adventure, Drama, Fantasy
Pulp Fiction
Crime, Drama
The Good, the Bad and the Ugly
Western
Fight Club
Drama
The Lord of the Rings: The Fellowship of the Ring
Adventure, Drama, Fantasy
Forrest Gump
Drama, Romance
Star Wars: Episode V - The Empire Strikes Back
Action, Adventure, Fantasy, Sci-Fi
Inception
Action, Adventure, Sci-Fi, Thriller
The Lord of the Rings: The Two Towers
Adventure, Drama, Fantasy
One Flew Over the Cuckoo's Nest
Drama
Goodfellas
Crime, Drama
The Matrix
Action, Sci-Fi
Seven Samurai
Adventure, Drama
City of God
Crime, Drama
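Here is a shorter, more Pythonic version; the notebook can be accessed here.
Python offers some cleaner ways to write our code. In this script I have used two such techniques.
Technique 1: List comprehension
A list comprehension simply loops over an iterable and produces a list as output. It can also include computations and conditions. The other technique, Technique 2: dictionary comprehension, is very similar; you can read about it here.
For example, code without a list comprehension: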
numbers = []
for i in range(10):
    numbers.append(i)
print(numbers)
#Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Code using a list comprehension:
numbers = [i for i in range(10)]
print(numbers)
#Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
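The description above also mentions computations, conditions, and dictionary comprehensions. A small sketch of those (the variable names are only illustrative):

```python
# List comprehension with a computation (i * i) and a condition (even i only)
even_squares = [i * i for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Dictionary comprehension: same idea, but it builds key: value pairs
square_by_number = {i: i * i for i in range(5)}
print(square_by_number)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```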
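Coming to the OP's question: the get_top250_movies() function returns a list of movies that carry very few details. The exact keys it returns can be checked as shown here. As the output shows, the movie details include neither genres nor other detailed information.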
from imdb import IMDb
ia = IMDb()
top250Movies = ia.get_top250_movies()
top250Movies[0].items()
#output:
[('rating', 9.2),
('title', 'The Shawshank Redemption'),
('year', 1994),
('votes', 2222548),
('top 250 rank', 1),
('kind', 'movie'),
('canonical title', 'Shawshank Redemption, The'),
('long imdb title', 'The Shawshank Redemption (1994)'),
('long imdb canonical title', 'Shawshank Redemption, The (1994)'),
('smart canonical title', 'Shawshank Redemption, The'),
('smart long imdb canonical title', 'Shawshank Redemption, The (1994)')]
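However, the get_movie() function returns much more information about a movie, including its genres.
We combine the two functions to get the genres of the top 20 movies. First we call get_top250_movies(), which returns the list of the top 250 movies with few details (we are only interested in getting the movie IDs). Then we call get_movie() for each movieID in the top movies list, and this returns the genres.
Program: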
from imdb import IMDb

# Initialize and get the top 250 movies; the movies in this list carry
# only a few details and do not include genres
ia = IMDb()
top250Movies = ia.get_top250_movies()

# TECHNIQUE-1: List comprehension
# Fetch the full record (many details, including genres) for the top 20 movies
top20Movies = [ia.get_movie(movie.movieID) for movie in top250Movies[:20]]

# TECHNIQUE-2: Dictionary comprehension
# Build a dictionary of movie title: movie genres
{movie['title']: movie['genres'] for movie in top20Movies}
Output:
{'12 Angry Men': ['Drama'],
'Fight Club': ['Drama'],
'Forrest Gump': ['Drama', 'Romance'],
'Goodfellas': ['Biography', 'Crime', 'Drama'],
'Inception': ['Action', 'Adventure', 'Sci-Fi', 'Thriller'],
"One Flew Over the Cuckoo's Nest": ['Drama'],
'Pulp Fiction': ['Crime', 'Drama'],
"Schindler's List": ['Biography', 'Drama', 'History'],
'Se7en': ['Crime', 'Drama', 'Mystery', 'Thriller'],
'Seven Samurai': ['Action', 'Adventure', 'Drama'],
'Star Wars: Episode V - The Empire Strikes Back': ['Action',
'Adventure',
'Fantasy',
'Sci-Fi'],
'The Dark Knight': ['Action', 'Crime', 'Drama', 'Thriller'],
'The Godfather': ['Crime', 'Drama'],
'The Godfather: Part II': ['Crime', 'Drama'],
'The Good, the Bad and the Ugly': ['Western'],
'The Lord of the Rings: The Fellowship of the Ring': ['Action',
'Adventure',
'Drama',
'Fantasy'],
'The Lord of the Rings: The Return of the King': ['Adventure',
'Drama',
'Fantasy'],
'The Lord of the Rings: The Two Towers': ['Adventure', 'Drama', 'Fantasy'],
'The Matrix': ['Action', 'Sci-Fi'],
'The Shawshank Redemption': ['Drama']}
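The question also asked for the actors. As a rough sketch, the same two-step approach can collect them alongside the genres. This assumes the full movie record exposes a 'cast' key holding person entries with a 'name' key; the helper name `top_movies_with_genres_and_cast` and its `limit` and `max_cast` parameters are made up for this example. The `ia` object is passed in, so nothing is fetched until you call the function.

```python
def top_movies_with_genres_and_cast(ia, limit=20, max_cast=5):
    """Return a list of dicts with title, genres and leading cast names.

    `ia` is expected to behave like imdb.IMDb(): it must provide
    get_top250_movies() and get_movie(movieID).
    """
    rows = []
    for basic in ia.get_top250_movies()[:limit]:
        # The basic entry lacks genres and cast, so fetch the full record
        movie = ia.get_movie(basic.movieID)
        # .get() avoids a KeyError if a key is missing for some title
        cast = movie.get("cast") or []
        rows.append({
            "title": movie["title"],
            "genres": movie.get("genres") or [],
            "cast": [person["name"] for person in cast[:max_cast]],
        })
    return rows


# Usage (needs network access and the imdb package):
# from imdb import IMDb
# for row in top_movies_with_genres_and_cast(IMDb(), limit=20):
#     print(row["title"], "|", ", ".join(row["genres"]), "|", ", ".join(row["cast"]))
```

Each call to get_movie() is a separate request, so slicing the list to the first 20 entries before fetching keeps the number of requests small.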