How to extract all YouTube comments using YouTube API? (Python)
Suppose I have a video_id with 8487 comments.
This code returns only 4309 comments.
from googleapiclient.discovery import build

def get_comments(youtube, video_id, comments=None, token=''):
    # Use None instead of a mutable default list, which would be shared between calls.
    if comments is None:
        comments = []
    video_response = youtube.commentThreads().list(part='snippet',
                                                   videoId=video_id,
                                                   pageToken=token).execute()
    for item in video_response['items']:
        comment = item['snippet']['topLevelComment']
        text = comment['snippet']['textDisplay']
        comments.append(text)
    if "nextPageToken" in video_response:
        return get_comments(youtube, video_id, comments, video_response['nextPageToken'])
    else:
        return comments

youtube = build('youtube', 'v3', developerKey=api_key)
comment_threads = get_comments(youtube, video_id)
print(len(comment_threads))
> 4309
How can I extract all 8487 comments?
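According to the commentThreads documentation, you have to add replies to the part parameter to retrieve the replies a comment may have.
So your request should look like this: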
video_response = youtube.commentThreads().list(part='id,snippet,replies',
                                               videoId=video_id,
                                               pageToken=token).execute()
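Then modify your code accordingly so it also reads the replies of each comment.
In this example I made with the try-it feature in the documentation, you can see that the response contains both the top-level comment and its replies.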
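A minimal sketch of reading the embedded replies from one page of that response, assuming the request used part='id,snippet,replies'. The commentThreads resource documentation notes that replies.comments may contain only a subset of a thread's replies unless its length equals snippet.totalReplyCount, so the sketch also reports which threads are incomplete. The function name texts_from_thread_page is hypothetical.

```python
def texts_from_thread_page(response):
    """Collect top-level and embedded reply texts from one commentThreads.list page.

    Assumes the request used part='id,snippet,replies' (hypothetical helper).
    Returns (texts, incomplete_ids), where incomplete_ids lists top-level
    comments whose embedded replies are fewer than snippet.totalReplyCount.
    """
    texts = []
    incomplete_ids = []
    for item in response.get("items", []):
        snippet = item["snippet"]
        top = snippet["topLevelComment"]
        texts.append(top["snippet"]["textDisplay"])
        # 'replies' is absent when the thread has no replies.
        embedded = item.get("replies", {}).get("comments", [])
        texts.extend(reply["snippet"]["textDisplay"] for reply in embedded)
        # Embedded replies can be a subset; these threads need comments.list.
        if len(embedded) < snippet.get("totalReplyCount", 0):
            incomplete_ids.append(top["id"])
    return texts, incomplete_ids
```

Any id in incomplete_ids still has to be fetched with comments.list and its parentId parameter to get every reply.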
Edit (08/04/2022):
Create a new variable that holds the totalReplyCount the topLevelComment may have.
Something like:
def get_comments(youtube, video_id, comments=None, token=''):
    # Use None instead of a mutable default list, which would be shared between calls.
    if comments is None:
        comments = []
    video_response = youtube.commentThreads().list(part='snippet',
                                                   videoId=video_id,
                                                   pageToken=token).execute()
    for item in video_response['items']:
        comment = item['snippet']['topLevelComment']
        text = comment['snippet']['textDisplay']
        # Always keep the top-level comment itself.
        comments.append(text)
        # Get the total reply count of this top-level comment:
        totalReplyCount = item['snippet']['totalReplyCount']
        # If the total reply count is greater than zero, call the new function
        # getAllTopLevelCommentReplies(topCommentId, replies, token)
        # and extend the returned "comments" list with the replies.
        if totalReplyCount > 0:
            comments.extend(getAllTopLevelCommentReplies(comment['id'], [], None))
    if "nextPageToken" in video_response:
        return get_comments(youtube, video_id, comments, video_response['nextPageToken'])
    else:
        return comments
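Then, if the value of totalReplyCount is greater than zero, make another call with comments.list to get the replies to the top-level comment.
For this new call, you must pass the id of the top-level comment as the parentId parameter.
Example (untested):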
# Returns all replies the top-level comment has:
# topCommentId = the id of the top-level comment whose replies you want to retrieve.
# replies = list of replies returned by this function.
# token = comments.list returns at most 100 replies per page; if there are more,
#         use nextPageToken to retrieve the next batch of results.
def getAllTopLevelCommentReplies(topCommentId, replies, token):
    # Uses the global `youtube` client created with build().
    replies_response = youtube.comments().list(part='snippet',
                                               maxResults=100,
                                               parentId=topCommentId,
                                               pageToken=token).execute()
    for item in replies_response['items']:
        # Append the reply's text to the replies list.
        replies.append(item['snippet']['textDisplay'])
    if "nextPageToken" in replies_response:
        return getAllTopLevelCommentReplies(topCommentId, replies, replies_response['nextPageToken'])
    else:
        return replies
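The same approach can be written with plain loops instead of recursion and without globals or shared default lists. This is a sketch, not the answer's tested code: the helper names iter_pages, fetch_replies, and fetch_all_comments are hypothetical, and it relies on googleapiclient omitting request parameters whose value is None, so the first page is requested with pageToken=None.

```python
def iter_pages(list_method, **params):
    """Yield every response page of a YouTube Data API list() call (hypothetical helper)."""
    token = None
    while True:
        # pageToken=None on the first request; googleapiclient omits None-valued parameters.
        response = list_method(pageToken=token, **params).execute()
        yield response
        token = response.get("nextPageToken")
        if not token:
            return


def fetch_replies(youtube, parent_id):
    """Return the text of every reply to one top-level comment, via comments.list."""
    replies = []
    for page in iter_pages(youtube.comments().list, part="snippet",
                           parentId=parent_id, maxResults=100):
        replies.extend(item["snippet"]["textDisplay"] for item in page.get("items", []))
    return replies


def fetch_all_comments(youtube, video_id):
    """Return each top-level comment's text followed by the texts of all its replies."""
    comments = []
    for page in iter_pages(youtube.commentThreads().list, part="snippet",
                           videoId=video_id, maxResults=100):
        for item in page.get("items", []):
            top = item["snippet"]["topLevelComment"]
            comments.append(top["snippet"]["textDisplay"])
            # Only threads that actually have replies need the extra request.
            if item["snippet"].get("totalReplyCount", 0) > 0:
                comments.extend(fetch_replies(youtube, top["id"]))
    return comments
```

With a real client, pass the object returned by build('youtube', 'v3', developerKey=api_key) from googleapiclient.discovery, e.g. len(fetch_all_comments(youtube, video_id)). Requesting 100 items per page (the maximum for both list methods) keeps the number of API calls, and thus quota use, down.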
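Edit (11/04/2022):
I added a Google Colab example that I adapted from your code. It works for my example video (ouf0ozwnU84), returning 130 comments, but with your example video (BaGgScV4NN8) I got 3300 of 3359.
This may be because some comments are pending approval/moderation, or something else I'm missing, or perhaps the comments are too old and need an additional filter, or there is a problem with the API; see here some other questions about trouble with pagination in the API. I suggest you check this tutorial, which shows code you can adapt.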
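To measure how far a run falls short, one option is to compare the number of texts fetched with the commentCount that videos.list reports under part='statistics'. A minimal sketch; the helper name reported_comment_count is hypothetical, the API returns the count as a string, and the reported figure is not guaranteed to equal what the list endpoints actually return.

```python
def reported_comment_count(youtube, video_id):
    """Return the commentCount YouTube reports for a video, or None (hypothetical helper)."""
    response = youtube.videos().list(part="statistics", id=video_id).execute()
    items = response.get("items", [])
    if not items:
        # No such video, or it is not visible to this API key.
        return None
    count = items[0]["statistics"].get("commentCount")
    # The API serialises counts as strings; the field may be missing.
    return int(count) if count is not None else None
```

For example, reported_comment_count(youtube, video_id) - len(comments) gives the number of comments the crawl did not retrieve, if the reported count is accurate.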