YouTube Data API: get all comments using pageToken

I am trying to fetch all comments for a video using pageToken.

Here is my code:

import requests 
import json 

link = 'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&maxResults=100&videoId={videoId}&key={key}'

link_pageToken = 'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&maxResults=100&pageToken={pageToken}&videoId={videoId}&key={key}'

key ='...'

videoId = 'ydDn_TFkzi4'

comments = []

data = requests.get(link.format(videoId = videoId, key = key)).json()

for i in range(len(data['items'])):
  comments.append(data['items'][i]['snippet']['topLevelComment']['snippet']['textOriginal'])

while 'nextPageToken' in data:
  data = requests.get(link_pageToken.format(videoId = videoId, key = key, pageToken = data['nextPageToken']))
  data = data.json()

  for i in range(len(data['items'])):
    comments.append(data['items'][i]['snippet']['topLevelComment']['snippet']['textOriginal'])

This code works, but it is somewhat redundant, so I tried to rewrite it as follows:

import requests 
import json 

link_pageToken = 'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&maxResults=100&pageToken={pageToken}&videoId={videoId}&key={key}'

key ='...'

videoId = 'ydDn_TFkzi4'

comments = []

data = requests.get(link_pageToken.format(videoId = videoId, key = key)).json()

while 'nextPageToken' in data:
  data = requests.get(link_pageToken.format(videoId = videoId, key = key, pageToken = data['nextPageToken']))
  data = data.json()

  for i in range(len(data['items'])):
    comments.append(data['items'][i]['snippet']['topLevelComment']['snippet']['textOriginal'])

However, this code raises KeyError: 'pageToken'. My guess is that I first need to check whether a pageToken exists, get it, and then insert it into the URL.

How can I do that?

Thanks

I tried the second suggestion from furas's answer. Here is the code:

import requests 
import json 

link_pageToken = 'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&maxResults=100&pageToken={pageToken}&videoId={videoId}&key={key}'

key ='...'

videoId = 'ydDn_TFkzi4'

comments = []

data = requests.get(link_pageToken.format(videoId = videoId, key = key, pageToken="")).json()

while 'nextPageToken' in data:
  data = requests.get(link_pageToken.format(videoId = videoId, key = key, pageToken = data['nextPageToken']))
  data = data.json()

  for i in range(len(data['items'])):
    comments.append(data['items'][i]['snippet']['topLevelComment']['snippet']['textOriginal'])

For some reason, it collects fewer comments than the first version. The first version collected 309 comments, but this one collects only 209. Why is that?
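One likely explanation for the gap is that the revised loop appends items only inside the while body, so the first response (up to maxResults=100 items) is fetched but never stored, and 309 - 209 = 100. The sketch below reproduces that arithmetic offline. The fake pages, their sizes (100, 100 and 109, assumed so that they total 309), the tokens "p2"/"p3", and the fetch() helper are all hypothetical stand-ins for the real API responses.

```python
# Offline simulation: three fake pages stand in for the API responses.
# Page sizes are assumed (100 + 100 + 109 = 309); tokens are made up.
PAGES = {
    "": {"items": list(range(100)), "nextPageToken": "p2"},
    "p2": {"items": list(range(100)), "nextPageToken": "p3"},
    "p3": {"items": list(range(109))},  # last page: no nextPageToken
}


def fetch(page_token):
    """Hypothetical stand-in for requests.get(...).json()."""
    return PAGES[page_token]


def collect_skipping_first_page():
    # Mirrors the revised loop: items are appended only inside the while body.
    comments = []
    data = fetch("")
    while "nextPageToken" in data:
        data = fetch(data["nextPageToken"])
        comments.extend(data["items"])
    return comments


def collect_all_pages():
    # Same loop, but the first page is stored before paging starts.
    comments = []
    data = fetch("")
    comments.extend(data["items"])
    while "nextPageToken" in data:
        data = fetch(data["nextPageToken"])
        comments.extend(data["items"])
    return comments


print(len(collect_skipping_first_page()))  # 209
print(len(collect_all_pages()))            # 309
```

If this is what happens with the real data, processing the first response before entering the loop should restore the missing comments.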

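In the new version you use the same link_pageToken in both get() calls, but its template expects a pageToken value, which the first format() call does not supply.

Try "{pageToken}".format() and you will get the same KeyError.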
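As a small illustration of that point, the snippet below triggers the error with a shortened template. The template string here is an abbreviated, made-up version of the real URL, used only to keep the example readable.

```python
# Shortened, illustrative template; the real URL has more parameters.
template = "pageToken={pageToken}&videoId={videoId}"

try:
    # pageToken is named in the template but not passed to format()
    template.format(videoId="ydDn_TFkzi4")
except KeyError as err:
    missing = err.args[0]
    print("missing field:", missing)  # missing field: pageToken
```

str.format() raises KeyError for any named placeholder that has no matching keyword argument, which is why the error message names 'pageToken' rather than anything returned by the API.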


For the first get() you should use the old link (the one without {pageToken}):

r = requests.get(link.format(videoId=videoId, key=key))
data = r.json()

Or at least pass pageToken="" to format():

r = requests.get(link_pageToken.format(videoId=videoId, key=key, pageToken=""))
data = r.json()

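EDIT:

If you want to use a single link, start with one that has no {pageToken} placeholder:

link = 'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&maxResults=100&videoId={videoId}&key={key}'

r = requests.get(link.format(videoId=videoId, key=key))
data = r.json()

and later append the placeholder for the following pages:

link_pageToken = link + "&pageToken={pageToken}"

# pageToken is the nextPageToken value from the previous response
pageToken = data['nextPageToken']
r = requests.get(link_pageToken.format(videoId=videoId, key=key, pageToken=pageToken))
data = r.json()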

Or use a dictionary, which is more readable:

url = 'https://www.googleapis.com/youtube/v3/commentThreads'

payload = {
     "part": "snippet",
     "maxResults": 100,
     "videoId": videoId,
     "key": key,
}

r = requests.get(url, params=payload)
data = r.json()

and add the token later:

payload["pageToken"] = pageToken

r = requests.get(url, params=payload)
data = r.json()
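Putting the dictionary approach into one loop might look like the sketch below. It is a minimal sketch rather than a definitive implementation: the names fetch_page and get_all_comments, the timeout value, and the injectable fetch parameter are assumptions added here for illustration and testing. Every response, including the first, is processed before the token is checked.

```python
URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def fetch_page(params):
    """Request one page of comment threads and return the decoded JSON."""
    # Imported here so the paging logic can be exercised with a fake
    # fetch function even where requests is not installed.
    import requests

    response = requests.get(URL, params=params, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return response.json()


def get_all_comments(video_id, key, fetch=fetch_page):
    """Collect the text of every top-level comment, following nextPageToken."""
    params = {
        "part": "snippet",
        "maxResults": 100,
        "videoId": video_id,
        "key": key,
    }
    comments = []
    while True:
        data = fetch(params)
        # Store this page's items first, so the first page is not skipped.
        for item in data.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append(snippet["textOriginal"])
        token = data.get("nextPageToken")
        if token is None:
            break  # no further pages
        params["pageToken"] = token  # added only once a token exists
    return comments
```

A call would then be comments = get_all_comments(videoId, key). Because pageToken is added to the dictionary only after a response supplies one, the first request needs no placeholder and no empty string.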