YouTube Data API Page Token Question (Python)
I'm trying to download video metadata for 2019. Every time I run my code, it exceeds the quota limit. I have fewer than 100 videos for that period. Could anyone tell me a better way to write the code?
try:
    request = youtube.search().list(
        part = 'id, snippet',
        type = 'video',
        publishedAfter = '2018-12-31T23:59:59Z',
        publishedBefore = '2020-01-01T00:00:00Z',
        order = 'date',
        fields = 'nextPageToken,items(id,snippet)',
        pageToken = None,
        maxResults = 50
    )
    response = request.execute()
    nextPageToken = None
    while True:
        request = youtube.search().list(
            pageToken = nextPageToken,
            part = 'id, snippet',
            type = 'video',
            fields = 'nextPageToken,items(id,snippet)',
            maxResults = 50
        )
        response = request.execute()
        nextPageToken = response['nextPageToken']
        items = response['items']
        if response['nextPageToken'] == None:
            break
        for each_item in items:
            video_id = each_item['id']['videoId']
            sub_items = each_item['snippet']
            for sub_item in sub_items:
                video_item[sub_item] = sub_items[sub_item]
            video_data[video_id] = video_item
except Exception as e:
    print('Error in get_video_data: {0}'.format(e))
Thanks!
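Please note that your API call to the Search.list endpoint runs against the entire set of YouTube videos published in that year. The call does not specify any other filtering criteria, which means that your query (upon pagination) will potentially return millions of video entries.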
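To see why the quota runs out so quickly on an unfiltered query, here is a rough back-of-the-envelope sketch. It assumes a cost of 100 quota units per Search.list call and a default allocation of 10,000 units per day; both figures are assumptions taken from the public quota documentation and may differ for your project.

```python
# Assumed figures; check the quota page of your own Google Cloud project.
SEARCH_LIST_COST = 100   # quota units per Search.list call (assumption)
DAILY_QUOTA = 10_000     # default daily quota in units (assumption)
MAX_RESULTS = 50         # page size used in the question


def pages_per_day(daily_quota=DAILY_QUOTA, cost=SEARCH_LIST_COST):
    """Number of Search.list pages that fit into one day's quota."""
    return daily_quota // cost


def pages_needed(video_count, page_size=MAX_RESULTS):
    """Number of pages needed to list video_count videos (ceiling division)."""
    return -(-video_count // page_size)


print(pages_per_day())    # pages affordable per day under the assumed figures
print(pages_needed(100))  # pages needed for a channel with 100 videos
```

Under these assumptions a channel-restricted query for fewer than 100 videos needs only a couple of calls, whereas paging through an unfiltered result set uses up the daily quota after a fixed, small number of pages.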
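If you're actually looking for your own videos, then your call to the Search.list endpoint should contain either the forMine or the channelId request parameter:
- when you construct the youtube object from the discovery.build method using its parameter credentials (that is, you're issuing an authorized request), then use the request parameter forMine as shown below: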
request = youtube.search().list(
    forMine = True,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)
Note that, according to the findings recorded under the Update and Fix section below, this alternative turned out not to be workable.
- when you construct the youtube object from the discovery.build method using its parameter developerKey (that is, you're not issuing an authorized request), then use the request parameter channelId as shown below:
request = youtube.search().list(
    channelId = CHANNEL_ID,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)
Note that CHANNEL_ID is the ID of your channel (or of any other channel, for that matter).
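The difference between the two API calls above is the following: when issuing an authorized request (the first bullet above), you'll get all videos of your channel, including the non-public ones (i.e. those that have their privacyStatus set to private or unlisted); on the other hand, when using an API key (the second bullet above), you'll get only the public videos (i.e. those that have their privacyStatus set to public), even if CHANNEL_ID is the ID of your own channel.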
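Now, unfortunately, there's another issue with your code above: your two calls to the Search.list endpoint are not identical modulo the pageToken request parameter. That is because the second call does not pass the request parameters publishedAfter and publishedBefore.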
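This difference means that you are not correctly paginating the result set of your first API call (even though you do pass the parameter pageToken to the second API call).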
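As an illustration of what correct manual pagination would look like, the sketch below keeps all filters in one dictionary and reuses it on every call, changing only pageToken. The helper name fetch_all_items and the FakeSearch and FakeRequest classes are hypothetical; the fakes are offline stand-ins for youtube.search() so that the example runs without network access or quota.

```python
def fetch_all_items(search, params):
    """Collect the items of every page, reusing the same params on each call.

    `search` is expected to behave like youtube.search(): its list() method
    takes keyword arguments and returns an object with an execute() method.
    """
    items = []
    page_token = None
    while True:
        kwargs = dict(params)          # same filters on every call
        if page_token is not None:
            kwargs['pageToken'] = page_token
        response = search.list(**kwargs).execute()
        items.extend(response.get('items', []))
        # The last page has no nextPageToken key, hence .get() instead of [].
        page_token = response.get('nextPageToken')
        if page_token is None:
            break
    return items


class FakeRequest:
    """Hypothetical stand-in for an API request object."""
    def __init__(self, response):
        self._response = response

    def execute(self):
        return self._response


class FakeSearch:
    """Hypothetical stand-in for youtube.search(), serving canned pages."""
    def __init__(self, pages):
        self.pages = pages   # maps page token (None for first page) to response
        self.calls = []      # records the keyword arguments of every call

    def list(self, **kwargs):
        self.calls.append(kwargs)
        return FakeRequest(self.pages[kwargs.get('pageToken')])


fake = FakeSearch({
    None: {'nextPageToken': 'P2', 'items': [{'id': {'videoId': 'a'}}]},
    'P2': {'items': [{'id': {'videoId': 'b'}}]},
})
params = {
    'channelId': 'CHANNEL_ID',
    'part': 'id,snippet',
    'type': 'video',
    'publishedAfter': '2018-12-31T23:59:59Z',
    'publishedBefore': '2020-01-01T00:00:00Z',
    'order': 'date',
    'maxResults': 50,
}
items = fetch_all_items(fake, params)
print([item['id']['videoId'] for item in items])
```

With the real client you would pass youtube.search() instead of the fake object. Note also that the loop processes the items of the last page before stopping, which the code in the question does not do.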
Fortunately, Google's APIs Client Library for Python, which you're using, implements API result set pagination in a simple, pythonic way (I'll exemplify it below for the second bullet above):
request = youtube.search().list(
    channelId = CHANNEL_ID,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)
video_data = {}
while request:
    response = request.execute()
    for item in response['items']:
        video_id = item['id']['videoId']
        video_item = item['snippet']
        video_data[video_id] = video_item
    request = youtube.search().list_next(
        request, response)
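The code above shows that there's no need to repeat the first API call in its entirety just to add a pageToken parameter; the following simpler statement suffices: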
request = youtube.search().list_next(
    request, response)
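This statement uses the value of the nextPageToken property of the response object to construct, from the old request object, a new one that has its pageToken property set properly. When the response carries no nextPageToken, list_next returns None, which ends the while loop.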
Update and Fix
After further testing and investigation of calls to Search.list with the request parameters forMine, publishedAfter and publishedBefore, I've reached the following conclusions:
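- the parameter forMine=True without any of the parameters publishedAfter and publishedBefore makes the API call work as expected;
- the parameter forMine=True given together with either of the parameters publishedAfter and publishedBefore, or with both, produces the HTTP error 400 Bad Request along with the following JSON error response: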
{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "errors": [
      {
        "message": "Request contains an invalid argument.",
        "domain": "global",
        "reason": "badRequest"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
Google's own issue tracker records a very recent bug report that describes precisely the behavior above. The official response from Google's staff is as follows:
Status: Won't Fix (Intended Behavior)
This is working as intended. Basically you can only set one of the resource filters if it's a for_content_owner request, but both channel ID and published after are resource filters. This requirement doesn't seem to be specified on the developer website: https://developers.google.com/youtube/v3/docs/search/list.
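Given that response, one possible workaround for authorized requests is to call Search.list with forMine=True and without the date parameters, and then filter the returned items by snippet.publishedAt on the client side. This is an assumption on how to proceed, not something the bug report suggests. The sketch below shows only the filtering step; the helper name published_in_year is hypothetical, the sample items are hand-made rather than real API output, and publishedAt is assumed to be an RFC 3339 UTC timestamp that starts with the four-digit year.

```python
def published_in_year(items, year):
    """Keep the search result items whose snippet.publishedAt falls in `year`.

    Assumes publishedAt looks like '2019-05-01T12:00:00Z' (UTC, year first).
    """
    prefix = '{0:04d}-'.format(year)
    return [item for item in items
            if item['snippet']['publishedAt'].startswith(prefix)]


# Hand-made sample items shaped like Search.list results (not real data).
sample = [
    {'id': {'videoId': 'v1'}, 'snippet': {'publishedAt': '2018-12-31T23:59:59Z'}},
    {'id': {'videoId': 'v2'}, 'snippet': {'publishedAt': '2019-06-15T08:30:00Z'}},
    {'id': {'videoId': 'v3'}, 'snippet': {'publishedAt': '2020-01-01T00:00:00Z'}},
]

video_data = {item['id']['videoId']: item['snippet']
              for item in published_in_year(sample, 2019)}
print(sorted(video_data))
```

For a channel with fewer than 100 videos per year this costs only a few extra pages; for a very large channel, fetching everything and filtering locally may be too expensive in quota terms.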