Returning the timestamps of YouTube videos based on a list of video IDs
You can run my code in this Google Colab file: https://colab.research.google.com/drive/1Tfoa5y13GPLxbS-wFNmZpvtQDogyh1Rg?usp=sharing
So I wrote a script that takes the VideoID of a YouTube video, for example:
VideoID = '3c584TGG7jQ'
Based on this VideoID, my script returns a list of dictionaries containing the YouTube transcript (the video's content), for example:
data = [{'text': 'Hello World', 'start': 0.19, 'duration': 4.21}, ...]
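Finally, I wrote a function that takes the user's input, i.e. the word or sentence you want to search for, and returns the timestamps with the corresponding hyperlinks.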
def search_dictionary(user_input, dictionary):
    ...  # MY_CODE_SEE_GOOGLE_COLAB_NOTEBOOK

search_dictionary(user_input, dictionary)
Input: "stolen"
Output:
the 2 million packages that are stolen... 0.0 min und 39.0 sec :: https://youtu.be/3c584TGG7jQ?t=38s
stolen and the fifth is this outer... 3.0 min und 13.0 sec :: https://youtu.be/3c584TGG7jQ?t=192s
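The function itself lives only in the notebook, so here is a minimal sketch of the same idea for a single video. It assumes `data` has the shape shown above and that `start` is an offset in seconds. The helper names `format_timestamp`, `deep_link` and `search_transcript` are hypothetical, and the number formatting differs slightly from the output above (whole numbers rather than `0.0`/`39.0`).

```python
def format_timestamp(start_seconds: float) -> str:
    """Split a start offset in seconds into whole minutes and seconds."""
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"{minutes} min und {seconds} sec"


def deep_link(video_id: str, start_seconds: float, lead_in: int = 1) -> str:
    """Build a youtu.be link that starts playback `lead_in` seconds early (never below 0)."""
    t = max(int(start_seconds) - lead_in, 0)
    return f"https://youtu.be/{video_id}?t={t}s"


def search_transcript(user_input: str, video_id: str, data: list[dict]) -> list[str]:
    """Return one formatted line per caption whose text contains user_input (case-insensitive)."""
    needle = user_input.lower()
    lines = []
    for entry in data:
        if needle in entry["text"].lower():
            lines.append(
                f"{entry['text']}... {format_timestamp(entry['start'])} :: "
                f"{deep_link(video_id, entry['start'])}"
            )
    return lines


if __name__ == "__main__":
    # Made-up sample entry in the shape of `data` above (hypothetical start value).
    sample = [{"text": "the 2 million packages that are stolen", "start": 39.5, "duration": 4.0}]
    for line in search_transcript("stolen", "3c584TGG7jQ", sample):
        print(line)
```

Keeping the link and timestamp logic in small helpers makes it easy to reuse them once the search runs over more than one video.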
Here is my question: how can I apply this to a list of video_ids? For example:
list_of_video_ids = ['pXDx6DjNLDU', '8HEfIJlcFbs', '3c584TGG7jQ', ...]
Expected output:
Title_0, timestamp, hyperlink
Title_0, timestamp, hyperlink
Title_1, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
That is, every mention across all video_ids, not just in a single video_id.
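One way to get that output is an outer loop over the IDs around the per-video search. This is a minimal sketch, assuming a hypothetical `fetch_transcript(video_id)` callable that returns the list of dictionaries shown above (for example, a wrapper around your existing transcript code) and a hypothetical `titles` mapping, because the caption data itself carries no video title.

```python
from typing import Callable


def search_videos(
    user_input: str,
    video_ids: list[str],
    fetch_transcript: Callable[[str], list[dict]],  # hypothetical: video_id -> caption dicts
    titles: dict[str, str],                          # hypothetical: video_id -> title
) -> list[tuple[str, str, str]]:
    """Return (title, timestamp, hyperlink) for every matching caption in every video."""
    needle = user_input.lower()
    rows = []
    for video_id in video_ids:
        title = titles.get(video_id, video_id)  # fall back to the ID if no title is known
        for entry in fetch_transcript(video_id):
            if needle in entry["text"].lower():
                minutes, seconds = divmod(int(entry["start"]), 60)
                timestamp = f"{minutes:02d}:{seconds:02d}"
                link = f"https://youtu.be/{video_id}?t={int(entry['start'])}s"
                rows.append((title, timestamp, link))
    return rows
```

Because results are appended video by video, all rows for the first ID come before those for the second, which matches the grouping in the expected output.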
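I've checked your code; you just needed a bit more time and testing.
Like I did, you need to append the result of transcript.fetch() to a global variable each time you loop over the elements of list_of_video_ids. Then, in the search_dictionary function you created, you can iterate over all the transcripts.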
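A minimal sketch of that collection step, assuming a hypothetical `fetch_transcript(video_id)` callable that wraps the notebook's `transcript.fetch()` call and returns a list of caption dictionaries. Older releases of youtube-transcript-api return such dictionaries from `fetch()`; newer releases may return objects instead, so check the version you have installed. Instead of mutating a global, the sketch returns the list, which you can assign to `all_transcripts`.

```python
from typing import Callable


def collect_transcripts(
    video_ids: list[str],
    fetch_transcript: Callable[[str], list[dict]],  # hypothetical wrapper around transcript.fetch()
) -> list[dict]:
    """Flatten the transcripts of several videos into one list of caption dicts.

    Each caption gets a "video_id" key so a later search can build the right link.
    """
    all_transcripts = []
    for video_id in video_ids:
        try:
            captions = fetch_transcript(video_id)
        except Exception as err:  # e.g. a video without captions
            print(f"Skipping {video_id}: {err}")
            continue
        for caption in captions:
            # Build a new dict (the caller's data is not mutated), with video_id as the first key.
            all_transcripts.append({"video_id": video_id, **caption})
    return all_transcripts
```

Usage would be along the lines of `all_transcripts = collect_transcripts(list_of_video_ids, fetch_transcript)`.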
Here is the main code:
# Get user input here:
# N.B.: You should validate the input to avoid a blank line or other invalid input...
user_input = input("Enter a word or sentence: ")
user_input = user_input.lower()

# We use the global list "all_transcripts" here:
dictionary = all_transcripts

# Function to loop over all transcripts and find the captions that contain the
# user input.
# TO-DO: Validate when no data is found.
def search_dictionary(user_input, dictionary):
    link = 'https://youtu.be/'
    # The debugged results are collected here:
    lst_results = []
    # You're really looping over a list of dictionaries:
    for entry in dictionary:  # <= "dictionary" is really a "list".
        try:
            # Each entry is really a "dictionary". Read its video_id first so
            # the link always points at the video this caption belongs to:
            v_id = entry.get("video_id", "")
            # Search only the caption text, case-insensitively:
            if user_input in str(entry['text']).lower() and len(v_id) > 0:
                # 'start' and 'duration' are both in seconds.
                matched_line = (str(entry['text']) + '...' + str(entry['start'])
                                + ' min und ' + str(entry['duration']) + ' sec :: '
                                + link + v_id + '?t=' + str(int(entry['start'] - 1)) + 's')
                # Only add the line if it is not already in the list of results:
                if matched_line not in lst_results:
                    lst_results.append(matched_line)
        except Exception as err:
            print('Unexpected error - see details below:')
            print(err)
    # Just an example to show "no results":
    if len(lst_results) == 0:
        print("No results found with input (" + user_input + ")")
    else:
        print("Results: ")
        print("\n".join(lst_results))  # <= this prints one result per line.
# Function ends here.

# Call function:
search_dictionary(user_input, dictionary)
# Show message - indicating end of the program - just informative :)
print("End of the program")
Following this approach, I modified your code. Here is the link to your Google Colab file, modified.
And here is the Google Colab public notebook link.
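Summary of the changes:
- Your variable names need to change: while testing, I had trouble telling which data types I was working with (lists or dictionaries; there seem to be both), as you will see when reading the modified code.
- I suggest organizing your code and paying attention to spacing: the lines in Google Colab are too long to read comfortably, although this may be partly personal preference.
- As you can see from the comments I added to your code, I encourage you to comment your code to help others understand it.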
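To make the data types explicit, as suggested in the first point, each caption could be described with a `TypedDict`. This is a minimal sketch, assuming every caption has already been tagged with a `video_id` key as in `all_transcripts`; `Caption` and `find_mentions` are hypothetical names. Since `start` is an offset in seconds, the sketch derives the minute and second labels from it with `divmod`, and it returns the lines instead of printing them.

```python
from typing import TypedDict


class Caption(TypedDict):
    """One caption line, tagged with the video it came from."""
    video_id: str
    text: str
    start: float     # seconds from the beginning of the video
    duration: float  # seconds the caption stays on screen


def find_mentions(query: str, captions: list[Caption]) -> list[str]:
    """Return one formatted line per caption whose text contains `query`.

    Duplicate lines are dropped while keeping first-seen order.
    """
    needle = query.strip().lower()
    if not needle:
        return []  # ignore blank input instead of matching every caption
    results: dict[str, None] = {}  # dict keys keep insertion order, so this dedups in order
    for caption in captions:
        if needle in caption["text"].lower():
            minutes, seconds = divmod(int(caption["start"]), 60)
            t = max(int(caption["start"]) - 1, 0)  # start playback one second early
            link = f"https://youtu.be/{caption['video_id']}?t={t}s"
            results[f"{caption['text']}... {minutes} min und {seconds} sec :: {link}"] = None
    return list(results)
```

For example, a caption starting at 626.0 seconds is reported as `10 min und 26 sec` with the link suffix `?t=625s`.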
To test this modified code and check that it works, try entering teach.
The results are:
Enter a word or sentence: teach
Results:
teacher and set up a class or even...626.0 min und 4.079 sec :: https://youtu.be/pXDx6DjNLDU?t=625s
teach this process and where you watch...738.399 min und 3.68 sec :: https://youtu.be/8HEfIJlcFbs?t=737s
few times a year i teach a month-long...418.8 min und 3.44 sec :: https://youtu.be/3c584TGG7jQ?t=417s
End of the program