Returning the timestamps of YouTube videos based on a list of video IDs

Question

You can run my code in this Google Colab notebook: https://colab.research.google.com/drive/1Tfoa5y13GPLxbS-wFNmZpvtQDogyh1Rg?usp=sharing

I wrote a script that takes the VideoID of a YouTube video, for example:

VideoID = '3c584TGG7jQ'

Based on this VideoID, my script returns a list of dictionaries containing the YouTube transcript (the spoken content of the video), for example:

data = [{'text': 'Hello World', 'start': 0.19, 'duration': 4.21}, ...]

Finally, I wrote a function that takes the user's input, i.e. the word or sentence to search for, and returns the matching timestamps with their hyperlinks.

def search_dictionary(user_input, dictionary):
        MY_CODE_SEE_GOOGLE_COLAB_NOTEBOOK


search_dictionary(user_input, dictionary)

Input: "stolen"

Output: 
the 2 million packages that are stolen... 0.0 min und 39.0 sec :: https://youtu.be/3c584TGG7jQ?t=38s
stolen and the fifth is this outer... 3.0 min und 13.0 sec :: https://youtu.be/3c584TGG7jQ?t=192s

Here is my question: how can I apply this to a list of video_ids? For example:

list_of_video_ids = ['pXDx6DjNLDU', '8HEfIJlcFbs', '3c584TGG7jQ', ...]

Expected output:

Title_0, timestamp, hyperlink
Title_0, timestamp, hyperlink
Title_1, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink

That is, every mention across all video_ids, not just within a single video_id.

Answer 1

I've checked your code; it just needs a bit more time and testing.

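As I did, you need to append the result of transcript.fetch() to a global variable (all_transcripts in my version) each time you loop over an element of list_of_video_ids. Then, inside the search_dictionary function you created, you can iterate over all the collected transcripts.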
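The collection step is described above but not shown in the answer, so here is a minimal sketch of it. The names collect_transcripts, fetch_transcript and fake_fetch are hypothetical and do not come from the notebook; fetch_transcript stands for whatever callable wraps your transcript.fetch() call and returns a list of caption dictionaries. It is injected as a parameter so the sketch runs without network access.

```python
def collect_transcripts(video_ids, fetch_transcript):
    """Fetch every video's transcript and tag each caption with its video_id."""
    all_transcripts = []
    for video_id in video_ids:
        try:
            captions = fetch_transcript(video_id)
        except Exception as err:
            # A video may have no transcript; report it and move on.
            print(f"Skipping {video_id}: {err}")
            continue
        for caption in captions:
            # "video_id" goes first so code that walks the keys in order
            # sees it before "text".
            all_transcripts.append({"video_id": video_id, **caption})
    return all_transcripts


def fake_fetch(video_id):
    """Hypothetical stand-in for the real transcript call (no network)."""
    if video_id == "no_captions":
        raise ValueError("no transcript available")
    return [{"text": f"caption from {video_id}", "start": 1.0, "duration": 2.0}]


all_transcripts = collect_transcripts(["vid_a", "no_captions", "vid_b"], fake_fetch)
print(len(all_transcripts))
```

Tagging each caption with its video_id is what later allows a single flat list to be searched while still building the right link for each match.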

Here is the main code:

# Get the user input here.
# N.B.: you should validate it to reject a blank line or other invalid input.
user_input = input("Enter a word or sentence: ")
user_input = user_input.lower()

# We use here the global list "all_transcripts": 
dictionary = all_transcripts

# Function that loops over all transcripts and finds the captions that contain
# the user input.
# TO-DO: Validate when no data is found.
def search_dictionary(user_input, dictionary): 
  link = 'https://youtu.be/'

  # Get the video_id: 
  v_id  = ""

  # Matched lines are collected here (without duplicates):
  lst_results = []

  # string body:
  matched_line = ""

  # You're really looping a list of dictionaries: 
  for i in range(len(dictionary)): # <= this is really a "list".
    try:
      #print(type(dictionary[i])) # <= this is really a "dictionary".
      #print(dictionary[i])

      # now you can iterate here the "dictionary": 
      for x, y in dictionary[i].items():
        #print(x, y)
        if (x == "video_id"): 
          v_id = y
        # user_input was lowercased above, so compare against a lowercased value:
        if (user_input in str(y).lower() and len(v_id) > 0):
          matched_line = str(dictionary[i]['text']) + '...' + str(dictionary[i]['start']) + ' min und ' + str(dictionary[i]['duration']) + ' sec :: ' + link + v_id + '?t=' + str(int(dictionary[i]['start'] - 1)) + 's'
          #matched_line = "text: " + y + " -- found in video_id = " + v_id
          
          # Add the line only if it is not already in the list of results:
          if matched_line not in lst_results: 
            lst_results.append(matched_line)

    except Exception as err: 
      print('Unexpected error - see details below:')
      print(err)

  # Example of reporting that nothing was found:
  if (len(lst_results) == 0):
    print("No results found with input (" + user_input + ")")
  else: 
    print("Results: ")
    print("\n".join(lst_results)) # <= prints one result per line.
# Function ends here.

# Call function: 
search_dictionary(user_input, dictionary) 

# Show message - indicating end of the program - just informative :)
print("End of the program")
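For comparison, here is a more compact sketch of the same search that returns structured rows instead of printing strings. It is not the code from the notebook: search_transcripts and the sample data are hypothetical, and it assumes every entry carries "video_id", "text" and "start" keys. Unlike the function above, it matches only against the caption text and compares case-insensitively.

```python
def search_transcripts(user_input, all_transcripts):
    """Return one (video_id, start_seconds, text) tuple per matching caption."""
    needle = user_input.strip().lower()
    if not needle:
        # A blank query would match every caption, so return nothing instead.
        return []
    matches = []
    for entry in all_transcripts:
        text = str(entry.get("text", ""))
        if needle in text.lower():
            matches.append((entry["video_id"], float(entry["start"]), text))
    return matches


# Hypothetical sample data in the shape used above (not real transcript lines).
sample = [
    {"video_id": "aaa", "text": "Hello World", "start": 0.19, "duration": 4.21},
    {"video_id": "bbb", "text": "hello again", "start": 12.5, "duration": 2.0},
    {"video_id": "bbb", "text": "goodbye", "start": 20.0, "duration": 1.5},
]

for video_id, start, text in search_transcripts("hello", sample):
    print(video_id, start, text)
```

Returning tuples keeps the search separate from the formatting, so the same rows can be printed, sorted by video, or written to a file.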

With this in mind, I modified your code; the modified version is available as a copy of your Google Colab file and as a public Google Colab notebook.

A summary of my notes on the code:

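Your variable naming needs to change: while testing, I had trouble understanding which data type I was dealing with, lists or dictionaries. It turns out to be both, as you will see when reading the modified code.
I suggest organizing your code and paying attention to spacing. The lines in Google Colab are too long to read comfortably, although this may be a matter of personal preference.
As you can see from the comments I added to your code, I encourage you to comment your own code to help others understand it.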

To test the modified code and check that it works, try entering teach:

The results are as follows:

Enter a word or sentence: teach
Results: 
teacher and set up a class or even...626.0 min und 4.079 sec :: https://youtu.be/pXDx6DjNLDU?t=625s
teach this process and where you watch...738.399 min und 3.68 sec :: https://youtu.be/8HEfIJlcFbs?t=737s
few times a year i teach a month-long...418.8 min und 3.44 sec :: https://youtu.be/3c584TGG7jQ?t=417s
End of the program
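Note that in these results the number printed before "min und" is the caption's start offset in seconds, and the number before "sec" is its duration, because that is what the format string in search_dictionary concatenates. If a minutes-and-seconds timestamp is wanted, one possible sketch is below; format_timestamp and build_link are hypothetical helper names, and the one-second lead-in mirrors the `start - 1` used in the link above.

```python
def format_timestamp(start_seconds):
    """Turn a start offset in seconds into a 'M min S sec' label."""
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"{minutes} min {seconds} sec"


def build_link(video_id, start_seconds, lead_in=1):
    """Build a youtu.be link that starts slightly before the caption."""
    # Clamp at zero so a caption at the very start never yields a negative offset.
    offset = max(int(start_seconds) - lead_in, 0)
    return f"https://youtu.be/{video_id}?t={offset}s"


print(format_timestamp(626.0))
print(build_link("pXDx6DjNLDU", 626.0))
```

With the first result above as input, this prints `10 min 26 sec` and `https://youtu.be/pXDx6DjNLDU?t=625s`.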

python

youtube

youtube-api

python-3.x