Extract all YouTube video URLs from the homepage using Python and Selenium

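I want to build a YouTube recommendation scraper that crawls the YouTube homepage for video IDs/links so I can download the videos later with youtube-dl. However, I don't know how or where to actually get this information.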

The code I tried is as follows:

from selenium import webdriver

driver = webdriver.Chrome('./chromedriver/chromedriver')
driver.get("https://www.youtube.com")

while True:
    data = driver.find_elements_by_xpath("?")
    for i in data:
        l = i.get_attribute('href') #Should obtain some of the links/ids on the page but is None...

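Your selector does not match any elements. Looking at the HTML source of my YouTube homepage, I noticed that every element containing a video is an a tag with the id 'thumbnail', and that tag carries the 'href' attribute directly.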

Given this, you can find the elements by that exact id and pull the 'href' attribute out of each one with a simple list comprehension, as follows:

driver.get("https://www.youtube.com")
hrefs = [video.get_attribute('href') for video in driver.find_elements_by_id("thumbnail")]

for href in hrefs:
    print(href)

Output:

https://www.youtube.com/watch?v=PcYxbxXJhcc
https://www.youtube.com/watch?v=oTL52-NvyE4
https://www.youtube.com/watch?v=8kVI621fZug
https://www.youtube.com/watch?v=Pr9TdbTDMH0
https://www.youtube.com/watch?v=iL9upp5jahg
https://www.youtube.com/watch?v=iWnb3IqCfgc
https://www.youtube.com/watch?v=ehAwNw4xDRM
https://www.youtube.com/watch?v=PzVj7s4JZhE
https://www.youtube.com/watch?v=7fBdqdqRxFM
https://www.youtube.com/watch?v=WMweEpGlu_U
https://www.youtube.com/watch?v=2ljGwsbRLaI
https://www.youtube.com/watch?v=aUgEPebvR2Q
https://www.youtube.com/watch?v=Gh6ovYtD2Q8
https://www.youtube.com/watch?v=dVICcSLIHCM
https://www.youtube.com/watch?v=bl6mPR5t6Dk
https://www.youtube.com/watch?v=mMKXCfTDjvg
https://www.youtube.com/watch?v=z_HhNWNm_jo
https://www.youtube.com/watch?v=ZtiqfY8fixU
https://www.youtube.com/watch?v=9eAcRFlXxgo
https://www.youtube.com/watch?v=omC2eg-d-6Q
https://www.youtube.com/watch?v=E90SOw7fIVk
https://www.youtube.com/watch?v=5qap5aO4i9A
https://www.youtube.com/watch?v=T3ua3xTfbFI
https://www.youtube.com/watch?v=DTvS9lvRxZ8
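
Two follow-ups that may help, as a rough sketch rather than a definitive implementation. First, the find_elements_by_* helpers used above were removed in later Selenium 4 releases, where the lookup is written as find_elements(By.ID, ...). Second, get_attribute('href') returns None for an element without that attribute, so it can be worth filtering those out before handing anything to youtube-dl. The function names collect_hrefs and extract_video_ids are hypothetical helpers of mine, and reading the video ID from the 'v' query parameter is an assumption based on the watch?v=... URLs in the output above.

```python
from urllib.parse import parse_qs, urlparse


def collect_hrefs(driver, element_id="thumbnail"):
    """Return the non-empty href values of all elements with the given id."""
    # "id" is the string value of selenium.webdriver.common.by.By.ID, so this
    # is equivalent to driver.find_elements(By.ID, element_id) in Selenium 4.
    elements = driver.find_elements("id", element_id)
    hrefs = (element.get_attribute("href") for element in elements)
    # Drop None and empty strings (elements that have no href attribute).
    return [href for href in hrefs if href]


def extract_video_ids(hrefs):
    """Return the unique 'v' query values of the given URLs, in first-seen order."""
    video_ids = []
    for href in hrefs:
        if not href:
            continue  # skip None or empty entries
        # Assumption: watch links look like https://www.youtube.com/watch?v=<id>
        for video_id in parse_qs(urlparse(href).query).get("v", []):
            if video_id not in video_ids:
                video_ids.append(video_id)
    return video_ids
```

With an open driver, extract_video_ids(collect_hrefs(driver)) should give a de-duplicated list of IDs; links that carry no 'v' parameter are simply skipped.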

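Always analyze the HTML structure of the target page before scraping, then choose the selector best suited to locating the data you need.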
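
One way to do that analysis without reading the source by hand is to tally which ids show up on linked a tags and see which one dominates. This is only a sketch using the standard library's html.parser; LinkIdCounter and count_link_ids are hypothetical names, and the idea that the most frequent id is the one you want is an assumption you should check against the actual page.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkIdCounter(HTMLParser):
    """Count the id attribute values of <a> tags that carry an href."""

    def __init__(self):
        super().__init__()
        self.id_counts = Counter()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Only links with a non-empty href are useful for scraping URLs.
        if tag == "a" and attributes.get("href"):
            # Links without an id are counted under the key None.
            self.id_counts[attributes.get("id")] += 1


def count_link_ids(html):
    """Return a Counter mapping each link id to how often it occurs."""
    parser = LinkIdCounter()
    parser.feed(html)
    return parser.id_counts
```

With Selenium you could pass in the rendered page, for example count_link_ids(driver.page_source).most_common(5), and then pick the id that matches the video links.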