Scraping New YouTube Videos With BeautifulSoup

Question

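I'm new to Python and I want to do some web scraping on YouTube. I want to use this link to get the newest videos: 'https://www.youtube.com/results?search_query=programming&sp=CAISBAgBEAE%253D'. I want to scrape the 5 newest videos. How can I do that? I have tested it with this code (I only want the links):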

from bs4 import BeautifulSoup
import requests

url="https://www.youtube.com/results?search_query=programming&sp=CAISBAgBEAE%253D"
html = requests.get(url)
soup = BeautifulSoup(html.text, features="html.parser") 

for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        print(link["href"])

Edit: I get no output in the Python terminal. It doesn't scrape anything; all I see is the default ">>>" prompt.

Answer 1

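You can't scrape YouTube without a Google YouTube API key, which you can get by following these steps. If you still want to try, I can repost a legitimate answer to your question.

In the meantime, try practising parsing with BeautifulSoup on this site: videvo.net

Here is some code to get you started: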

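from bs4 import BeautifulSoup
import requests

def get_source(url):
    # Send a browser-like User-Agent; some sites reject the default one
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    return BeautifulSoup(response.text, "html.parser")

soup = get_source("http://videvo.net")

# href=True skips <a> tags without an href, which would raise KeyError
for tag in soup.find_all("a", href=True):
    print(tag["href"])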

Edit: I stand corrected (slightly). YouTube's main URL can't be parsed, but you can try this code:

def get_source(url):
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    return BeautifulSoup(response.text, "html.parser")

soup = get_source("https://www.youtube.com/feeds/videos.xml?user=kinagrannis")

# Each <entry> in the feed describes one video
for entry in soup.find_all("entry"):
    for title in entry.find_all("title"):
        print(title.text)
    for link in entry.find_all("link"):
        print(link["href"])
    for name in entry.find_all("name"):
        print(name.text)
    for pub in entry.find_all("published"):
        print(pub.text)

Note: you can substitute any username for 'kinagrannis' (user=[username]).
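Since the question asks for only the 5 newest videos, here is a minimal sketch of one way to cut the feed down. It swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` and assumes the feed is a standard Atom document whose entries are listed newest first. The function name `newest_links` and the `SAMPLE_FEED` text are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds
ATOM = "{http://www.w3.org/2005/Atom}"

def newest_links(feed_xml, limit=5):
    """Return up to `limit` (title, href) pairs from an Atom feed (str or bytes)."""
    root = ET.fromstring(feed_xml)
    results = []
    # Assumption: the feed lists entries newest first, so slicing keeps the latest
    for entry in root.findall(f"{ATOM}entry")[:limit]:
        title = entry.findtext(f"{ATOM}title", default="")
        link = entry.find(f"{ATOM}link")
        href = link.get("href") if link is not None else None
        results.append((title, href))
    return results

# Hypothetical sample standing in for the downloaded feed text
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example channel</title>
  <entry>
    <title>First</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=aaa"/>
  </entry>
  <entry>
    <title>Second</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=bbb"/>
  </entry>
  <entry>
    <title>Third</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=ccc"/>
  </entry>
</feed>"""

for title, href in newest_links(SAMPLE_FEED, limit=2):
    print(title, href)
```

To run it against a real feed, you could pass in `requests.get(feed_url).content`, where `feed_url` is the feed address used above.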

Answer 2

You can scrape YouTube by:

Using the requests-HTML, playwright, or selenium libraries.
Using regular expressions.
Using the YouTube Search Engine Results API from SerpApi.

Code (very basic, just to give you the idea):

from requests_html import HTMLSession

session = HTMLSession()
url = "https://www.youtube.com/results?search_query=programming&sp=CAISBAgBEAE%253D"
response = session.get(url)
# Render the JavaScript-built page in headless Chromium and scroll to load more results
response.html.render(sleep=1, keep_page=True, scrolldown=2)

# Each a#video-title element is the title link of one search result
for element in response.html.find("a#video-title"):
    link = next(iter(element.absolute_links))
    print(link)

Output:

https://www.youtube.com/watch?v=OUnxJk3Bphk
https://www.youtube.com/watch?v=vWvtt1ESNeY
https://www.youtube.com/watch?v=b8OIZu5y_Ak
https://www.youtube.com/watch?v=xp3fHaT2_VE
https://www.youtube.com/watch?v=e9toQAcjOrw
https://www.youtube.com/watch?v=em0Is0nyaXA
https://www.youtube.com/watch?v=N5JVTUAGmAM
https://www.youtube.com/watch?v=a0hQG-UdhYc
https://www.youtube.com/watch?v=SmQFxQ1fa2o
https://www.youtube.com/watch?v=uuMS1FYLgWQ
https://www.youtube.com/watch?v=8WJ-zSE32ZY
https://www.youtube.com/watch?v=c5MtH-xDspg
https://www.youtube.com/watch?v=5Xktqz6VUTU
https://www.youtube.com/watch?v=Wbo6j_iq2XY
https://www.youtube.com/watch?v=8eu9nliySO4
https://www.youtube.com/watch?v=j28PjOy_uk8
https://www.youtube.com/watch?v=fM2Ordt8Q9E
https://www.youtube.com/watch?v=tFSkaIVyNno
https://www.youtube.com/watch?v=1hDXlc2C3Rw
https://www.youtube.com/watch?v=vH9_Eo7VW3c

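Using regex without a headless browser.

You need to get to the var ytInitialData element and then to "commandMetadata", where you will find the video URL: {"url":"/watch?v=Ae2TRkpjRCc",....

Here is a starting point on regex101 that grabs everything inside var ytInitialData.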
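The regex idea above can be sketched roughly as follows. This is not the regex101 pattern mentioned above. It is a simpler assumption-based pattern that looks for `"url":"/watch?v=..."` pairs anywhere in the page text. The function name `extract_watch_urls` and the `SAMPLE_HTML` string are hypothetical, and the real page structure may differ or change at any time.

```python
import re

# Assumed pattern: a JSON "url" key whose value starts with /watch?v=<video id>
WATCH_URL = re.compile(r'"url":"(/watch\?v=[\w-]+)')

def extract_watch_urls(html, limit=5):
    """Return up to `limit` unique watch URLs, in order of first appearance."""
    urls = []
    for path in WATCH_URL.findall(html):
        if len(urls) >= limit:
            break
        url = "https://www.youtube.com" + path
        # The same video can appear several times in the page data; keep it once
        if url not in urls:
            urls.append(url)
    return urls

# Hypothetical fragment imitating what the page source might contain
SAMPLE_HTML = (
    'var ytInitialData = {"commandMetadata":{"webCommandMetadata":'
    '{"url":"/watch?v=Ae2TRkpjRCc"}},'
    '"other":{"url":"/watch?v=Ae2TRkpjRCc"},'
    '"next":{"url":"/watch?v=OUnxJk3Bphk"}};'
)

for url in extract_watch_urls(SAMPLE_HTML):
    print(url)
```

With a real page you would pass in the text returned by `requests.get(url).text` instead of the sample string.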

Alternatively, you can use the YouTube Search Engine Results API from SerpApi. It's a paid API with a free plan. Check out the Playground.

Code to integrate:

from serpapi import GoogleSearch

params = {
  "engine": "youtube",
  "search_query": "programming",
  "sp": "CAISBAgBEAE%253D",
  "api_key": "your_secret_api_key"
}

search = GoogleSearch(params)
results = search.get_dict()

for link in results['video_results']:
    print(f"Title: {link['title']}\nLink: {link['link']}\n")

Output:

Title: CLASS VIII BASIC HTML TAGS AND PROGRAMMING 15 4 101`
Link: https://www.youtube.com/watch?v=KIPp63tXKpU

Title: For loop in c programming #bssdlectureclasses
Link: https://www.youtube.com/watch?v=nfRN0x9VvQc

Title: [C#] Programming NatsukiBot
Link: https://www.youtube.com/watch?v=chnigx-ezwg

Title: CS201 Short Lecture - 03 | VU Short Lecture | Introduction to Programming in (Urdu / Hindi)
Link: https://www.youtube.com/watch?v=qoxXJchd7N4

Title: Programming in C Language - While statement
Link: https://www.youtube.com/watch?v=cl0OpNCdF5I

Title: Introduction to html and Basic programming
Link: https://www.youtube.com/watch?v=A4We3NGqxuA

Title: Use of Printf & Scanf functions | Part 7 | C Programming | PadhoChalo
Link: https://www.youtube.com/watch?v=578xS-Ugc2c

Title: C++ course has started | Computer Programming | Aashu |
Link: https://www.youtube.com/watch?v=SjFgTK2HqbE

Title: Mitsubishi Outlander 2008 prox/twist transponder key programming tip
Link: https://www.youtube.com/watch?v=HlSJcBwxKFQ

Title: Computer Programming 1 -Introduction to the course
Link: https://www.youtube.com/watch?v=xdmPbhTT01g

Title: Programming, Data Structures and Algorithms in Python
Link: https://www.youtube.com/watch?v=0fUddu9cdAU
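One detail worth checking in the examples above: the `sp` value `CAISBAgBEAE%253D` looks percent-encoded twice. The sketch below uses only the standard library to decode it step by step and to show what a single encoding pass produces. Whether YouTube or SerpApi expects the raw or the encoded form is not something this sketch can confirm, so treat it as a debugging aid only.

```python
from urllib.parse import unquote, urlencode

sp_from_url = "CAISBAgBEAE%253D"

# First pass turns %25 back into a literal percent sign
once = unquote(sp_from_url)
print(once)   # CAISBAgBEAE%3D

# Second pass turns %3D back into "="
twice = unquote(once)
print(twice)  # CAISBAgBEAE=

# Encoding the fully decoded value once gives the single-encoded query string
query = urlencode({"search_query": "programming", "sp": twice})
print(query)  # search_query=programming&sp=CAISBAgBEAE%3D
```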

P.S. I wrote two blog posts, Scrape YouTube Search with Python (part 1) and Scrape YouTube Search with Python (part 2), which cover this in more depth with visual representations.

Disclaimer, I work for SerpApi.

Tags: python, youtube, beautifulsoup, web-scraping, python-3.x