Why is the BeautifulSoup4 find_all function returning nothing?

Question

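I'm new to Beautiful Soup 4, and I can't get this simple code to pick up the contents of a tag when I search for something on YouTube. When I print containers, it just prints "[]", which I assume means the variable is empty. Any ideas why this isn't picking anything up? Does it have to do with not grabbing the right tag on YouTube? In the search HTML, a result has the following markup: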

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="Kendrick Lamar - HUMBLE. by KendrickLamarVEVO 5 months ago 3 minutes, 4 seconds 322,571,817 views" href="https://www.youtube.com/watch?v=tvTRZJ-4EyI" title="Kendrick Lamar - HUMBLE.">
                Kendrick Lamar - HUMBLE.
              </a>

Python code:

import bs4

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

search = "damn"
my_url = "https://www.youtube.com/results?search_query=" + search
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

containers = page_soup.find_all("a",{"id":"video-title"})
print(containers)

#result-count

Answer 1

If you look at the source code of the URL, you can't find any id="video-title". This means the page loads that content dynamically. BeautifulSoup does not support dynamic loading by itself. Try combining it with something else, such as Selenium or scrapyjs.
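To make the point above concrete, here is a minimal sketch that runs the question's find_all call against two inline HTML strings. Both strings are hypothetical stand-ins, not real YouTube responses: served_html imitates markup in which the results have not been built yet, and rendered_html imitates the DOM after the browser has run its scripts. The helper name find_titles is also made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what urllib receives: the results are not in the
# markup yet, only a script that would build them in the browser.
served_html = """
<html><body>
  <div id="contents"></div>
  <script>/* results are inserted here by JavaScript at runtime */</script>
</body></html>
"""

# Hypothetical stand-in for the DOM after the browser has run the scripts.
rendered_html = """
<html><body>
  <div id="contents">
    <a id="video-title" href="https://www.youtube.com/watch?v=tvTRZJ-4EyI"
       title="Kendrick Lamar - HUMBLE.">Kendrick Lamar - HUMBLE.</a>
  </div>
</body></html>
"""


def find_titles(html):
    """Return the title attribute of every <a id="video-title"> in html."""
    page_soup = BeautifulSoup(html, "html.parser")
    return [a.get("title") for a in page_soup.find_all("a", {"id": "video-title"})]


served_titles = find_titles(served_html)      # nothing to match in the served markup
rendered_titles = find_titles(rendered_html)  # the anchor exists after rendering
print(served_titles)
print(rendered_titles)
```

With the code from the question, a quick way to run the same check on the real response is to test whether b'id="video-title"' in page_html is true before parsing. If it is false, find_all has nothing to match and an empty list is the expected result.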

Answer 2

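The results on the YouTube page are loaded dynamically, so the IDs and class names will change. When you parse a page, make sure you read the page source as it is loaded through urllib, not as it appears in the browser. This code will solve your problem: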

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen

# Fetch the page exactly as urllib sees it (no JavaScript is executed)
page = urlopen('https://www.youtube.com/results?search_query=damn').read()
soup = bs(page, 'html.parser')
# Match a class that is present in the served HTML
results = soup.find_all('a', {'class': 'yt-uix-sessionlink'})
for link in results:
    print(link.get("href"))

This code prints every URL on the page, so you will still need to filter the output for the links you want.
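One possible way to do that filtering is sketched below using only the standard library. It assumes that video links have the path /watch and that relative hrefs should be resolved against https://www.youtube.com. The names BASE_URL and watch_urls and the sample list are hypothetical and only illustrate the idea.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://www.youtube.com"  # assumed base for relative hrefs


def watch_urls(hrefs):
    """Keep only video links, make them absolute, and drop duplicates."""
    seen = set()
    urls = []
    for href in hrefs:
        if not href:  # link.get("href") returns None when the attribute is missing
            continue
        absolute = urljoin(BASE_URL, href)
        if urlparse(absolute).path != "/watch":
            continue  # skip channels, playlists, and other non-video links
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Hypothetical hrefs of the kind the loop above might print.
sample = ["/watch?v=tvTRZJ-4EyI", "/channel/example", None,
          "/watch?v=tvTRZJ-4EyI", "https://www.youtube.com/watch?v=abc"]
video_links = watch_urls(sample)
print(video_links)
```

In the answer's loop, the same helper could be fed with [link.get("href") for link in results] in place of the sample list.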

python

web-scraping

beautifulsoup

urllib