Get meta tag content by name with Beautiful Soup and Python

I'm trying to get the metadata from this website (here is the code):

import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.svpboston.com/').text

soup = BeautifulSoup(source, features="html.parser")

title = soup.find("meta", name="description")
image = soup.find("meta", name="og:image")

print(title["content"] if title else "No meta title given")
print(image["content"] if image else "No meta image given")

But I get this error:

Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/Work/Web Scraping/Selenium/sadsaddas.py", line 9, in <module>
    title = soup.find("meta", name="description")
TypeError: find() got multiple values for argument 'name'

Any ideas?

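find() takes the tag name as its first parameter, and that parameter is itself called name. Passing "meta" positionally and name="description" as a keyword therefore gives the same parameter two values. One workaround is to collect every <meta> tag and filter them yourself:

meta = soup.find_all("meta")
title = next((m for m in meta if m.get("name") == "description"), None)
image = next((m for m in meta if m.get("name") == "og:image"), None)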
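The clash is ordinary Python argument binding rather than anything specific to Beautiful Soup. As a sketch, a hypothetical stand-in function find_like (not part of bs4) whose first parameter is also named name reproduces the same TypeError, and shows that moving the filter into attrs avoids it:

```python
# Hypothetical stand-in whose first parameter is named "name", like bs4's find()
def find_like(name=None, attrs=None, **kwargs):
    return name, attrs, kwargs

error_message = ""
try:
    # "meta" binds positionally to name, then name= tries to bind it a second time
    find_like("meta", name="description")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Putting the attribute filter inside attrs leaves "name" bound only once
result = find_like("meta", attrs={"name": "description"})
print(result)
```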

You can try it like this:

title = soup.find("meta", attrs={"name":"description"})
image = soup.find("meta", attrs={"name":"og:image"})
print(title)
print(image)
print(title["content"] if title else "No meta title given")
print(image["content"] if image else "No meta for image given")
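A self-contained way to see the attrs approach, and why name="og:image" can come back empty, is a sketch against an inline HTML string. The markup below is a hypothetical sample, not the real svpboston.com page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the real site's <head>
html = """
<html><head>
  <meta name="description" content="Sample description">
  <meta property="og:image" content="https://example.com/cover.png">
</head></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute filters go in attrs, so "name" is not confused with the tag name
description = soup.find("meta", attrs={"name": "description"})
og_image_by_name = soup.find("meta", attrs={"name": "og:image"})
og_image_by_property = soup.find("meta", attrs={"property": "og:image"})

print(description["content"] if description else "No meta description given")
# This sample declares og:image with property=, so the name-based lookup finds nothing
print(og_image_by_name)
print(og_image_by_property["content"] if og_image_by_property else "No meta image given")
```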

title = soup.find("meta", property="og:title")
print(title["content"] if title else "No meta title given")
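The property= keyword works here because property is not one of find()'s own parameters, so Beautiful Soup treats it as an attribute filter and it should match the same tag as the attrs form. A sketch on hypothetical inline markup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup
html = '<head><meta property="og:title" content="Sample title"></head>'
soup = BeautifulSoup(html, "html.parser")

# "property" is not a parameter of find(), so it is treated as an attribute filter
by_kwarg = soup.find("meta", property="og:title")
# Equivalent lookup through the attrs dictionary
by_attrs = soup.find("meta", attrs={"property": "og:title"})

print(by_kwarg["content"])
print(by_kwarg is by_attrs)
```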

From the bs4 docs:

You can't use a keyword argument to search for HTML’s name element, because Beautiful Soup uses the name argument to contain the name of the tag itself. Instead, you can give a value to ‘name’ in the attrs argument
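Following the docs, two other spellings avoid the keyword clash: attrs is find()'s second positional parameter, so the dictionary can be passed without the keyword, and a CSS attribute selector via select_one() does not involve the name parameter at all (select_one() relies on soupsieve, which bs4 4.7.0 and later installs as a dependency). A sketch on hypothetical inline markup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup
html = '<head><meta name="description" content="Sample description"></head>'
soup = BeautifulSoup(html, "html.parser")

# attrs is find()'s second positional parameter, so the keyword can be dropped
tag_positional = soup.find("meta", {"name": "description"})

# A CSS attribute selector sidesteps the name keyword entirely
tag_css = soup.select_one('meta[name="description"]')

print(tag_positional["content"])
print(tag_css["content"])
```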

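To get a tag by a specific attribute, I recommend putting the attribute in a dictionary and passing that dictionary to .find() as the attrs argument. You are also matching on the wrong attribute for the title and image: Open Graph tags such as og:title and og:image are declared with property="...", not name="...", so search on property instead. Here is the final code that gets what you want: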

import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.svpboston.com/').text

soup = BeautifulSoup(source, features="html.parser")

title = soup.find("meta", attrs={'property': 'og:title'})
image = soup.find("meta", attrs={'property': 'og:image'})

print(title["content"] if title is not None else "No meta title given")
print(image["content"] if image is not None else "No meta image given")
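Because sites are not always consistent about declaring a key with name or property, a small helper can try both. get_meta_content below is a hypothetical helper, not part of bs4, and the inline HTML is sample markup rather than the real page:

```python
from typing import Optional

from bs4 import BeautifulSoup


def get_meta_content(soup: BeautifulSoup, key: str) -> Optional[str]:
    """Return the content of the first <meta> whose property or name equals key."""
    # Try property= first (Open Graph style), then fall back to name=
    for attr in ("property", "name"):
        tag = soup.find("meta", attrs={attr: key})
        if tag is not None and tag.has_attr("content"):
            return tag["content"]
    return None


# Hypothetical sample markup
html = """
<head>
  <meta name="description" content="Sample description">
  <meta property="og:title" content="Sample title">
</head>
"""
soup = BeautifulSoup(html, "html.parser")

print(get_meta_content(soup, "og:title"))
print(get_meta_content(soup, "description"))
print(get_meta_content(soup, "og:image") or "No meta image given")
```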