BeautifulSoup not picking up meta tag
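I have a simple script that fetches an HTML page and tries to print the content of the keywords meta tag. For some reason it does not pick up the content, even though the HTML contains the tag. Any help is appreciated.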
import urllib2
from bs4 import BeautifulSoup

url = "https://www.mediapost.com/publications/article/316086/google-facebook-others-pitch-in-app-ads-brand-s.html"
req = urllib2.Request(url=url)
f = urllib2.urlopen(req)
mycontent = f.read()
soup = BeautifulSoup(mycontent, 'html.parser')
keywords = soup.find("meta", property="keywords")
print keywords
Use 'lxml' instead of 'html.parser', and use soup.find_all:
soup = BeautifulSoup(mycontent, 'lxml')  # mycontent is the HTML read in the question
keywords = soup.find_all('meta', attrs={"name": 'keywords'})
for x in keywords:
    print(x['content'])
Output:
Many more major brands are pumping big ad dollars into mobile games, pushing Google, Facebook and others into the in-app gaming ad space. Some believe this is in response to brands searching for a secure, safe place to run video ads and engage with consumers. 03/16/2018
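Since find_all returns a list of every matching tag, it may help to guard against a tag that has no content attribute. This is only a minimal sketch: the markup below is made up for illustration and is not the real page, and it uses 'html.parser' so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two keywords tags, one of them without content.
html = """
<head>
<meta name="keywords" content="first, second">
<meta name="keywords">
<meta name="description" content="ignored">
</head>
"""

soup = BeautifulSoup(html, "html.parser")

contents = []
for tag in soup.find_all("meta", attrs={"name": "keywords"}):
    value = tag.get("content")  # None instead of a KeyError when absent
    if value is not None:
        contents.append(value)

print(contents)  # ['first, second']
```

Indexing with x['content'] is fine when every tag is known to carry the attribute; .get() just avoids a KeyError when one does not.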
If you inspect the page closely, the meta tag you are looking for has a name attribute, not a property attribute, so change your code to:
keywords = soup.find("meta", attrs={'name':'keywords'})
Then, to display the content, write:
print(keywords['content'])
Output:
Many more major brands are pumping big ad dollars into mobile games,
pushing Google, Facebook and others into the in-app gaming ad space.
Some believe this is in response to brands searching for a secure,
safe place to run video ads and engage with consumers. 03/16/2018
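To see why the original call returned nothing, here is a small self-contained sketch. The HTML is a made-up stand-in for the page's head, not the real markup. Keyword arguments to find() filter on an attribute of that name, and name itself cannot be passed that way because it is already find()'s first parameter (the tag name), which is why attrs is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's <head>; not the real markup.
html = """
<html><head>
<meta name="keywords" content="mobile games, in-app ads">
<meta property="og:title" content="Example title">
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword arguments filter on an attribute of that name, so this looks for
# <meta property="keywords">, which does not exist here.
by_property = soup.find("meta", property="keywords")

# `name` is already the first parameter of find() (the tag name), so the
# name attribute has to be passed through attrs.
by_name = soup.find("meta", attrs={"name": "keywords"})

# property= is the right filter for tags that really carry that attribute.
og_title = soup.find("meta", property="og:title")

print(by_property)          # None
print(by_name["content"])   # mobile games, in-app ads
print(og_title["content"])  # Example title
```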
I strongly recommend using requests.
Code:
from bs4 import BeautifulSoup
import requests
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
keywords = soup.select_one('meta[name="keywords"]')['content']
>>> keywords
'Many more major brands are pumping big ad dollars into mobile games, pushing Google, Facebook and others into the in-app gaming ad space. Some believe this is in response to brands searching for a secure, safe place to run video ads and engage with consumers. 03/16/2018'
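One caveat with the one-liner: select_one returns None when nothing matches, so subscripting it directly raises a TypeError on pages without the tag. A possible guard is sketched below. meta_content is a hypothetical helper name, and the sample HTML is invented so the snippet runs offline; with requests you would pass r.text instead.

```python
from bs4 import BeautifulSoup


def meta_content(html, name, default=None):
    """Return the content of <meta name=...>, or default if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(f'meta[name="{name}"]')
    if tag is None:
        return default
    return tag.get("content", default)


# Hypothetical sample page; with requests you would pass r.text instead.
sample = '<html><head><meta name="keywords" content="ads, games"></head></html>'

print(meta_content(sample, "keywords"))           # ads, games
print(meta_content(sample, "author", "unknown"))  # unknown
```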