Python-Requests Scraping YouTube description with BS4 issue
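I'm trying to get the text and the links as shown in the picture. But I can only get the text via the siblings, and the links separately afterwards. I need them together, like in the picture. I tried br.next_element, but it doesn't pick up the a links. What am I missing?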
import requests
from bs4 import BeautifulSoup

url_id = 'aM7aW0G58CI'
s = requests.Session()
r = s.get('https://www.youtube.com/watch?v=' + url_id)
html = r.text
soup = BeautifulSoup(html, 'lxml')

for i in soup.find_all('p', id='eow-description'):
    for br in i.find_all('br'):
        next_sib = br.next_sibling
        print(next_sib)

for i in soup.find_all('p', id='eow-description'):
    for a in i.find_all('a'):
        print(a.text)
This is the output I get. It is not what the screenshot shows.
Output:
Special shout to
Wanna support what we do? Livestream at 2PM PT!:
It Wasn’t Me, I Swear!:
TheDeFrancoFam Vlog:
————————————
CATCH UP ON THIS WEEK’S SHOWS:
<br/>
Why People Are Freaking Out About The Trump NFL Boycott and Anthony Weiner Going to Jail…:
WOW! Dirty Advertising Exposed And Major Backlash Following Unexpected Compromise…:
Why Trump's "HUGE Failure" Is A Massive Loss For His Enemies and A Shocking Change To Women's Rights:
DISGUSTING! The Horrible Truth About Belle Gibson Exposed, Controversial Video Blows Up, and More:
<br/>
————————————
GET SOME GEAR:
————————————
FACEBOOK:
TWITTER:
INSTAGRAM:
SNAPCHAT: TheDeFrancoFam
REDDIT:
ITUNES:
GOOGLE PLAY:
————————————
Edited by:
James Girardier -
Jason Mayer -
<br/>
Produced by:
Amanda Morones -
<br/>
Motion Graphics Artist:
Brian Borst -
<br/>
P.O. BOX
Attn: Philip DeFranco
16350 Ventura Blvd
Ste D #542
Encino, CA 91436
http://DKPhil.com
http://DeFrancoElite.com
https://youtu.be/fFxDbYE06zU
https://youtu.be/kR7DquGe4vY
https://youtu.be/qdWUQGHtyPk
https://youtu.be/CWlUs1-7KN4
https://youtu.be/kUWt-oipvOY
https://youtu.be/XVsTh4zxKNo
https://teespring.com/stores/defranco...
http://on.fb.me/mqpRW7
http://Twitter.com/PhillyD
https://instagram.com/phillydefranco/
https://www.reddit.com/r/DeFranco
http://DeFrancoMistakes.com
http://mistakeswithdefranco.com
https://twitter.com/jamesgirardier
https://www.instagram.com/jayjaymay/
https://twitter.com/MandaOhDang
https://twitter.com/brianjborst
Using children and checking the tag name (child.name), I came up with this:
import requests
from bs4 import BeautifulSoup

url_id = 'aM7aW0G58CI'
s = requests.Session()
r = s.get('https://www.youtube.com/watch?v=' + url_id)
soup = BeautifulSoup(r.text, 'lxml')

# text collected after each <br>, held until the next link is printed
br = ''
for p in soup.find_all('p', id='eow-description'):
    for child in p.children:
        if child.name == 'a':
            #print(' a:', child.text)
            print(br, child.text)
            br = ''  # reset br
        elif child.name == 'br':
            nxt = child.next_sibling
            # skip consecutive <br/>, and a trailing <br/> that has no sibling
            if nxt is not None and nxt.name != 'br':
                #print('br:', nxt)
                br += str(nxt)
        #else:
        #    print(child.name, child)
I get:
Special shout to http://DKPhil.com
Wanna support what we do? Livestream at 2PM PT!: http://DeFrancoElite.com
It Wasn’t Me, I Swear!: https://youtu.be/fFxDbYE06zU
TheDeFrancoFam Vlog: https://youtu.be/kR7DquGe4vY
———————————— CATCH UP ON THIS WEEK’S SHOWS: Why People Are Freaking Out About The Trump NFL Boycott and Anthony Weiner Going to Jail…: https://youtu.be/qdWUQGHtyPk
WOW! Dirty Advertising Exposed And Major Backlash Following Unexpected Compromise…: https://youtu.be/CWlUs1-7KN4
Why Trump's "HUGE Failure" Is A Massive Loss For His Enemies and A Shocking Change To Women's Rights: https://youtu.be/kUWt-oipvOY
DISGUSTING! The Horrible Truth About Belle Gibson Exposed, Controversial Video Blows Up, and More: https://youtu.be/XVsTh4zxKNo
————————————GET SOME GEAR: https://teespring.com/stores/defranco...
————————————FACEBOOK: http://on.fb.me/mqpRW7
TWITTER: http://Twitter.com/PhillyD
INSTAGRAM: https://instagram.com/phillydefranco/
SNAPCHAT: TheDeFrancoFamREDDIT: https://www.reddit.com/r/DeFranco
ITUNES: http://DeFrancoMistakes.com
GOOGLE PLAY: http://mistakeswithdefranco.com
————————————Edited by:James Girardier - https://twitter.com/jamesgirardier
Jason Mayer - https://www.instagram.com/jayjaymay/
Produced by:Amanda Morones - https://twitter.com/MandaOhDang
Motion Graphics Artist:Brian Borst - https://twitter.com/brianjborst
EDIT: you may need to use

        else:
            print(child.name, child)

to get the P.O. Box address.
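The same idea can be tried offline. Below is a minimal sketch, not the exact page structure: SAMPLE_HTML is a hypothetical fragment I made up to imitate the description markup, and description_lines is a helper name of my own. It walks p.children, joins plain text and link text, and starts a new line at every br, so trailing text without a link (like the P.O. Box) is kept too.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical stand-in for the description markup; the real page may differ.
SAMPLE_HTML = (
    '<p id="eow-description">Special shout to '
    '<a href="http://DKPhil.com">http://DKPhil.com</a><br/>'
    'TheDeFrancoFam Vlog: '
    '<a href="https://youtu.be/kR7DquGe4vY">https://youtu.be/kR7DquGe4vY</a>'
    '<br/><br/>P.O. BOX<br/>Attn: Philip DeFranco</p>'
)


def description_lines(p):
    """Rebuild the visible lines: text and link text joined, split at <br>."""
    lines, current = [], []
    for child in p.children:
        if isinstance(child, NavigableString):
            current.append(str(child))         # plain text between tags
        elif child.name == 'br':
            lines.append(''.join(current).strip())
            current = []                       # a <br> ends the current line
        else:
            current.append(child.get_text())   # <a> and any other tag
    if current:                                # text after the last <br>
        lines.append(''.join(current).strip())
    return lines


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
lines = description_lines(soup.find('p', id='eow-description'))
for line in lines:
    print(line)
```

Two br tags in a row give an empty string in the list, which prints as a blank line.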
I found a very simple way:

for p in soup.find_all('p', id='eow-description'):
    print(p.get_text('\n'))

The only problem now is that some links are removed ...
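One possible workaround, assuming the missing part is that the visible link text is shortened while the full URL sits in the href attribute: replace each a tag with its href and each br with a newline before calling get_text. This is only a sketch; SAMPLE_HTML and the example.com URL are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the link text is shortened, the href is complete.
SAMPLE_HTML = (
    '<p id="eow-description">GET SOME GEAR: '
    '<a href="https://example.com/stores/full-path">'
    'https://example.com/stores/full...</a>'
    '<br/>SNAPCHAT: TheDeFrancoFam</p>'
)

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
p = soup.find('p', id='eow-description')

# find_all returns a list, so the tree can be modified while looping over it
for a in p.find_all('a', href=True):
    a.replace_with(a['href'])      # keep the full URL instead of the link text
for br in p.find_all('br'):
    br.replace_with('\n')          # turn each <br> into a real line break

text = p.get_text()
print(text)
```

Note that this modifies the parsed tree in place, so parse the page again if the original tags are still needed.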
You can also use the youtube-dl Python module to get the description of a YouTube video.
I found this method:

import pafy

url = 'https://www.youtube.com/watch?v=aM7aW0G58CI'
vid = pafy.new(url)
print(vid.description)

With this method you get the description exactly as it is shown under the YouTube video.