How to extract the video links embedded in a web page along with the video names
I am trying to extract all the video link references, along with the video names, from a web page. I have tried the code below.
#!/usr/bin/python3
from bs4 import BeautifulSoup
import urllib.request

url = urllib.request.urlopen('https://www.ansible.com/resources/videos').read()
acc_link = BeautifulSoup(url, features="lxml")
for line in acc_link.find_all('a'):
    print(line.get('href'))
Output:
https://www.ansible.com/?hsLang=en-us
https://www.ansible.com/overview/it-automation?hsLang=en-us
https://www.ansible.com/overview/it-automation?hsLang=en-us
https://www.ansible.com/overview/how-ansible-works?hsLang=en-us
https://www.ansible.com/products/automation-platform?hsLang=en-us
https://www.ansible.com/use-cases?hsLang=en-us
https://www.ansible.com/use-cases/provisioning?hsLang=en-us
https://www.ansible.com/use-cases/configuration-management?hsLang=en-us
https://www.ansible.com/use-cases/application-deployment?hsLang=en-us
https://www.ansible.com/use-cases/continuous-delivery?hsLang=en-us
https://www.ansible.com/use-cases/security-automation?hsLang=en-us
https://www.ansible.com/use-cases/orchestration?hsLang=en-us
https://www.ansible.com/integrations?hsLang=en-us
Example HTML source:
<h4><a href="https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista?hsLang=en-us">Ansible Network Automation with Arista CloudVision and Arista Validated Designs</a></h4>
The above is just one sample of the HTML source of https://www.ansible.com/resources/videos. I want the link to be https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista and the video name to be Ansible Network Automation with Arista CloudVision and Arista Validated Designs.
Below is another example, where I want the href value up to the ? and the text of the a element, i.e. Scale-out Clustering with Tower 3.1:
<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4>
Expected output:
Video name: Ansible Network Automation with Arista CloudVision and Arista Validated Designs
Thanks in advance for the help.
If you want the href from every anchor, you can use the CSS selector 'a[href]', which matches only anchor tags that have an href attribute.
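To illustrate the difference, here is a minimal sketch on an inline HTML snippet. The snippet and variable names are made up for illustration, and it uses the built-in html.parser rather than lxml so that it runs without extra dependencies. find_all('a') returns every anchor, while select('a[href]') skips anchors that have no href.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from the real page.
html = """
<a name="top">No href here</a>
<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4>
"""

soup = BeautifulSoup(html, "html.parser")

all_anchors = soup.find_all("a")    # every <a>, with or without an href
with_href = soup.select("a[href]")  # only <a> tags that carry an href

hrefs = [a["href"] for a in with_href]
print(len(all_anchors), len(with_href))
print(hrefs)
```

With find_all('a'), calling .get('href') on the first anchor would yield None, which is why the selector form is convenient when you index with ['href'].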
You can adjust your code as follows:
#!/usr/bin/python3
from bs4 import BeautifulSoup
import urllib.request

url = urllib.request.urlopen('https://www.ansible.com/resources/videos').read()
acc_link = BeautifulSoup(url, features="lxml")
for article in acc_link.find_all('div', class_='card-body'):
    # Grab the name of the video article.
    headline1 = article.h4.a.text
    # Grab the video link and drop the query string (everything from '?').
    headline2 = article.select_one('a[href]')['href'].split('?')[0]
    print(headline1)
    # A few of the links are relative and lack the https://www.ansible.com
    # prefix, so add it when it is missing.
    if 'www' in headline2:
        print(headline2)
    else:
        print('https://www.ansible.com' + headline2)
    print()
Result:
Automating Monitoring with the Sensu Go Ansible Collection
https://www.ansible.com/resources/webinars-training/automating-monitoring-with-the-sensu-go-ansible-collection
How to load balance a hybrid cloud using Red Hat Insights, Red Hat Ansible, and Red Hat AMQ Interconnect
https://www.redhat.com/en/about/videos/road-to-open-hybrid-cloud-part-2
British Army speeds service delivery with Red Hat
https://www.redhat.com/en/about/videos/british-army-speeds-service-delivery-red-hat
Zero To 100 - Rapid deployment with Ansible Tower
https://www.ansible.com/zero-to-100
Scale-out Clustering with Tower 3.1
https://www.ansible.com/scale-out-clustering-tower
What's New In Tower 3.1
https://www.ansible.com/whats-new-tower-3-1
Amelco - Continuous Delivery with Ansible Tower
https://www.ansible.com/success-stories/amelco
Runnable - Getting Started with Ansible
https://www.ansible.com/success-stories/runnable
Fatmap - App Deployment with Ansible
https://www.ansible.com/success-stories/fatmap
Splunk and Ansible Tower
https://www.ansible.com/success-stories/splunk
Siemens - Delivering Automation to the Cloud
https://www.ansible.com/success-stories/siemens
Ansible Tower 10 min demo
https://www.ansible.com/products/tower/demo
Ansible Tower 3.1
https://www.ansible.com/tower-workflows-demo
Ansible Tower 2-min Overview
https://www.ansible.com/tower-overview
Ansible Quick Start
https://www.ansible.com/resources/videos/quick-start-video
Ansible + AWS - Serverless Deploys
https://www.ansible.com/resources/videos/ansible-aws-automate-serverless-application-deploys-with-ansible
Ansible + AWS - EC2 Provisionling
https://www.ansible.com/resources/videos/ansible-aws-automate-ec2-provisioning-with-red-hat-ansible-engine-and-red-hat-ansible-tower
Network Automation For Beginners
https://www.ansible.com/resources/videos/network-automation-with-red-hat-ansible-engine-for-beginners
Agnostic Network Automation Examples with Ansible and Juniper NRE Labs
https://www.ansible.com/blog/agnostic-network-automation-examples-with-ansible-and-juniper-nre-labs
How useful is Ansible in a cloud-native Kubernetes environment
https://www.ansible.com/blog/how-useful-is-ansible-in-a-cloud-native-kubernetes-environment
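The 'www' in headline2 check works for this page, but it would misfire on an absolute URL without "www" or on a relative path that happens to contain it. A more general sketch, assuming the same div.card-body / h4 / a structure, resolves links with urljoin and strips the query string with urlsplit from the standard library. The helper name extract_videos and the inline sample HTML are hypothetical and only there so the example runs without a network request.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit
from bs4 import BeautifulSoup

BASE_URL = "https://www.ansible.com/resources/videos"

def extract_videos(html, base_url=BASE_URL):
    """Return (name, link) pairs. Hypothetical helper, not part of the answer above."""
    soup = BeautifulSoup(html, "html.parser")
    videos = []
    # Assumes each video title is an <a> inside an <h4> inside div.card-body.
    for anchor in soup.select("div.card-body h4 a[href]"):
        name = anchor.get_text(strip=True)
        # Resolve relative links such as "/zero-to-100" against the page URL.
        absolute = urljoin(base_url, anchor["href"])
        # Rebuild the URL without its query string and fragment.
        parts = urlsplit(absolute)
        link = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        videos.append((name, link))
    return videos

# Made-up sample mirroring the structure shown in the question.
sample = """
<div class="card-body"><h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4></div>
<div class="card-body"><h4><a href="/zero-to-100?hsLang=en-us">Zero To 100 - Rapid deployment with Ansible Tower</a></h4></div>
"""

for name, link in extract_videos(sample):
    print(name)
    print(link)
    print()
```

To run it against the live page, pass the bytes returned by urllib.request.urlopen(BASE_URL).read() to extract_videos instead of sample; whether the page still uses the card-body class is something to verify first.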
Hope this helps.