How to remove unwanted text when retrieving a page title using Python

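Hi all, I have written a Python program to retrieve the title of a page. It works fine, but for some pages it also returns some unwanted text. How can I avoid that?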

Here is my program:

# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeautifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')

# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    title_data = title.get_text().lower().strip()
    print(title_data)

Here is my output:

atlas obscura - curious and wondrous travel destinations
aoc-full-screen
aoc-heart-solid
aoc-compass
aoc-flipboard
aoc-globe
aoc-pocket
aoc-share
aoc-cancel
aoc-video
aoc-building
aoc-clock
aoc-clipboard
aoc-help
aoc-arrow-right
aoc-arrow-left
aoc-ticket
aoc-place-entry
aoc-facebook
aoc-instagram
aoc-reddit
aoc-rss
aoc-twitter
aoc-accommodation
aoc-activity-level
aoc-add-a-photo
aoc-add-box
aoc-add-shape
aoc-arrow-forward
aoc-been-here
aoc-chat-bubbles
aoc-close
aoc-expand-more
aoc-expand-less
aoc-forum-flag
aoc-group-size
aoc-heart-outline
aoc-heart-solid
aoc-home
aoc-important
aoc-knife-fork
aoc-library-books
aoc-link
aoc-list-circle-bullets
aoc-list
aoc-location-add
aoc-location
aoc-mail
aoc-map
aoc-menu
aoc-more-horizontal
aoc-my-location
aoc-near-me
aoc-notifications-alert
aoc-notifications-mentions
aoc-notifications-muted
aoc-notifications-tracking
aoc-open-in-new
aoc-pencil
aoc-person
aoc-pinned
aoc-plane-takeoff
aoc-plane
aoc-print
aoc-reply
aoc-search
aoc-shuffle
aoc-star
aoc-subject
aoc-trip-style
aoc-unpinned
aoc-send
aoc-phone
aoc-apps
aoc-lock
aoc-verified

Instead of this, I want to receive only this line:

"atlas obscura - curious and wondrous travel destinations"

Please help me understand this. It works fine for all other websites; only some websites show this problem.

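Your problem is that find_all('title') returns every <title> element on the page, not just the one in <head>. The extra "aoc-..." lines are most likely the <title> elements of inline SVG icons embedded in the page, which is why only some sites are affected. Beautiful Soup has a title attribute (soup.title) that returns only the first <title> tag, which is what you are trying to get. Here is your modified code: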

# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeautifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')

# soup.title is the first <title> tag only (None if the page has no title)
title_data = soup.title.text.lower().strip()

# displaying the title
print("Title of the website is : ")
print(title_data)
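
To see the difference without making a network request, here is a minimal sketch that runs both approaches on a small hand-written HTML string. The markup in SAMPLE_HTML and the helper name page_title are made up for illustration and only imitate a page with inline SVG icons; they are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one real page title plus two SVG icon titles.
SAMPLE_HTML = """
<html>
  <head>
    <title>Atlas Obscura - Curious and Wondrous Travel Destinations</title>
  </head>
  <body>
    <svg><symbol id="icon-compass"><title>aoc-compass</title></symbol></svg>
    <svg><symbol id="icon-globe"><title>aoc-globe</title></symbol></svg>
  </body>
</html>
"""


def page_title(html):
    """Return the first <title> text, lowercased and stripped, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.title  # same as soup.find("title"): first match only
    if tag is None:
        return None
    return tag.get_text().strip().lower()


# find_all collects every <title>, including the ones inside <svg>.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
all_titles = [t.get_text().strip() for t in soup.find_all("title")]

print(all_titles)
print(page_title(SAMPLE_HTML))
```

The None check matters because soup.title is None for a page with no <title> at all, and calling .text on it would raise an AttributeError.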