BeautifulSoup.find_all(): can I select multiple tags and match strings within those tags?
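I want to scrape some data from a website. As a preface, I am a beginner. I want to filter all of the XML data that is returned by postal code (the postal code is inside the 'item_teaser' attribute).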
<item lat="43.6437075296758" long="-80.083111524582" item_name="Acton Golf Club" item_url="http://www.ontariogolf.com/courses/acton/acton-gc/" item_teaser="4955 Dublin Line Acton, Ontario L7J 2M2"/>
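The above is an example of what I want to extract, but I want to filter everything by specific postal areas (the first 3 characters, e.g. L7J).
Can find_all() search item_teaser for a set of strings, e.g. "L7J, L2S, L2O, etc.", and return the entries that match those postal areas, including the whole item?
The code below is wrong, since I can't pull anything with it, but it is what I have so far.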
from bs4 import BeautifulSoup
import requests
url = "http://www.ontariogolf.com/app/map/cfeed.php?e=-63&w=-106&n=55&s=36"
xml = requests.get(url)
# I was just seeing if I could grab everything from the website which worked when I printed.
soup = BeautifulSoup(xml.content, 'lxml')
# I am trying to show all item teasers just to try it out, but I can't seem to figure it out
tag = soup.find_all(id="item_teaser")
print(tag)
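When you do:
tag = soup.find_all(id="item_teaser")
BeautifulSoup looks for elements whose id attribute has the value "item_teaser". However, "item_teaser" is not an id value; it is the name of an attribute.
To find every tag that has an item_teaser attribute, pass the attribute name as a keyword argument to find_all():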
for tag in soup.find_all(item_teaser=True):
    print(tag)
In addition, to read the value of the item_teaser attribute, index the tag with the attribute name, tag["<attribute>"]:
for tag in soup.find_all(item_teaser=True):
    print(tag["item_teaser"])
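The difference between the two calls can be seen on a small inline sample. This is only a sketch: the `sample` string is modelled on the `<item>` element from the question, the second element is made up for illustration, and the built-in `html.parser` is used instead of `lxml` so that nothing has to be downloaded.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the feed; the second <item> is hypothetical
# and deliberately has no item_teaser attribute.
sample = """
<items>
  <item item_name="Acton Golf Club" item_teaser="4955 Dublin Line Acton, Ontario L7J 2M2"/>
  <item item_name="No Teaser Club"/>
</items>
"""

soup = BeautifulSoup(sample, "html.parser")

# Looks for id="item_teaser"; no element has such an id.
by_id = soup.find_all(id="item_teaser")

# Looks for any tag that carries an item_teaser attribute.
by_attr = soup.find_all(item_teaser=True)

print(len(by_id))
print(len(by_attr))
print(by_attr[0]["item_teaser"])
```

The first search comes back empty, while the second returns only the element that actually has the attribute.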
You can then check whether any of several strings (the matches list) occurs in another string (the value of the item_teaser attribute):
from bs4 import BeautifulSoup
import requests
url = "http://www.ontariogolf.com/app/map/cfeed.php?e=-63&w=-106&n=55&s=36"
xml = requests.get(url)
soup = BeautifulSoup(xml.content, 'lxml')
input_tag = soup.find_all('item')
# put the list of associated strings here
matches = ["L7J", "L1S", "L2A"]
# print the result
for tag in input_tag:
    text = tag["item_teaser"]
    if any(x in text for x in matches):
        print(text)
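As a possible alternative, find_all() also accepts a compiled regular expression as an attribute value, so the filtering can happen inside the search and the whole `<item>` tag is returned. The sketch below rests on some assumptions: the helper name `items_in_areas` is hypothetical, the sample items other than Acton Golf Club are invented, and the pattern assumes the teaser contains a postal code in the usual Canadian "A1A 1A1" form.

```python
import re
from bs4 import BeautifulSoup

# Inline stand-in for the feed; only the first item comes from the question,
# the other two are made up for illustration.
sample = """
<items>
  <item item_name="Acton Golf Club" item_teaser="4955 Dublin Line Acton, Ontario L7J 2M2"/>
  <item item_name="Hypothetical Club B" item_teaser="1 Example Road Fort Erie, Ontario L2A 1B3"/>
  <item item_name="Hypothetical Club C" item_teaser="2 Sample Street Ottawa, Ontario K1A 0B1"/>
</items>
"""

def items_in_areas(soup, prefixes):
    """Return the <item> tags whose teaser holds a postal code starting with one of prefixes."""
    # Prefix, optional space, then the digit-letter-digit second half of the code.
    alternatives = "|".join(re.escape(p) for p in prefixes)
    pattern = re.compile(r"\b(?:%s) ?\d[A-Z]\d\b" % alternatives)
    # A compiled pattern as the attribute value is matched with pattern.search().
    return soup.find_all("item", item_teaser=pattern)

soup = BeautifulSoup(sample, "html.parser")
matched = items_in_areas(soup, ["L7J", "L1S", "L2A"])

for tag in matched:
    # Each result is the whole tag, so every attribute is available.
    print(tag["item_name"], "|", tag["item_teaser"])
```

Anchoring the prefix to a full postal code avoids accidental hits where the three characters happen to appear elsewhere in the address, which the plain substring test cannot rule out.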