How to scrape a string that does not have a unique ID for data extraction?
The image shows a piece of text that reads "for sale in 63702 Kolaram".
How can I extract that string using BeautifulSoup in Python?
https://www.magicbricks.com/property-for-sale-in-namakkal-pppfs
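You can simply use BeautifulSoup's find_all() function to do this (replace "Kolaram" with whatever text you are looking for):
import re
import requests
from bs4 import BeautifulSoup
url = "https://www.magicbricks.com/property-for-sale-in-namakkal-pppfs"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# string= searches text nodes; a compiled regex matches a substring, not the whole text
result_text = soup.find_all(string=re.compile("Kolaram"))
This returns a list of the matching text nodes. A plain string passed to string= only matches a node whose entire text is exactly equal to it, which is why a compiled regular expression is used here to match on a substring.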
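To see why this works even when no element has a unique ID, here is an offline sketch that runs find_all(string=...) on a small hypothetical HTML snippet (the markup and the "card" class are assumptions, not the real page) and then walks up from the matched text node to the elements around it with .parent and find_parent().

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: two listing cards, neither element has an id
html = """
<div class="card">
  <h2><span class="title">3 BHK Villa for Sale in 637201 Kolaram</span></h2>
</div>
<div class="card">
  <h2><span class="title">1 BHK House for Sale in Rasipuram</span></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A compiled regex is searched within each text node, so a substring is enough
matches = soup.find_all(string=re.compile("Kolaram"))
texts = [str(m).strip() for m in matches]

# Each match is a NavigableString; .parent is the tag that directly contains it
parent_tags = [m.parent.name for m in matches]

# find_parent() climbs further up to the enclosing card <div>
card_classes = [m.find_parent("div")["class"] for m in matches]

print(texts)         # ['3 BHK Villa for Sale in 637201 Kolaram']
print(parent_tags)   # ['span']
print(card_classes)  # [['card']]
```

Once you hold the enclosing card element, you can search inside it for the other fields of the same listing.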
If you only need the text, you can use a CSS selector that targets the title span elements:
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.magicbricks.com/property-for-sale-in-namakkal-pppfs'
request = requests.get(url)
# 'html5lib' is a separate parser package (pip install html5lib)
soup = BeautifulSoup(request.content, 'html5lib')
# CSS path from each listing card down to its title <span>
selector = ('div.m-srp-card__container > div.m-srp-card__desc > '
            'div.m-srp-card__heading > h2 > span.m-srp-card__title')
# Collapse every whitespace run (including newlines) into a single space
spans = [re.sub(r'\s+', ' ', span.text).strip() for span in soup.select(selector)]
print(spans)
This prints the list of titles:
['3 BHK Villa for Sale in 637201 Kolaram', '1 BHK House for Sale in Rasipuram', '2 BHK House for Sale in Rasipuram', 'Plot/Land for Sale in Thiruchengode Rd', 'Plot/Land for Sale in Mathiyampatty', 'Plot/Land for Sale in Namagiripetai', 'Plot/Land for Sale in Rasipuram', 'Plot/Land for Sale in Ladhuvaadi', 'Plot/Land for Sale in TGP Star City', 'Plot/Land for Sale in Palpakki', 'Plot/Land for Sale in Pavai International school back side', 'Plot/Land for Sale in Rasipuram', '3 BHK Villa for Sale in Kondichettipatti', '2 BHK House for Sale in Thiruchengode Rd', '1 BHK House for Sale in Ganesapuram namakkal', 'Plot/Land for Sale in Rasipuram', 'Plot/Land for Sale in Pon nagar bus stop', 'Plot/Land for Sale in Rasipuram', 'Plot/Land for Sale in Rasipuram', '2 BHK House for Sale in Periyapatti', '2 BHK House for Sale in Tiruchengode', 'Plot/Land for Sale in Rasipuram', 'Plot/Land for Sale in L.Kanavaipatti Road', 'Plot/Land for Sale in Muthugapatti', 'Plot/Land for Sale in Kondichettipatti', 'Plot/Land for Sale in Rasipuram', '2 BHK House for Sale in Sai Brindhavan nagar Neare', 'Plot/Land for Sale in sri sowrnabairavar nagar', '2 BHK House for Sale in Namagiripettai', '3 BHK House for Sale in Pachal']
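To check the selector and the whitespace clean-up without requesting the live site, the sketch below runs the same CSS path against hypothetical inline markup that reuses the class names from the selector. The structure of the real page is an assumption here and may change over time. The title deliberately contains a line break right after a word, to show that replacing each whitespace run with a single space keeps the words apart.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup reusing the class names from the selector;
# the live page's structure is an assumption and may change
html = """
<div class="m-srp-card__container">
  <div class="m-srp-card__desc">
    <div class="m-srp-card__heading">
      <h2><span class="m-srp-card__title">3 BHK Villa for Sale in
637201 Kolaram</span></h2>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
selector = ("div.m-srp-card__container > div.m-srp-card__desc > "
            "div.m-srp-card__heading > h2 > span.m-srp-card__title")

# Replace every run of whitespace (spaces, tabs, newlines) with one space
titles = [re.sub(r"\s+", " ", span.text).strip() for span in soup.select(selector)]
print(titles)  # ['3 BHK Villa for Sale in 637201 Kolaram']
```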
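If you need other information from each card as well, shorten the selector so that it stops at the card div elements, iterate over those elements, and extract the fields you need from their children.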
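A minimal sketch of that per-card approach, run against hypothetical inline HTML so it does not depend on the live page. Only the m-srp-card__container and m-srp-card__title class names come from the answer above; the nesting is simplified, "m-srp-card__price" is an assumed class name used purely for illustration, and the helper text_or_none is hypothetical. Looking up each field inside its own card keeps the fields of one listing together, and a missing field becomes None instead of shifting values between listings.

```python
from bs4 import BeautifulSoup

# Hypothetical two-card markup; "m-srp-card__price" is an assumed class name
html = """
<div class="m-srp-card__container">
  <h2><span class="m-srp-card__title">3 BHK Villa for Sale in 637201 Kolaram</span></h2>
  <div class="m-srp-card__price">price placeholder</div>
</div>
<div class="m-srp-card__container">
  <h2><span class="m-srp-card__title">Plot/Land for Sale in Rasipuram</span></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def text_or_none(card, css):
    """Hypothetical helper: cleaned text of the first match inside card, or None."""
    el = card.select_one(css)
    return " ".join(el.get_text().split()) if el is not None else None


records = []
# Stop the selector at the card level, then search inside each card
for card in soup.select("div.m-srp-card__container"):
    records.append({
        "title": text_or_none(card, "span.m-srp-card__title"),
        "price": text_or_none(card, "div.m-srp-card__price"),  # assumed class
    })

print(records)
```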