How do I extract content from classes and add it to a list in the order it appears on the page if the classes are different and contain different content?
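While scraping, I have 2 cases that need to be handled differently.
Two similar classes both contain the price of a building, and the prices need to be added to Excel in page order because they have to line up with the other data I am scraping.
The listings I am scraping use 2 different classes.
One looks like this: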
<div class="xl-price rangePrice">
375.000 €
</div>
The other looks like this:
<div class="xl-price-promotion rangePrice">
<span>from </span> 250.000 € <br><span>to</span> 695.000 €
</div>
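My code can extract either one, but not both at the same time.
What I need it to do is go through all the prices on the search results page and append them to the list "pricelist".
I do the same for square meters, building type, and so on, and write each list item to an Excel file.
So it is vital that the prices are added to the list in page order; otherwise the row position of a price in the Excel file will not match the row position of the square meters and the building type.
Does anyone know why my code cannot extract both classes?
Here is my code and the page I am trying to extract prices from:
Getting the website and looping through the first pages (range(1, 4) covers pages 1 to 3):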
for number in range(1, 4):
    listplace = (number - 1) * len(buildinglist1)
    immo_page = requests.get(f'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={number}',
                             headers=header)
    soup = BeautifulSoup(immo_page.content, 'lxml')  # html parser
    pricelist = ['Price']
    for item in soup.findAll('div', attrs={'class': 'xl-price'}):
        # item = item.text.strip().split()
        try:
            for item in soup.findAll('div', attrs={'class': 'xl-price-promotion rangePrice'}):
                temp_list = []
                item = item.text.strip().split()
                item.remove('from'), item.remove('€'), item.remove('to'), item.remove('€')
                for price in item: temp_list.append(price.replace('.', ''))
                print(temp_list)
                temp_list = [int(temp_list[0]) + int(temp_list[1])]
                print(temp_list)
                for item in temp_list: pricelist.append(item / 2)
        except ValueError:
            for item in soup.findAll('div', attrs={'class': 'xl-price rangePrice'}):
                item = item.contents[0]
                item = item.strip()[0:-1]
                item = item.replace(' ', '')
                item = item.replace('.', '')
                pricelist.append(item)
    print(pricelist)
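This is how I tried to get the prices and append them to the list.
Output when using only one of the two (in this case, the output of the code in the "except" branch):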
['Price', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', 
'689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', 
'499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', 
'210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', 
'689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000']
['Price', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', 
'445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', 
'325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000']
['Price', '235000']
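Each "Price" marks a new page. But as you can see for page 3, the list is not complete: it only shows the first value it encounters, which is a single price, and does not pick up the double price values.
- When there is more than one price, I take the average of the prices and then append it to the price list.
Thanks a lot!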
This script scrapes data from pages 1 to 10 and saves it to a CSV file. Prices are averaged if more than one is found for an ad:
import re
import csv
import requests
from bs4 import BeautifulSoup
from statistics import mean

url = 'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={}'

data = []
for page in range(1, 11):  # pages 1 to 10
    soup = BeautifulSoup(requests.get(url.format(page)).text, 'html.parser')
    # Walk the five per-listing selections in parallel so each row stays aligned
    for result, price, surface, desc, link in zip(
            soup.select('.title-bar-left'),
            soup.select('.rangePrice'),
            soup.select('.xl-surface-ch, .l-surface-ch, .m-surface-ch'),
            soup.select('.xl-desc, .l-desc, .m-desc'),
            soup.select('.result-xl > a[target="IWEB_MAIN"], .result-l > a[target="IWEB_MAIN"], .result-m > a[target="IWEB_MAIN"]')):
        s = (re.findall(r'\s*(.*?m²)\s*', surface.get_text(strip=True)) or '-')[0]
        bed = (re.findall(r'\s*([\s\d\-]+bed.)\s*', surface.get_text(strip=True)) or '-')[0]
        # Drop a crossed-out old price so it is not averaged in
        old_price = price.select_one('.old-price')
        if old_price:
            old_price.extract()
        # Average every amount that precedes a euro sign (one for a single price, two for a range)
        price = mean([int(''.join(re.findall(r'\d+', v))) for v in re.findall(r'\s*(.*?)\s*€', price.text)])
        data.append([result.get_text(strip=True),
                     price,
                     s, bed, desc.get_text(strip=True)])
        print('{:<65} {:<10} {:<20} {:<20} {:<70}'.format(*data[-1]))
        data[-1] += [link['href']]

# newline='' avoids blank rows between records on Windows
with open('output.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out, delimiter=',',
                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(data)
Prints:
Apartment 275000 70 m² 2 bed. energiezuinig app, hartje Leuven, 2 slpk, fietsenstalling
Apartment 298000 84 m² 2 bed. App. 2 slpk in de unieke residentie Keizershof!
Apartment 535000 80 m² 2 bed. appartement
Flat/Studio 145000 32 m² 1 bed. studio
Flat/Studio 159000 22 m² 1 bed. studio
Apartment 487000 149 m² 3 bed. Modern spatious apartment within the ring of Leuven
Flat/Studio 189000 30 m² 1 bed. flat
Apartment 325000 75 m² 2 bed. appartement
Flat/Studio 139000 23 m² 1 bed. studio
Apartment 499000 104 m² 2 bed. appartement
Apartment 249500 95 m² 2 bed. appartement
... and so on.
The file in LibreOffice Calc looks like this: [screenshot]
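The price-averaging step can also be tried on its own, without fetching the site. The following is a minimal sketch using only the standard library; the helper name average_price is hypothetical, and it assumes every amount is written with dots as thousands separators and followed by a euro sign, as in the two snippets from the question.

```python
import re
from statistics import mean


def average_price(text):
    """Return the mean of all euro amounts in text, or None if there are none."""
    # An amount is a run of digits and thousands dots placed before a euro sign.
    amounts = re.findall(r'(\d[\d.]*)\s*€', text)
    if not amounts:
        return None
    return mean([int(amount.replace('.', '')) for amount in amounts])


print(average_price('375.000 €'))                    # single price
print(average_price('from 250.000 € to 695.000 €'))  # price range, averaged
```

Because a single price and a range go through the same function, each listing yields exactly one value, which keeps the price list the same length as the other lists.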
import requests
from bs4 import BeautifulSoup
import csv

types = []
sqs = []
prices = []
des = []
links = []
for page in range(1, 11):
    print(f"Extracting Page# {page}")
    r = requests.get(
        f"https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={page}")
    soup = BeautifulSoup(r.text, 'html.parser')
    for ty in soup.findAll('div', attrs={'class': 'title-bar-left'}):
        ty = ty.text.strip()
        types.append(ty)
    for sq in soup.select('div[class*="surface-ch"]'):
        sq = sq.text.strip()
        if 'm²' in sq:
            sq = sq[0:sq.find('m')]
        else:
            sq = 'N/A'
        sqs.append(sq)
    # The substring match covers both the single-price and the promotion class
    for price in soup.select('div[class*="-price"]'):
        price = price.get_text(strip=True)
        if 'from' in price:
            price = price.replace('from', 'From: ')
            price = price.replace('to', ' To: ')
        else:
            price = price[0:price.find('€') + 1]
        prices.append(price)
    for de in soup.select('div[class*="-desc"]'):
        de = de.get_text(strip=True)
        des.append(de)
    for a in soup.findAll('a'):
        href = a.get('href')
        if href is not None and 'for-sale/leuven/3000/id' in href:
            links.append(href)

final = []
for item in zip(types, sqs, prices, des, links):
    final.append(item)

with open('output.csv', 'w+', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Type', 'Size', 'Price', 'Desc', 'Link'])
    writer.writerows(final)
    print("Operation Completed")
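Both scripts rely on one selector that matches both price classes, so the blocks come back in the order they appear on the page. Two separate findAll passes, one per class, collect each kind separately and lose that order. Here is a small offline sketch of the idea; the HTML is a made-up sample built from the two snippets in the question, and the third price is an arbitrary illustrative value.

```python
from bs4 import BeautifulSoup

# Made-up sample: a single price, a price range, then another single price.
html = """
<div class="xl-price rangePrice">
375.000 €
</div>
<div class="xl-price-promotion rangePrice">
<span>from </span> 250.000 € <br><span>to</span> 695.000 €
</div>
<div class="xl-price rangePrice">
235.000 €
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# .rangePrice is shared by both kinds of block, so one select() call
# returns all three divs in document order.
blocks = soup.select('.rangePrice')
classes = [block['class'][0] for block in blocks]
print(classes)
```

The promotion block stays between the two single-price blocks, so the resulting list lines up row by row with lists built the same way for the other fields.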
在抓取代码时,我有 2 种情况需要以不同方式处理。 2 个相似的 类 都包含建筑物的价格,需要按时间顺序添加到 excel,因为它们必须与我正在抓取的其他数据相匹配。
我正在抓取数据的属性有 2 个不同 类。 一个看起来像这样:
<div class="xl-price rangePrice">
375.000 €
</div>
另一个是这样的:
<div class="xl-price-promotion rangePrice">
<span>from </span> 250.000 € <br><span>to</span> 695.000 €
</div>
我的代码能够提取其中一个,但不能同时提取两者。 我需要它做的是浏览搜索结果页面上的所有价格,并将它们附加到列表 "pricelist"。
我对平方米、建筑类型等做同样的事情,并将每个列表项输入一个 excel 文件。
因此,按时间顺序将它们添加到列表中至关重要,因为如果不是,结果是价格 excel 中的行位置将与正方形的行位置不匹配仪表和建筑类型。
有谁知道为什么我的代码无法提取两者 类?
这是我的代码和我试图从中提取价格的页面:
获取网站并循环浏览前 4 页:
for number in range(1, 4):
listplace = (number - 1) * len(buildinglist1)
immo_page = requests.get(f'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={number}',
headers=header)
soup = Beautiful
Soup(immo_page.content, 'lxml') # html parser
pricelist = ['Price']
for item in soup.findAll('div', attrs={'class': 'xl-price'}):
# item = item.text.strip().split()
try:
for item in soup.findAll('div', attrs={'class': 'xl-price-promotion rangePrice'}):
temp_list = []
item = item.text.strip().split()
item.remove('from'), item.remove('€'), item.remove('to'), item.remove('€')
for price in item: temp_list.append(price.replace('.', ''))
print(temp_list)
temp_list = [int(temp_list[0]) + int(temp_list[1])]
print(temp_list)
for item in temp_list: pricelist.append(item / 2)
except ValueError:
for item in soup.findAll('div', attrs={'class': 'xl-price rangePrice'}):
item = item.contents[0]
item = item.strip()[0:-1]
item = item.replace(' ', '')
item = item.replace('.', '')
pricelist.append(item)
print(pricelist)
这就是我试图获取价格并将其附加到列表中的方法。
仅使用两者之一时的输出(在本例中,我显示了在 "Except" 值中运行的代码的输出:
['Price', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', 
'689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', 
'499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', 
'210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', 
'689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000']
['Price', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', 
'445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', 
'325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000']
['Price', '235000']
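Each 'Price' entry marks the start of a new page. As you can see, page 3 is incomplete: it only contains the first value the code encountered, a single price, and none of the double (from/to) prices.
- When a listing has more than one price, I take the average of those prices and append it to the price list.
Many thanks!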
This script scrapes pages 1 to 10 and saves the data to a CSV file. When an ad lists more than one price, the prices are averaged:
import re
import csv
import requests
from bs4 import BeautifulSoup
from statistics import mean

url = 'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={}'

data = []
for page in range(1, 11):  # pages 1 to 10 (the end of range() is exclusive)
    soup = BeautifulSoup(requests.get(url.format(page)).text, 'html.parser')

    # .rangePrice matches both the single-price and the from/to price divs
    for result, price, surface, desc, link in zip(
            soup.select('.title-bar-left'),
            soup.select('.rangePrice'),
            soup.select('.xl-surface-ch, .l-surface-ch, .m-surface-ch'),
            soup.select('.xl-desc, .l-desc, .m-desc'),
            soup.select('.result-xl > a[target="IWEB_MAIN"], .result-l > a[target="IWEB_MAIN"], .result-m > a[target="IWEB_MAIN"]')):
        s = (re.findall(r'\s*(.*?m²)\s*', surface.get_text(strip=True)) or '-')[0]
        bed = (re.findall(r'\s*([\s\d\-]+bed.)\s*', surface.get_text(strip=True)) or '-')[0]

        # drop the old price, if present, so it is not averaged in
        old_price = price.select_one('.old-price')
        if old_price:
            old_price.extract()

        # average every amount that precedes a euro sign
        price = mean([int(''.join(re.findall(r'\d+', v))) for v in re.findall(r'\s*(.*?)\s*€', price.text)])

        data.append([result.get_text(strip=True),
                     price,
                     s, bed, desc.get_text(strip=True)])
        print('{:<65} {:<10} {:<20} {:<20} {:<70}'.format(*data[-1]))
        data[-1] += [link['href']]

# newline='' keeps the csv module from writing blank rows on Windows
with open('output.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out, delimiter=',',
                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(data)
Prints:
Apartment 275000 70 m² 2 bed. energiezuinig app, hartje Leuven, 2 slpk, fietsenstalling
Apartment 298000 84 m² 2 bed. App. 2 slpk in de unieke residentie Keizershof!
Apartment 535000 80 m² 2 bed. appartement
Flat/Studio 145000 32 m² 1 bed. studio
Flat/Studio 159000 22 m² 1 bed. studio
Apartment 487000 149 m² 3 bed. Modern spatious apartment within the ring of Leuven
Flat/Studio 189000 30 m² 1 bed. flat
Apartment 325000 75 m² 2 bed. appartement
Flat/Studio 139000 23 m² 1 bed. studio
Apartment 499000 104 m² 2 bed. appartement
Apartment 249500 95 m² 2 bed. appartement
... and so on.
The file in LibreOffice Calc looks like this: [screenshot]
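The price handling above can also be tried offline. The following is a minimal sketch, not the answer's own code: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra installs, and it feeds in the two div snippets from the question instead of a live page. RangePriceParser, average_price and extract_prices are hypothetical names chosen for illustration. Because both kinds of div carry the rangePrice class, matching on that class returns single and from/to prices in page order.

```python
import re
from html.parser import HTMLParser
from statistics import mean

# The two kinds of price div quoted in the question (assumed sample input).
SAMPLE_HTML = """
<div class="xl-price rangePrice">
375.000 €
</div>
<div class="xl-price-promotion rangePrice">
<span>from </span> 250.000 € <br><span>to</span> 695.000 €
</div>
"""


class RangePriceParser(HTMLParser):
    """Collect the text of every <div> whose class list contains 'rangePrice'."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a matching <div>
        self.chunks = []  # text pieces of the div currently being read
        self.texts = []   # finished price texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.depth or 'rangePrice' in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append(' '.join(self.chunks))
                self.chunks = []

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def average_price(text):
    """Average every amount that precedes a euro sign; None if there is none."""
    amounts = [
        int(re.sub(r'\D', '', raw))
        for raw in re.findall(r'([\d.\s]+)€', text)
        if re.search(r'\d', raw)
    ]
    return mean(amounts) if amounts else None


def extract_prices(html):
    """Return one averaged price per rangePrice div, in page order."""
    parser = RangePriceParser()
    parser.feed(html)
    parser.close()
    return [average_price(text) for text in parser.texts]


prices = extract_prices(SAMPLE_HTML)
print(prices)  # one entry per listing: 375000 and 472500
```

A div that contains no amount yields None instead of being skipped, so the price column keeps one entry per listing.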
import requests
from bs4 import BeautifulSoup
import csv

types = []
sqs = []
prices = []
des = []
links = []

for page in range(1, 11):
    print(f"Extracting Page# {page}")
    r = requests.get(
        f"https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={page}")
    soup = BeautifulSoup(r.text, 'html.parser')

    # building type
    for ty in soup.findAll('div', attrs={'class': 'title-bar-left'}):
        ty = ty.text.strip()
        types.append(ty)

    # surface: keep the number before "m²", or N/A when it is missing
    for sq in soup.select('div[class*="surface-ch"]'):
        sq = sq.text.strip()
        if 'm²' in sq:
            sq = sq[0:sq.find('m')]
        else:
            sq = 'N/A'
        sqs.append(sq)

    # price: class*="-price" matches both xl-price and xl-price-promotion
    for price in soup.select('div[class*="-price"]'):
        price = price.get_text(strip=True)
        if 'from' in price:
            price = price.replace('from', 'From: ')
            price = price.replace('to', ' To: ')
        else:
            price = price[0:price.find('€') + 1]
        prices.append(price)

    # description
    for de in soup.select('div[class*="-desc"]'):
        de = de.get_text(strip=True)
        des.append(de)

    # listing links
    for a in soup.findAll('a'):
        href = a.get('href')
        if href is not None and 'for-sale/leuven/3000/id' in href:
            links.append(href)

final = []
for item in zip(types, sqs, prices, des, links):
    final.append(item)

with open('output.csv', 'w+', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Type', 'Size', 'Price', 'Desc', 'Link'])
    writer.writerows(final)
    print("Operation Completed")
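Since the question stresses that the rows of every column must line up, one thing worth checking with separately collected lists is their length: zip() silently stops at the shortest list, so a listing that lacks one field can shift or drop rows without any error. Below is a minimal sketch of such a check; zip_checked is a hypothetical helper, not part of the script above, and the sample values are made up for illustration.

```python
def zip_checked(**columns):
    """Zip named columns into rows, refusing to continue if lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Plain zip() would truncate here and hide the mismatch.
        raise ValueError(f"column lengths differ: {lengths}")
    return list(zip(*columns.values()))


# Made-up sample data: two listings, two columns.
rows = zip_checked(
    types=['Apartment', 'Flat/Studio'],
    prices=['275.000 €', '145.000 €'],
)
print(rows)

try:
    zip_checked(types=['Apartment', 'Flat/Studio'], prices=['275.000 €'])
except ValueError as err:
    print(err)  # reports which column is short
```

The check only reveals that a mismatch happened, not which listing caused it. Reading all fields from within each listing's own container element would probably be needed to keep the rows aligned in that case.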