requests_html returns blank
Python 3.8.2 (default, Apr 8 2020, 14:31:25)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://www.sahibinden.com/ilan/emlak-konut-gunluk-kiralik-holiday-business-suit-lux-otel-konforunda-suit-daireler-803346031/detay')
>>> r.html.find("#classifiedId")
[]
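I ran this code, but the output is empty. I tried r.html.render(), but the result did not change. I also tried to find the element with XPath, still with no result. How can I fix this?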
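If you don't mind a browser window opening, you can use Selenium to drive a browser installed on your system and access the element directly.
First: install the Chrome or Firefox driver (chromedriver or geckodriver).
Second: install Selenium for Python with pip.
Third: write this code in your text editor or IDE: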
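from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 style; on Selenium 3, pass executable_path=... directly to webdriver.Firefox().
driver = webdriver.Firefox(service=Service(executable_path="your driver path in your pc or laptop"))
driver.get("your url")
sleep(10)  # crude fixed wait so the page's JavaScript can finish loading
elem = driver.find_element(By.XPATH, "use xpath here")
print(elem.text)
driver.quit()  # close the browser and end the driver session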
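The fixed sleep(10) above either wastes time or is too short on a slow connection. A minimal sketch of the alternative, polling until a condition holds or a timeout expires, is shown below. The helper name wait_until and its parameters are hypothetical and not part of Selenium; Selenium ships its own WebDriverWait class for the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() until it returns a truthy value or the timeout expires.

    Hypothetical helper for illustration; not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result  # condition met: hand back whatever the predicate found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # pause briefly before polling again
```

With the driver from the answer, this could be used as wait_until(lambda: driver.find_elements(By.XPATH, "use xpath here")), since find_elements returns an empty (falsy) list until the element exists.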
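The site requires you to specify a User-Agent and a cookie named "s3IssGuY1". I don't know whether (or when) this cookie needs to change over time, but you can update it accordingly (copy the value from the Firefox/Chrome developer tools):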
import requests
from bs4 import BeautifulSoup

# The site expects a browser-like User-Agent header.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
url = 'https://www.sahibinden.com/ilan/emlak-konut-gunluk-kiralik-holiday-business-suit-lux-otel-konforunda-suit-daireler-803346031/detay'
# Cookie value copied from the browser's developer tools; it may need refreshing.
cookies = {'s3IssGuY1': 'A_Lne-ByAQAAWJ_crjFYgFyVj0loVQQA3jwlYwVTH-vnpfLSbIkEJkwRS9NDAVX4a-mcuNvjwH8AADQwAAAAAA=='}
soup = BeautifulSoup(requests.get(url, headers=headers, cookies=cookies).content, 'html.parser')

# Pair each label (<strong>) with its value (<span>) in the listing details.
for st, sp in zip(soup.select('.classifiedInfoList strong'), soup.select('.classifiedInfoList span')):
    print('{:<30} {}'.format(st.get_text(strip=True), sp.get_text(strip=True)))
Prints:
İlan No                        803346031
İlan Tarihi                    23 Haziran 2020
Emlak Tipi                     Günlük Kiralık Daire
m² (Brüt)                      35
m² (Net)                       30
Oda Sayısı                     Stüdyo (1+0)
Bulunduğu Kat                  5
Kat Sayısı                     5
Isıtma                         Merkezi
Banyo Sayısı                   1
Site İçerisinde                Hayır
Kimden                         Emlak Ofisinden
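If the cookie stops working, the developer tools usually show the whole Cookie request header as one string. A small sketch for turning that string into the dict passed as cookies= above might look like this. The function name parse_cookie_header and the sample value are made up for illustration; only the cookie name comes from the answer.

```python
def parse_cookie_header(header_value):
    """Split 'name1=value1; name2=value2' into a {name: value} dict.

    Hypothetical helper; the result can be passed to requests as cookies=...
    """
    cookies = {}
    for part in header_value.split(";"):
        # Split on the first '=' only, so values ending in '==' stay intact.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder value, not a real cookie from the site.
copied = "s3IssGuY1=EXAMPLE_VALUE==; other=1"
print(parse_cookie_header(copied))
```

Splitting on the first "=" only matters here because the cookie value in the answer ends with "==" padding.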