Is there a way to extract data from a calendar built with JS (maybe)?

Question

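I want to extract data from the calendar on this site: https://www.dreamplus.asia/event/list

If I click an event label or an event date in the calendar, the details for that label pop up to the right of the calendar. The site appears to be rendered with JavaScript (probably), judging by the page source of the detail view.

Even with Selenium, I can't work out how to click the dates or the event labels. Any help?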

    # -*- coding: utf-8 -*-
    import traceback

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By


    def dreamplus():
        options = Options()
        # Selenium 4 takes the chromedriver path through a Service object.
        driver = webdriver.Chrome(service=Service('../../chromedriver.exe'), options=options)

        driver.get("https://www.dreamplus.asia/event/list")

        #driver = launchBrowser()
        html = driver.page_source
        soup = BeautifulSoup(html, 'html.parser')
        #Days = driver.find_elements(By.XPATH, "//*[@id='calendar']/div[@class='fc-view-container']/div[@class='fc-view fc-month-view fc-basic-view']/table/tbody[@class='fc-body']/tr/td[@class='fc-widget-content']/div[@class='fc-scroller fc-day-grid-container']/div/div/div/table")
        # Look up the calendar's event containers by class name.
        Controllers = driver.find_elements(By.CLASS_NAME, 'fc-event-container')
        print(Controllers)
        for controller in Controllers:
            print(controller.text)

        driver.quit()


    if __name__ == '__main__':
        try:
            dreamplus()
        except Exception:
            # Write the full traceback to a log file; the with block closes it.
            with open('dreamplus_error.log', 'wt') as f:
                f.write(traceback.format_exc())

I looked up the 'fc-event-container' elements by class name, but 'Controllers' comes back empty. Maybe that is because the page is built with JS.

Answer 1

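I observed that if you try to go directly to the events page, you get redirected to the home page. So you can either go to the home page and click through to the events, or simply do two .get calls in a row. Note: the elements to click in order to update the sidebar info are the child a tags inside the containers.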

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    d = webdriver.Chrome()
    d.get('https://www.dreamplus.asia/')
    d.get('https://www.dreamplus.asia/event/list')
    events = WebDriverWait(d, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fc-event-container a")))
    events[2].click()  # example event click

Clicking through (slower):

    d.get('https://www.dreamplus.asia/')
    # find_elements(By.XPATH, ...) replaces the removed find_elements_by_xpath helper.
    event_tabs = d.find_elements(By.XPATH, "//*[contains(text(), 'Event')]")
    event_tabs[0].click()
    event_tabs[1].click()
    events = WebDriverWait(d, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fc-event-container a")))
    events[2].click()  # example event click
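
Once the calendar has rendered, the same `.fc-event-container a` elements can also be read out of the page source rather than clicked one by one. The following is only a rough sketch of that idea, not something verified against the live site: `SAMPLE_HTML`, `EventLinkParser` and `extract_events` are hypothetical names, the sample markup is an assumption about what the rendered calendar looks like, and the standard-library `html.parser` is used here instead of BeautifulSoup so the snippet runs on its own.

```python
from html.parser import HTMLParser

# Hypothetical sample of rendered calendar markup; the real page may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="fc-event-container">
      <a class="fc-event" href="/event/1"><span class="fc-title">Demo Day</span></a>
    </td>
    <td class="fc-event-container">
      <a class="fc-event"><span class="fc-title">Meetup</span></a>
    </td>
    <td><a href="/other">Not an event</a></td>
  </tr>
</table>
"""

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class EventLinkParser(HTMLParser):
    """Collect the text and href of every <a> inside an fc-event-container."""

    def __init__(self):
        super().__init__()
        self.events = []            # list of {"title": ..., "href": ...}
        self._container_depth = 0   # > 0 while inside an fc-event-container
        self._current = None        # event being built while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._container_depth:
            self._container_depth += 1
            if tag == "a" and self._current is None:
                self._current = {"title": "", "href": attrs.get("href")}
        elif "fc-event-container" in classes:
            self._container_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._container_depth:
            return
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.events.append(self._current)
            self._current = None
        self._container_depth -= 1

    def handle_data(self, data):
        # Only text inside an event link contributes to its title.
        if self._current is not None:
            self._current["title"] += data


def extract_events(html):
    """Return the event links found in the given HTML string."""
    parser = EventLinkParser()
    parser.feed(html)
    return parser.events


if __name__ == "__main__":
    for event in extract_events(SAMPLE_HTML):
        print(event)
```

With Selenium, the input would be `d.page_source` taken after the `WebDriverWait` call has returned, for example `extract_events(d.page_source)`. The sidebar details that appear after a click are not covered by this sketch, since their markup is not shown here.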

python

calendar

beautifulsoup

scrapy

web-scraping