Scrape data in script tag

Can anyone suggest a way to scrape data inside a <script> tag? Specifically, in this case I want the 30-minute table from AEMO (https://www.aemo.com.au/aemo/apps/visualisations/elec-nem-priceanddemand.html).

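To get the data table, I need to click either the button that displays the table on the site or the download button. The obstacle is that when I try to scrape the page with Selenium, the table's buttons and text are hidden behind <script> tags.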

Here is my code so far:

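# import libraries
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.safari.service import Service
import time
import pandas as pd
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.aemo.com.au/aemo/apps/visualisations/elec-nem-priceanddemand.html"
# Selenium 4 takes the driver path through a Service object
# (the executable_path argument to webdriver.Safari was removed in Selenium 4.10)
browser = webdriver.Safari(service=Service(executable_path='/usr/bin/safaridriver'))
browser.get(url)
try:
    # page_source is read immediately, before the page's JavaScript
    # has had a chance to build the table
    print(browser.page_source)
except Exception:
    print("not found")
finally:
    browser.quit()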

Part of the output:

<body aurelia-app="visualisation-main" data-gr-c-s-loaded="true">
    <div class="splash">
      <div class="message"><span class="icon-spinner"></span></div>
    </div>

    <script src="jspm_packages/system.js"></script>
    <script src="config.js"></script>
    <script>
      System.import('aurelia-bootstrapper');
    </script>


</body></html>
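The output above is only a loader: the <script> tags reference JavaScript files and call System.import('aurelia-bootstrapper'), and the table itself is built in the browser after those scripts run. A minimal sketch that lists the script tags in the snippet above makes this visible. It uses the standard library's html.parser instead of BeautifulSoup so it has no third-party dependency; the class name ScriptCollector is hypothetical.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the src attribute and inline body of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # list of (src or None, inline text)
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append((dict(attrs).get("src"), ""))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only keep text that sits inside a <script> element
        if self._in_script:
            src, text = self.scripts[-1]
            self.scripts[-1] = (src, text + data)


# Trimmed copy of the page source printed above
page_source = """
<body aurelia-app="visualisation-main">
  <div class="splash"><div class="message"><span class="icon-spinner"></span></div></div>
  <script src="jspm_packages/system.js"></script>
  <script src="config.js"></script>
  <script>
    System.import('aurelia-bootstrapper');
  </script>
</body>
"""

collector = ScriptCollector()
collector.feed(page_source)
for src, body in collector.scripts:
    print(src, repr(body.strip()))
```

None of the script bodies contain prices or table rows, so parsing the initial HTML (with this parser or with BeautifulSoup) cannot find the data; the page has to be rendered first.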

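Selenium has its own way of locating elements, such as browser.find_element(By.CSS_SELECTOR, ...) (older code uses find_element_by_css_selector, which was removed in Selenium 4). Also, the browser often needs some time to run the page's scripts and render elements, so you may need an explicit wait with WebDriverWait.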
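An explicit wait is essentially a polling loop: check a condition, and if it is not met yet, sleep and try again until a timeout expires. The sketch below is a simplified model of that idea in plain Python, not Selenium's implementation (the real WebDriverWait.until raises selenium's TimeoutException and by default ignores NoSuchElementException while polling). The names wait_until and fake_element_present are hypothetical.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call `condition` repeatedly until it returns a truthy value
    or `timeout` seconds have passed (simplified explicit wait)."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Simulated page: the "element" only appears on the third check,
# like content that the browser renders after its scripts have run.
checks = {"count": 0}


def fake_element_present():
    checks["count"] += 1
    return "price element" if checks["count"] >= 3 else None


element = wait_until(fake_element_present, timeout=5, poll_frequency=0.01)
print(element)
```

Reading page_source straight after browser.get(), as in the question, corresponds to calling the condition once without waiting, which is why only the splash screen and script tags show up.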

Here is an example that extracts the spot price from the page:

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.aemo.com.au/aemo/apps/visualisations/elec-nem-priceanddemand.html'

browser = webdriver.Chrome()
browser.get(url)

# CSS path to the spot-price element in the rendered page; it only exists
# after the app's JavaScript has run, and it will break if the layout changes
sel = 'body > div > compose > div > compose.fill-height.flex-container.au-target > compose > div > div:nth-child(1) > div'
try:
    # Wait up to 10 seconds for the element to appear in the DOM
    element = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, sel))
    )
    print(element.text)
finally:
    browser.quit()

Result:

.02/MWh
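element.text returns the rendered string, so the price has to be parsed before it can be used as a number. A minimal sketch, assuming the element text looks like "$85.02/MWh" (a hypothetical value; the result shown above appears truncated, so the exact format is an assumption) and that negative prices are written as "-$12.50/MWh". The helper parse_price is hypothetical.

```python
import re


def parse_price(text):
    """Turn element text such as '$85.02/MWh' into a float.
    The '$<number>/MWh' format is an assumption, not confirmed by AEMO."""
    # Drop the currency sign and thousands separators before matching
    cleaned = text.replace("$", "").replace(",", "")
    match = re.search(r"(-?\d*\.?\d+)\s*/\s*MWh", cleaned)
    if match is None:
        raise ValueError(f"no $/MWh price found in {text!r}")
    return float(match.group(1))


print(parse_price("$85.02/MWh"))   # hypothetical value
print(parse_price("-$12.50/MWh"))  # hypothetical negative price
```

Raising ValueError instead of returning a default makes a changed page layout (for example, the selector matching a different element) fail loudly rather than silently producing a wrong price.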