Python Selenium failing to acquire data
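I am trying to download 24 months of data from www1.nseindia.com, but it fails with both the Chrome and Firefox drivers. After all the values are filled in the required fields, it just freezes and does not click. The web page does not respond...
Below is the code I am trying to execute: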
import time
from selenium import webdriver
from selenium.webdriver.support.ui import Select

id_list = ['ACC', 'ADANIENT']

# Chrome
def EOD_data_Chrome():
    # Raw string so the backslashes in the Windows path are taken literally
    driver = webdriver.Chrome(executable_path=r"C:\Py388\Test\chromedriver.exe")
    driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
    s1 = Select(driver.find_element_by_id('dataType'))
    s1.select_by_value('priceVolume')
    s2 = Select(driver.find_element_by_id('series'))
    s2.select_by_value('EQ')
    s3 = Select(driver.find_element_by_id('dateRange'))
    s3.select_by_value('24month')
    driver.find_element_by_name("symbol").send_keys("ACC")
    driver.find_element_by_id("get").click()
    time.sleep(9)
    # The download link is not a <select>, so click the element directly
    driver.find_element_by_class_name("download-data-link").click()

# FireFox(Gecko)
def EOD_data_Gecko():
    driver = webdriver.Firefox(executable_path=r"C:\Py388\Test\geckodriver.exe")
    driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
    s1 = Select(driver.find_element_by_id('dataType'))
    s1.select_by_value('priceVolume')
    s2 = Select(driver.find_element_by_id('series'))
    s2.select_by_value('EQ')
    s3 = Select(driver.find_element_by_id('dateRange'))
    s3.select_by_value('24month')
    driver.find_element_by_name("symbol").send_keys("ACC")
    driver.find_element_by_id("get").click()
    time.sleep(9)
    # The download link is not a <select>, so click the element directly
    driver.find_element_by_class_name("download-data-link").click()

EOD_data_Gecko()
# Change the above final line to "EOD_data_Chrome()" and still it just remains stuck...
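Please help me find what is missing in the code to download the 24 months of data... When I do the same thing in a normal browser, clicking manually, it succeeds...
When you do it manually in a browser, these are the values to change: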
Set first drop down to : Security wise price volume data
"Enter Symbol" : ACC
"Select Series" : EQ
"Period" (radio button: "For Past") : 24 Months
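Then click the "Get Data" button. After about 3-5 seconds the data loads, and when you click "Download file in csv format" the CSV file appears in your downloads.
I need help using any Python library you know: Selenium, BeautifulSoup, Requests, Scrapy, etc. It doesn't really matter which, as long as it is Python...
Edit: @Patrick Bormann, please find the screenshot... the Get Data button works...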
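When you say it works manually, have you tried simulating the click with an ActionChains object instead of the built-in click function?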
from selenium.webdriver.common.action_chains import ActionChains
# ActionChains needs the WebElement itself, not a Select wrapper
easy_apply = driver.find_element_by_id('dateRange')
actions = ActionChains(driver)
actions.move_to_element(easy_apply)
actions.click(easy_apply)
actions.perform()
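And then simulate moving the mouse to the specific value?
Also, I tried it myself and got no data when pressing the Get Data button. It does seem to have a class like the one you mentioned, but the button does not work. As you can see, though, there is a second button called full download; maybe you could try that one? The Get Data button did not work in either Firefox or Chrome when I tested it.
Have you already tried catching it via the link?
Update
Since the OP asked for help with this urgent problem, I am providing a working solution.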
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time
from selenium.webdriver.support.ui import Select
chrome_driver_path = "../chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)
driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
driver.execute_script("document.body.style.zoom='zoom 25%'")
time.sleep(2)
price_volume = driver.find_element_by_xpath('//*[@id="dataType"]/option[2]').click()
time.sleep(2)
date_range = driver.find_element_by_xpath('//*[@id="dateRange"]/option[8]').click()
time.sleep(2)
series = driver.find_element_by_name('series')
time.sleep(2)
drop = Select(series)
drop.select_by_value("EQ")
time.sleep(2)
driver.find_element_by_name("symbol").send_keys("ACC")
ez_download = driver.find_element_by_xpath('//*[@id="wrapper_btm"]/div[1]/div[3]/a')
actions = ActionChains(driver)
actions.move_to_element(ez_download)
actions.click(ez_download)
actions.perform()
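Here you go. Sorry it took a while, I had to put my son to bed...
This solution produces this output: I hope it is correct. If you want to select other drop-down entries, you can change the string passed to the Select (a string, because that list has too many indices to handle comfortably) or the number in the XPath, which is the index of the option to pick. The sleeps are generally only needed for elements that take time to build up on the page. In my experience, though, moving through the form too quickly sometimes causes errors. Feel free to change the sleep durations and see whether it still works.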
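The fixed sleeps could also be replaced by a polling wait, so the script continues as soon as an element is ready instead of always pausing for the full duration. The following is only a minimal sketch of that idea in plain Python; `wait_until` and its parameters are hypothetical names and not part of Selenium (Selenium ships its own `WebDriverWait` for the same purpose).

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

With a driver, the condition could be something like `lambda: driver.find_elements_by_class_name('download-data-link')`, which returns an empty (falsy) list until the link exists on the page.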
I hope you can now get on with earning your living in India.
All the best, Patrick
Feel free to ask if you have any questions.
Update 2
After a long night and another day, we found that the freeze originates from the website itself, because the site uses:
Boomerang | Akamai Developer developer.akamai.com/tools/… Boomerang is a JavaScript library for Real User Monitoring (commonly called RUM). Boomerang measures the performance characteristics of real-world page loads and interactions. The documentation on this page is for mPulse's Boomerang. General API documentation for Boomerang can be found at docs.soasta.com/boomerang-api/.
That is what I discovered from the HTML header.
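A quick way to repeat that kind of header check is to list the scripts a page loads and look for telltale names. This is a rough sketch using only the standard library; `ScriptFinder`, `find_monitoring_scripts`, the keyword list, and the sample markup are all assumptions made for illustration, and the real page's markup may look different.

```python
from html.parser import HTMLParser


class ScriptFinder(HTMLParser):
    """Collect the src of every <script> tag and the text of inline scripts."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Inline script bodies arrive here while we are inside <script>
        if self._in_script and data.strip():
            self.sources.append(data.strip())


def find_monitoring_scripts(html, keywords=("boomerang", "akamai", "mpulse")):
    """Return script sources or bodies that mention any of the keywords."""
    finder = ScriptFinder()
    finder.feed(html)
    finder.close()
    return [s for s in finder.sources if any(k in s.lower() for k in keywords)]


# Made-up sample markup, not copied from the real site
sample = (
    '<html><head><script src="/static/boomerang.min.js"></script>'
    '<script src="/js/jquery.js"></script></head><body></body></html>'
)
print(find_monitoring_scripts(sample))
```

The HTML to inspect could come from `driver.page_source` or from a saved copy of the page.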
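This is obviously bot detection at the network/JavaScript level. With the help of this SO post:
I finally solved the problem:
We changed the var key in chromedriver to something else, like this:
var key = '$dsjfgsdhfdshfsdiojisdjfdsb_';
In addition, I changed the code to: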
import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
chrome_driver_path = "../chromedriver.exe"
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
driver = webdriver.Chrome(executable_path=chrome_driver_path, options=options)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.get('http://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
driver.execute_script("document.body.style.zoom='zoom 25%'")
time.sleep(5)
price_volume = driver.find_element_by_xpath('//*[@id="dataType"]/option[2]').click()
time.sleep(3)
date_range = driver.find_element_by_xpath('//*[@id="dateRange"]/option[8]').click()
time.sleep(5)
series = driver.find_element_by_name('series')
time.sleep(3)
drop = Select(series)
drop.select_by_value("EQ")
time.sleep(4)
driver.find_element_by_name("symbol").send_keys("ACC")
actions = ActionChains(driver)
ez_download = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[2]/div[1]/div[3]/div/div[1]/form/div[2]/div[3]/p/img')
actions.move_to_element(ez_download)
actions.click(ez_download)
actions.perform()
# essential because the button has to be loaded
time.sleep(5)
driver.find_element_by_class_name('download-data-link').click()
The code finally worked and the OP was happy.
I edited chromedriver.exe with a hex editor, replaced cdc_ with dog_, and saved it. Then I ran the following code with that Chrome driver.
import selenium
from selenium import webdriver
from selenium.webdriver.support.select import Select
import time
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--disable-blink-features")
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
# Open the website
driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
symbol_box = driver.find_element_by_id('symbol')
symbol_box.send_keys('20MICRONS')
driver.implicitly_wait(10)
#rd_period=driver.find_element_by_id('rdPeriod')
#rd_period.click()
list_daterange=driver.find_element_by_id('dateRange')
list_daterange=Select(list_daterange)
list_daterange.select_by_value('24month')
driver.implicitly_wait(10)
btn_getdata=driver.find_element_by_xpath('//*[@id="get"]')
btn_getdata.click()
driver.implicitly_wait(100)
print("Clicked button")
lnk_downloadData=driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[2]/div[1]/div[3]/div/div[3]/div[1]/span[2]/a')
lnk_downloadData.click()
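This code works fine at the moment. The catch is that it is not a permanent solution. NSE keeps updating its logic to detect bot execution better, and as NSE updates, we will have to update our code too. If this code does not work, let me know and I will figure out some other solution.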
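The hex-editor step described in this answer could also be scripted. The sketch below assumes the marker really appears as the plain ASCII bytes cdc_ in the binary and that a same-length replacement is enough, as the answer describes; `replace_marker` and `patch_file` are hypothetical helper names. It writes to a copy so the original driver stays untouched.

```python
def replace_marker(data, old=b"cdc_", new=b"dog_"):
    """Return (patched_bytes, count) with every `old` replaced by `new`.

    The replacement must have the same length as the original marker so
    that offsets inside the binary do not shift.
    """
    if len(old) != len(new):
        raise ValueError("old and new markers must have the same length")
    return data.replace(old, new), data.count(old)


def patch_file(src_path, dst_path):
    """Write a patched copy of src_path to dst_path; return the replacement count."""
    with open(src_path, "rb") as f:
        data = f.read()
    patched, count = replace_marker(data)
    with open(dst_path, "wb") as f:
        f.write(patched)
    return count
```

A call such as `patch_file("chromedriver.exe", "chromedriver_patched.exe")` returns how many occurrences were replaced; a result of 0 would suggest that this driver version no longer contains the marker.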