Read page source before POST
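I'd like to know whether there is a way to POST parameters after reading the page source. For example: read the verification code before posting the ID#.
My current code: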
import requests
id_number = "1"
url = "http://www.submitmyforum.com/page.php"
data = dict(id = id_number, name = 'Alex')
post = requests.post(url, data=data)
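There is a verification code that changes after every request to http://submitforum.com/page.php (obviously not a real site). I'd like to read that parameter and submit it in the "data" variable.
As discussed in the comments on the question, you can use Selenium, though a method without browser emulation may also exist!
Using Selenium (http://selenium-python.readthedocs.io/) instead of the requests module: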
import re
from selenium import webdriver
from selenium.webdriver.common.by import By

# Captures whatever sits between "k=" and "&co=" in the reCAPTCHA iframe URL
regexCaptcha = r"k=(.*?)&co="
url = "http://submitforum.com/page.php"

# Get to the URL
browser = webdriver.Chrome()
browser.get(url)

# Example for getting page elements (using CSS selectors)
# In this example, I'm getting the Google reCAPTCHA ID if present on the current page
try:
    element = browser.find_element(By.CSS_SELECTOR, 'iframe[src*="https://www.google.com/recaptcha/api2/anchor?k"]')
    captchaID = re.search(regexCaptcha, element.get_attribute("src")).group(1)
    captchaFound = True
    print("Captcha found!", captchaID)
except Exception:  # iframe missing, or its src did not match the regex
    print("No captcha found!")
    captchaFound = False

# Treat captcha
# --> Your treatment code (it has to set captcha_answer)

# Enter captcha response on page
captchaResponse = browser.find_element(By.ID, 'captcha-response')
captchaResponse.send_keys(captcha_answer)

# Validate the form
validateButton = browser.find_element(By.ID, 'submitButton')
validateButton.click()

# --> Analysis of returned page if needed
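For the route without browser emulation, here is a rough sketch. It only helps if the changing code is a plain value in the page source, for example a hidden form input. It does not solve an image CAPTCHA or reCAPTCHA. The field name `verification_code` and the helper names are hypothetical; check the real page source for the actual field name. The idea is to GET the page, read the value out of the HTML, and POST it back through the same session so the cookies match.

```python
from html.parser import HTMLParser


class InputFieldParser(HTMLParser):
    """Collect name/value pairs of every <input> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            # Inputs without a value attribute are stored as an empty string
            self.fields[name] = attributes.get("value") or ""


def read_form_fields(html):
    """Return a dict of input names to values found in the page source."""
    parser = InputFieldParser()
    parser.feed(html)
    return parser.fields


def post_with_page_token(session, url, token_field, extra_data):
    """GET the page, read token_field from its source, then POST it back.

    session is expected to behave like requests.Session (get/post methods).
    """
    page = session.get(url)
    page.raise_for_status()
    fields = read_form_fields(page.text)
    if token_field not in fields:
        raise KeyError(f"no <input name={token_field!r}> found at {url}")
    data = dict(extra_data)
    data[token_field] = fields[token_field]
    return session.post(url, data=data)


# Usage with requests (the Session keeps cookies between the GET and the POST):
#   import requests
#   with requests.Session() as session:
#       response = post_with_page_token(
#           session, "http://submitforum.com/page.php",
#           "verification_code",  # hypothetical field name
#           {"id": "1", "name": "Alex"})
```

If the code is generated by JavaScript after the page loads, it will not be in the HTML that requests receives, and the Selenium approach is the safer bet.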