python, authentication not recognised - urllib2, requests, asp.net
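While I'm not especially advanced at this, I've had some success in the past with urllib2, requests and scrapy, but this one has me stumped. So, after a lot of searching and banging my head on the keyboard, I'll go ahead and ask.
I want to get the HTML source of a site, but after supplying my username and password I always receive a page telling me that my username and password are wrong. They work fine in the browser, and once logged in the source is easy to get (via the browser). But I can't seem to get the same result via Python/terminal. I'll include some of my attempts below (adapted from these helpful pages):
Using urllib2: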
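import base64
from urllib2 import Request, urlopen
req = Request(website, headers={'User-Agent': 'Mozilla/5.0'})
# Build the HTTP Basic credentials by hand; encodestring appends a newline, so strip it
base64string = base64.encodestring('%s:%s' % (username, password)).replace('\n', '')
req.add_header("Authorization", "Basic %s" % base64string)
readweb = urlopen(req).read()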
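For readers on Python 3, where urllib2 was folded into urllib.request and base64.encodestring no longer exists in current releases, the same header can be built roughly as follows. This is only a sketch: the function name basic_auth_request, the URL and the credentials are placeholders, and an Authorization header of this kind only helps when the server really uses HTTP Basic authentication rather than an HTML login form.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build a urllib Request carrying an HTTP Basic Authorization header."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    request.add_header("Authorization", "Basic %s" % token)
    return request


# Hypothetical URL and credentials, for illustration only.
req = basic_auth_request("http://www.website.com/content", "username", "password")
auth_header = req.get_header("Authorization")
print(auth_header)
# html = urllib.request.urlopen(req).read()  # uncomment to actually send the request
```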
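Another version:
import urllib2
def get_page(theurl, username, password):  # wrapper name is illustrative
    # Register the credentials for any realm at this URL
    passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, theurl, username, password)
    authhandler = urllib2.HTTPBasicAuthHandler(passman)
    opener = urllib2.build_opener(authhandler)
    pagehandle = opener.open(theurl)
    return pagehandle.read()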
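And an attempt using requests:
import requests
def get_page(theurl):  # wrapper name is illustrative
    r = requests.session()
    try:
        # Placeholder strings stand in for the real credentials
        r.post(theurl, data={'username': 'username', 'password': 'password', 'remember': '1'})
    except requests.exceptions.RequestException:
        print('Sorry, Unable to...')
    result = r.get(theurl)
    return result.text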
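I've also tried scrapy, but whichever library I use, it returns the HTML of a page saying my password/details are wrong. I'm guessing it has something to do with the headers/authorisation(?) I'm sending, but I'm not sure. Any help is much appreciated, and please let me know what other details I can add (I've been up half the night, so forgive me if this post doesn't make sense!)
Edit:
Below is the traceback I get from Prashant's answer (minus passwords etc.):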
Traceback (most recent call last):
  File "/Users/Hatsaw/newpy/pras.py", line 3, in <module>
    r = requests.get(URL, auth=('username','password'))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='website', port=80): Max retries exceeded with url: /dashboard/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
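Edit:
OK, I'm now using mechanize (recommended below), and this is what I get (not sure whether this is my underlying problem or just another instance of me not being able to use mechanize!):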
Traceback (most recent call last):
  File "/Users/Hatsaw/newpy/pras2.py", line 13, in <module>
    browser.form['email'] = 'email address'
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 2780, in __setitem__
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 3101, in find_control
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 3185, in _find_control
mechanize._form.ControlNotFoundError: no control matching name 'email'
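Edit:
Still struggling with this, so this is a last effort before time runs out on this project and I have to go in and grab all the HTML manually! Fingers crossed..
OK, following barny's suggestion, I've gone back to requests, and I'm trying to supply the cookie information I got from a successful browser login. I'm not sure I'm doing this right, but I'm using: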
cookies = {'PHPSESSID':'5udcifi6p43ma3h1fnpfqghiu0'}
result = sess.get(the_url, cookies=cookies)
Now I'm getting an Internal Server Error response. After some research, the ASP.NET form seems to be the problem:
- Sending an ASP.net POST with Python's Requests
- Using Python Requests for ASP.NET authentication
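I just wanted to check first that I'm not doing anything wrong with requests, and then maybe I'll explore BeautifulSoup/robobrowser as suggested by Martijn Pieters in the SO link above.
This is the form section of the HTML: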
<form name="aspnetForm" method="post" action="" id="aspnetForm">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />
<input type="hidden" name="__VIEWSTATEFIELDCOUNT" id="__VIEWSTATEFIELDCOUNT" value="2" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTkwNzg1NTQ3OA9kFgJmD2QWAmYPZBYGAgetc." />
<input type="hidden" name="__VIEWSTATE1" id="__VIEWSTATE1" value="ZyBBIEhvbWUVIE5lZ290aWF0ZSBBZ3JlZW1lbnRzEiBSZetc." />
</div>
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
</script>
<script src="/WebResource.axd?d=t2SAOwDGkbrEfkmUaMOR9sPLXqgxfeenNayRja3DNK2R8JEcH-StTTuiaqXpzp--PAISn3vzVbWQ7biREwPkibCmbAE1&t=635586505120000000" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=EL6tXtJfNfGSoQwhYtVnYEqw4oKvuwBBI4etc." type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
if (typeof(Sys) === 'undefined') throw new Error('ASP.NET Ajax client-side framework failed to load.');
//]]>
</script>
<script src="/ScriptResource.axd?d=qCmNMcECQa0tfmMcZdwJeeOdcyetc." type="text/javascript"></script>
<div>
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="FC5C7135" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdABB2xJRvPLCcg6GsBqRFCtw6Xg91QEu10etc." />
</div>
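So, a few small questions.
- Do my 'user/pass' field names have to match the page source, i.e. is it username = username, or user? I can't find this in the HTML now, but I did find 'ctl00$cphMain$tbUsername' and 'ctl00$cphMain$tbPassword'…
- Do I need to send the password and/or username through base64.encodestring? (I don't know if this is an issue, but the password contains characters like !@$.)
- Do I need to add all the cookie fields I found in the browser, or just PHPSESSID? These are the fields I see in the cookie: ASP.NET_SessionId, CFID, CFTOKEN, __atuvc, __utma, __utmb, __utmc, __utmt, __utmz, BRO_CALLME, BRO_ID, BRO_LOGIN, BRO_MEMBER, BROAUTH, ISFULLMEMBER, phpMBLink, __CT_Data, WRUID
- There's the website (www.website.com), the login page (www.website.com/login), and then the content (www.website.com/content). Am I right in thinking that I take the cookies from the (successful) login page and 'send' them to the content page? Should I do this manually (typing in the field details from the browser's cookie information) or in code (so, in the code below I would use: cookies = r_login.cookies)?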
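On the last of these questions, one common approach is to let a single cookie-aware client carry the cookies from the login response to the content request, rather than copying values out of the browser by hand. Below is a minimal sketch using only the Python 3 standard library. The tiny local server, its cookie value demo123 and the function name fetch_with_shared_cookies are stand-ins invented for the demonstration, not the real site.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in site: /login sets a cookie, anything else echoes it."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "ASP.NET_SessionId=demo123; Path=/")
            body = b"logged in"
        else:
            # Echo whatever Cookie header the client sent back.
            body = self.headers.get("Cookie", "").encode()
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_with_shared_cookies(base_url):
    """Visit /login, then /content, with one cookie jar shared by both requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any proxy settings for this local demo
        urllib.request.HTTPCookieProcessor(jar),
    )
    opener.open(base_url + "/login").read()  # the jar stores the Set-Cookie value
    return opener.open(base_url + "/content").read().decode()  # the jar sends it back


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
cookie_seen = fetch_with_shared_cookies("http://127.0.0.1:%d" % server.server_port)
server.shutdown()
server.server_close()
print(cookie_seen)
```

A requests.Session keeps cookies between calls in a similar way, so in principle the login and content requests only need to go through the same session object.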
Finally, here is the code I'm currently using, which returns an Internal Server Error:
import requests
the_url = 'the_url'
login = the_url + '/login'
content = the_url + '/content'
username = 'username'
password = 'password'
sess = requests.Session()
sess.auth = (username, password)
sess.get(the_url)
# Field names taken from the login form's HTML
payload = {'ctl00$cphMain$tbUsername': username, 'ctl00$cphMain$tbPassword': password}
r_login = sess.post(login, data=payload)
cookies = {'PHPSESSID':'5udcifi6p43ma3h1fnpfqghiu0', 'ASP.NET_SessionId':'aspnet', 'BRO_LOGIN':'bro_login'}
# Originally written as s.get(...), but the session object is named sess
r_data = sess.get(content, cookies=cookies, data=payload)
print(r_data.text)
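The payload above carries only the two text boxes. ASP.NET WebForms pages generally expect the hidden fields of the form (__VIEWSTATE, __EVENTVALIDATION and so on) to be posted back along with them, so one possible approach is to read those out of the login page first. The sketch below (Python 3, standard library only) assumes the field names shown in this post; HiddenInputCollector, build_login_payload, the sample HTML values and the password p!@$word are made up for illustration. It also shows that form encoding percent-escapes characters such as !@$, so no base64 step should be needed for a form POST.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_html, username, password):
    """Merge the page's hidden fields with the credential fields named in the post."""
    collector = HiddenInputCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    payload["ctl00$cphMain$tbUsername"] = username
    payload["ctl00$cphMain$tbPassword"] = password
    return payload


# Shortened, made-up stand-in for the real login page.
sample_html = """
<form name="aspnetForm" method="post" action="" id="aspnetForm">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
</form>
"""

payload = build_login_payload(sample_html, "username", "p!@$word")
encoded = urlencode(payload)  # the form-encoded body, special characters escaped
print(encoded)
```

With requests, the resulting dict could be passed as data= to sess.post(login, ...) on the same session that fetched the login page.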
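Sorry this has become rather long; let me know if I need to split it into several posts. What I thought was a simple question at the start has turned into something else!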
import requests
URL = 'http://www.facebook.com'
r = requests.get(URL, auth=('username', 'password'))
source = r.text
print(source)
-----Edit-----
import mechanize
browser = mechanize.Browser()
browser.set_handle_robots(False)
cookies = mechanize.CookieJar()
browser.set_cookiejar(cookies)
browser.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7')]
browser.set_handle_refresh(False)
url = 'http://www.facebook.com/login.php'
browser.open(url)
browser.select_form(nr=0)  # the login form is the first form on the page (nr = 0)
browser.form['email'] = YourLogin  # YourLogin / YourPassw: substitute your own credentials
browser.form['pass'] = YourPassw
response = browser.submit()
print(response.read())
Victory!
OK, thanks to Prashant and barny for their replies, and many thanks to Martijn Pieters for his post:
Sending an ASP.net POST with Python's Requests
I found that my salvation was RoboBrowser.
The code is as follows:
from robobrowser import RoboBrowser
the_url = 'the_url'
login = the_url + '/login'
content = the_url + '/content'
username = 'username'
password = 'password'
def get_content_source():  # wrapper name is illustrative; the snippet ends in a return
    browser = RoboBrowser(parser='lxml')
    browser.open(login)
    form = browser.get_forms()
    # You can use '.get_form()' for a specific form, but I find it easier to
    # use '.get_forms()' to get all the forms, and then I'm only interested
    # in the first one:
    form = form[0]
    print(form)  # this shows the field names you need in order to
    # enter your login details:
    form['the_user'].value = username
    form['the_pass'].value = password
    browser.submit_form(form)
    # and then, because I'm after the html of certain content pages:
    browser.open(content)
    source = str(browser.parsed)
    return source