How to use requests.Session() in Python to send a payload to a URL when the HTML form has no 'action' attribute
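I want to use requests.Session() to send my login information to a website. After logging in, I want to navigate to a second URL that is only accessible once logged in, so that I can scrape data from it.
I'm new to this and have no experience with HTML.
In case it makes any difference, I'm working in Colaboratory.
Here is my code and output: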
import requests
page = requests.get("https://app.gristanalytics.com/Account/Login")
page
<Response [200]>
page.status_code
200
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
This is the output:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
<link href="/lib/bootstrap/css/bootstrap.min.css" rel="stylesheet"/>
<link href="/lib/fontawesome/css/all.min.css" rel="stylesheet"/>
<link href="/lib/datetimepicker/bootstrap-datetimepicker.min.css" rel="stylesheet"/>
<link href="/lib/vue-multiselect/vue-multiselect.min.css" rel="stylesheet"/>
<link href="/css/site.css" rel="stylesheet"/>
<title>
Log in - Grist
</title>
</head>
<body>
<div>
<div class="text-center loginbox">
<form method="post" style="width:100%;max-width:350px;padding:15px;margin:0 auto;">
<img alt="" class="mb-4" src="/images/grist_logo_m_black.png"/>
<h1 class="h3 mb-3 font-weight-normal">
Please sign in
</h1>
<div class="text-danger validation-summary-valid" data-valmsg-summary="true">
<ul>
<li style="display:none">
</li>
</ul>
</div>
<label class="sr-only" for="inputEmail">
Email address
</label>
<input autofocus="" class="form-control my-1" data-val="true" data-val-email="The Email field is not a valid e-mail address." data-val-required="The Email field is required." id="Input_Email" name="Input.Email" placeholder="Email address" required="" type="email" value=""/>
<label class="sr-only" for="inputPassword">
Password
</label>
<input class="form-control my-1" data-val="true" data-val-required="The Password field is required." id="Input_Password" name="Input.Password" placeholder="Password" required="" type="password"/>
<div class="checkbox my-3">
<label>
<input data-val="true" data-val-required="The Remember me? field is required." id="Input_RememberMe" name="Input.RememberMe" type="checkbox" value="true"/>
Remember me
</label>
<p>
<a href="/Account/ForgotPassword">
Forgot your password?
</a>
</p>
</div>
<button class="btn btn-lg btn-primary btn-block" type="submit">
Sign in
</button>
<p class="mt-5 mb-3 text-muted">
© 2018-2022
</p>
<input name="__RequestVerificationToken" type="hidden" value="CfDJ8CxpSY-tCd5Ou0L0wqhntPACCikaoFBOUQLV0RgCaVUJgt9wRSd3p9aVswNuSLU6OPRKsbIm-qvOyZyZErcEm-E__Q2tPauexh3z_T02Oh5TZCpeY12PsUsERY3INO5LUBBmWXeUR6nG5BFHnnNdW70">
<input name="Input.RememberMe" type="hidden" value="false"/>
</input>
</form>
</div>
</div>
<script src="/lib/jquery-validation/dist/Jquery.validate.min.js">
</script>
<script src="/lib/jquery-validation-unobtrusive/jquery.validate.unobtrusive.min.js">
</script>
</body>
</html>
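At this point, I believe the field names I need to use in the payload are:
name="Input.Email" and name="Input.Password"
However, I noticed that the form in the HTML has no action attribute, so I send the payload to the original URL, as you will see below.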
payload = {
'Input.Email': 'MyEmail', #yes in practice this is my actual information instead of this placeholder
'Input.Password': 'MyPassword', #same here real password used instead
}
with requests.Session() as session:
    post = session.post('https://app.gristanalytics.com/Account/Login', data=payload)
    r = session.get('https://app.gristanalytics.com/Data/Brewhouse')
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup.prettify())
The output of this is:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
<link href="/lib/bootstrap/css/bootstrap.min.css" rel="stylesheet"/>
<link href="/lib/fontawesome/css/all.min.css" rel="stylesheet"/>
<link href="/lib/datetimepicker/bootstrap-datetimepicker.min.css" rel="stylesheet"/>
<link href="/lib/vue-multiselect/vue-multiselect.min.css" rel="stylesheet"/>
<link href="/css/site.css" rel="stylesheet"/>
<title>
Log in - Grist
</title>
</head>
<body>
<div>
<div class="text-center loginbox">
<form method="post" style="width:100%;max-width:350px;padding:15px;margin:0 auto;">
<img alt="" class="mb-4" src="/images/grist_logo_m_black.png"/>
<h1 class="h3 mb-3 font-weight-normal">
Please sign in
</h1>
<div class="text-danger validation-summary-valid" data-valmsg-summary="true">
<ul>
<li style="display:none">
</li>
</ul>
</div>
<label class="sr-only" for="inputEmail">
Email address
</label>
<input autofocus="" class="form-control my-1" data-val="true" data-val-email="The Email field is not a valid e-mail address." data-val-required="The Email field is required." id="Input_Email" name="Input.Email" placeholder="Email address" required="" type="email" value=""/>
<label class="sr-only" for="inputPassword">
Password
</label>
<input class="form-control my-1" data-val="true" data-val-required="The Password field is required." id="Input_Password" name="Input.Password" placeholder="Password" required="" type="password"/>
<div class="checkbox my-3">
<label>
<input data-val="true" data-val-required="The Remember me? field is required." id="Input_RememberMe" name="Input.RememberMe" type="checkbox" value="true"/>
Remember me
</label>
<p>
<a href="/Account/ForgotPassword">
Forgot your password?
</a>
</p>
</div>
<button class="btn btn-lg btn-primary btn-block" type="submit">
Sign in
</button>
<p class="mt-5 mb-3 text-muted">
© 2018-2022
</p>
<input name="__RequestVerificationToken" type="hidden" value="CfDJ8CxpSY-tCd5Ou0L0wqhntPAwaiYOz80Q50p5gOcDk9qSF-gR4JJpzNGOdSKiQOzcVPp8hBKgDaEwXOrbFnpgdYXkedfcnLQlXIJ1Z7HnIi5vKZybNd6VSKk_Xs5Az444e3Oug-u1UFcxq_OLX1Iu0wU">
<input name="Input.RememberMe" type="hidden" value="false"/>
</input>
</form>
</div>
</div>
<script src="/lib/jquery-validation/dist/Jquery.validate.min.js">
</script>
<script src="/lib/jquery-validation-unobtrusive/jquery.validate.unobtrusive.min.js">
</script>
</body>
</html>
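This is the same HTML as the first time (apart from the verification token value), so clearly I am not logged in and cannot get to the HTML I want.
I tried other variants of the payload field names, including:
- inputEmail (from for=)
- Input_Email (from id=)
- email (from type=)
Example code for variant 1: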
payload = {
'inputEmail': 'MyEmail', #yes in practice this is my actual information instead of this placeholder
'inputPassword': 'MyPassword', #same here real password used instead
}
When I run this code I get no error or warning messages, so I'm at a bit of a loss as to what to do.
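One way to make this kind of silent failure visible, sketched here with only the standard library, is to check whether the page that came back still contains a password field. The helper name looks_like_login_page is hypothetical, and the heuristic assumes that pages behind the login do not themselves contain a password input.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether the document contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def looks_like_login_page(html: str) -> bool:
    """Heuristic: a page with a password input is probably still the login form."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found
```

Calling looks_like_login_page(r.text) on the response from the protected URL would then flag that the login did not take effect, even though the status code is 200.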
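The following code let me log in and get where I wanted to go!
Many thanks to @bushcat69 for the help; without them I probably wouldn't have looked seriously at the verification token.
Thanks also to two Stack Exchange posts for additional information I used.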
with requests.Session() as session:
    read = session.get('https://app.gristanalytics.com/Account/Login')
    soup = BeautifulSoup(read.content, 'html.parser')
    token = soup.select_one('[name="__RequestVerificationToken"]').get('value')
    payload = {
        'Input.Email': 'MyEmail@email.com',
        'Input.Password': 'MyPassword',
        '__RequestVerificationToken': token,
        'Input.RememberMe': 'false'
    }
    post = session.post('https://app.gristanalytics.com/Account/Login', data=payload)
    r = session.get('https://app.gristanalytics.com/Data/Brewhouse')
    tastySoup = BeautifulSoup(r.content, 'html.parser')
    print(tastySoup.prettify())
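The same idea can be generalised a little. A form with no action attribute submits to the URL of the page it was loaded from, and the hidden inputs (here the verification token and Input.RememberMe) are sent along with the visible fields. The token looks like an anti-forgery (CSRF) token, which is typically paired with a cookie set by the initial GET; that is presumably why fetching the login page through the same session first matters. The sketch below uses only the standard library instead of BeautifulSoup; the names FormFieldCollector and build_login_request are hypothetical, and it assumes the login form is the first form on the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldCollector(HTMLParser):
    """Collects the first form's action and the name/value of its hidden inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form:
            is_hidden = (attrs.get("type") or "").lower() == "hidden"
            if is_hidden and attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(html, page_url, credentials):
    """Return (post_url, payload) for the first form found in html."""
    collector = FormFieldCollector()
    collector.feed(html)
    # A missing or empty action resolves to the page's own URL.
    post_url = urljoin(page_url, collector.action or "")
    # Hidden fields first, so explicit credentials win on any name clash.
    payload = {**collector.hidden, **credentials}
    return post_url, payload
```

With the page text from session.get(...), the returned post_url and payload could be passed to session.post(post_url, data=payload), so a changed token field name or an added hidden field would not require editing the payload by hand.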
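I have now run into a new problem: some of the content I want to scrape appears to be loaded via Ajax / JavaScript, and I don't know how to get at it. If you have a similar problem, check my later questions; if I find something that helps me solve it, I will also comment on the Stack Exchange (or whichever) site.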
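A common approach to Ajax-loaded content is to find the request the page makes in the browser's developer tools (Network tab) and replay it with the same logged-in session. The sketch below assumes the endpoint returns JSON; the function name fetch_json is hypothetical, and the real endpoint URL for this site is not known here, so it has to be supplied by the caller.

```python
def fetch_json(session, url, params=None):
    """Replay an Ajax-style GET with a logged-in session and decode the JSON body.

    `session` is expected to behave like requests.Session; `url` is whatever
    endpoint the browser's Network tab shows (an assumption, not known here).
    """
    headers = {
        "Accept": "application/json",
        # Some servers use this header to distinguish Ajax calls from page loads.
        "X-Requested-With": "XMLHttpRequest",
    }
    response = session.get(url, params=params, headers=headers)
    response.raise_for_status()  # surface 4xx/5xx instead of failing silently
    return response.json()
```

If the content turns out to be rendered entirely by JavaScript with no separate data request, this will not help, and a browser-driving tool would be needed instead.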