How to use requests.Session() to send a payload to a URL when the HTML form has no 'action' attribute in Python

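I want to use requests.Session() to send my login information to a website. After logging in, I want to navigate to a second URL that is only accessible once logged in, so that I can scrape data from it.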

I'm a beginner and have no experience with HTML.

In case it makes any difference, I'm working in Google Colaboratory.

Here is my code and its output:

import requests

page = requests.get("https://app.gristanalytics.com/Account/Login")
page

<Response [200]>

page.status_code

200

from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

Here is the output:

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
  <link href="/lib/bootstrap/css/bootstrap.min.css" rel="stylesheet"/>
  <link href="/lib/fontawesome/css/all.min.css" rel="stylesheet"/>
  <link href="/lib/datetimepicker/bootstrap-datetimepicker.min.css" rel="stylesheet"/>
  <link href="/lib/vue-multiselect/vue-multiselect.min.css" rel="stylesheet"/>
  <link href="/css/site.css" rel="stylesheet"/>
  <title>
   Log in - Grist
  </title>
 </head>
 <body>
  <div>
   <div class="text-center loginbox">
    <form method="post" style="width:100%;max-width:350px;padding:15px;margin:0 auto;">
     <img alt="" class="mb-4" src="/images/grist_logo_m_black.png"/>
     <h1 class="h3 mb-3 font-weight-normal">
      Please sign in
     </h1>
     <div class="text-danger validation-summary-valid" data-valmsg-summary="true">
      <ul>
       <li style="display:none">
       </li>
      </ul>
     </div>
     <label class="sr-only" for="inputEmail">
      Email address
     </label>
     <input autofocus="" class="form-control my-1" data-val="true" data-val-email="The Email field is not a valid e-mail address." data-val-required="The Email field is required." id="Input_Email" name="Input.Email" placeholder="Email address" required="" type="email" value=""/>
     <label class="sr-only" for="inputPassword">
      Password
     </label>
     <input class="form-control my-1" data-val="true" data-val-required="The Password field is required." id="Input_Password" name="Input.Password" placeholder="Password" required="" type="password"/>
     <div class="checkbox my-3">
      <label>
       <input data-val="true" data-val-required="The Remember me? field is required." id="Input_RememberMe" name="Input.RememberMe" type="checkbox" value="true"/>
       Remember me
      </label>
      <p>
       <a href="/Account/ForgotPassword">
        Forgot your password?
       </a>
      </p>
     </div>
     <button class="btn btn-lg btn-primary btn-block" type="submit">
      Sign in
     </button>
     <p class="mt-5 mb-3 text-muted">
      © 2018-2022
     </p>
     <input name="__RequestVerificationToken" type="hidden" value="CfDJ8CxpSY-tCd5Ou0L0wqhntPACCikaoFBOUQLV0RgCaVUJgt9wRSd3p9aVswNuSLU6OPRKsbIm-qvOyZyZErcEm-E__Q2tPauexh3z_T02Oh5TZCpeY12PsUsERY3INO5LUBBmWXeUR6nG5BFHnnNdW70">
      <input name="Input.RememberMe" type="hidden" value="false"/>
     </input>
    </form>
   </div>
  </div>
  <script src="/lib/jquery-validation/dist/Jquery.validate.min.js">
  </script>
  <script src="/lib/jquery-validation-unobtrusive/jquery.validate.unobtrusive.min.js">
  </script>
 </body>
</html>

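At this point, I believe the field names I need to pass in the payload are name="Input.Email" and name="Input.Password".

However, I noticed that the form in the HTML has no action attribute, so I decided to send the payload to the original URL, as you will see below.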
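For reference: when a form has no `action` attribute (or an empty one), a browser submits it to the URL of the page the form was loaded from, so posting back to the login URL is the right target. A minimal sketch of resolving the target the way a browser would, using only `urllib.parse.urljoin`; the helper name `resolve_form_action` is hypothetical, and the action value is assumed to come from something like `soup.find('form').get('action')`.

```python
from urllib.parse import urljoin


def resolve_form_action(page_url, action=None):
    """Return the URL a browser would submit a form to (hypothetical helper).

    page_url: the URL the form was loaded from (e.g. response.url, which is the
              final URL after any redirects).
    action:   the form's action attribute, e.g. soup.find("form").get("action");
              None when the attribute is missing.
    A missing or empty action means "submit to the page's own URL".
    """
    # urljoin(base, "") returns base unchanged; relative paths resolve against base
    return urljoin(page_url, action or "")
```

For the Grist page, `resolve_form_action(page.url, soup.find('form').get('action'))` yields the login URL itself, because the attribute is absent.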

payload = {
    'Input.Email':  'MyEmail', #yes in practice this is my actual information instead of this placeholder
    'Input.Password': 'MyPassword', #same here real password used instead
}
with requests.Session() as session:
  post = session.post('https://app.gristanalytics.com/Account/Login', data=payload)
  r = session.get('https://app.gristanalytics.com/Data/Brewhouse')
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())

The output of this is:

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
  <link href="/lib/bootstrap/css/bootstrap.min.css" rel="stylesheet"/>
  <link href="/lib/fontawesome/css/all.min.css" rel="stylesheet"/>
  <link href="/lib/datetimepicker/bootstrap-datetimepicker.min.css" rel="stylesheet"/>
  <link href="/lib/vue-multiselect/vue-multiselect.min.css" rel="stylesheet"/>
  <link href="/css/site.css" rel="stylesheet"/>
  <title>
   Log in - Grist
  </title>
 </head>
 <body>
  <div>
   <div class="text-center loginbox">
    <form method="post" style="width:100%;max-width:350px;padding:15px;margin:0 auto;">
     <img alt="" class="mb-4" src="/images/grist_logo_m_black.png"/>
     <h1 class="h3 mb-3 font-weight-normal">
      Please sign in
     </h1>
     <div class="text-danger validation-summary-valid" data-valmsg-summary="true">
      <ul>
       <li style="display:none">
       </li>
      </ul>
     </div>
     <label class="sr-only" for="inputEmail">
      Email address
     </label>
     <input autofocus="" class="form-control my-1" data-val="true" data-val-email="The Email field is not a valid e-mail address." data-val-required="The Email field is required." id="Input_Email" name="Input.Email" placeholder="Email address" required="" type="email" value=""/>
     <label class="sr-only" for="inputPassword">
      Password
     </label>
     <input class="form-control my-1" data-val="true" data-val-required="The Password field is required." id="Input_Password" name="Input.Password" placeholder="Password" required="" type="password"/>
     <div class="checkbox my-3">
      <label>
       <input data-val="true" data-val-required="The Remember me? field is required." id="Input_RememberMe" name="Input.RememberMe" type="checkbox" value="true"/>
       Remember me
      </label>
      <p>
       <a href="/Account/ForgotPassword">
        Forgot your password?
       </a>
      </p>
     </div>
     <button class="btn btn-lg btn-primary btn-block" type="submit">
      Sign in
     </button>
     <p class="mt-5 mb-3 text-muted">
      © 2018-2022
     </p>
     <input name="__RequestVerificationToken" type="hidden" value="CfDJ8CxpSY-tCd5Ou0L0wqhntPAwaiYOz80Q50p5gOcDk9qSF-gR4JJpzNGOdSKiQOzcVPp8hBKgDaEwXOrbFnpgdYXkedfcnLQlXIJ1Z7HnIi5vKZybNd6VSKk_Xs5Az444e3Oug-u1UFcxq_OLX1Iu0wU">
      <input name="Input.RememberMe" type="hidden" value="false"/>
     </input>
    </form>
   </div>
  </div>
  <script src="/lib/jquery-validation/dist/Jquery.validate.min.js">
  </script>
  <script src="/lib/jquery-validation-unobtrusive/jquery.validate.unobtrusive.min.js">
  </script>
 </body>
</html>

This is the same HTML as the first time, so I am clearly not logged in and cannot reach the page whose HTML I want.

I tried other variants of the payload field names, including:

  1. inputEmail (from for=)
  2. Input_Email (from id=)
  3. email (from type=)

Example code for variant 1:

payload = {
    'inputEmail':  'MyEmail', #yes in practice this is my actual information instead of this placeholder
    'inputPassword': 'MyPassword', #same here real password used instead
}

When I run this code I get no errors or warnings, so I'm a bit lost as to what to do.
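No exception appears because, from HTTP's point of view, nothing failed: the server answers a rejected login with an ordinary 200 response containing the login form again, and the protected /Data/Brewhouse URL returns that same page (possibly via a redirect, which `r.history` and `r.url` would show). requests only raises for HTTP error statuses, and only when you call `raise_for_status()`. A minimal sketch of checking for a failed login yourself, assuming the login page can be recognized by its `Input.Password` field (a heuristic taken from the HTML above, not documented Grist behavior); `looks_like_login_page` and `ensure_logged_in` are hypothetical helper names.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether an <input name="Input.Password"> appears in the page."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; also called for <input ... />
        if tag == "input" and dict(attrs).get("name") == "Input.Password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic (assumption): any page with the password field is the login page."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.found


def ensure_logged_in(response):
    """Raise if `response` (e.g. a requests.Response) is an HTTP error or the login form."""
    response.raise_for_status()  # only catches 4xx/5xx, not a rejected login
    if looks_like_login_page(response.text):
        raise RuntimeError(f"Still on the login page ({response.url}); login failed")
    return response
```

Calling `ensure_logged_in(r)` right after `r = session.get('https://app.gristanalytics.com/Data/Brewhouse')` turns the silent failure into an explicit error.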

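The following code let me log in and get to where I wanted to go!

Many thanks to @bushcat69 for the help; without them I probably would not have looked closely at the verification token.

Thanks also to the following Stack Exchange posts [, 2] for additional information I used.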

import requests
from bs4 import BeautifulSoup

with requests.Session() as session:
    # GET the login page first: it contains a fresh __RequestVerificationToken
    read = session.get('https://app.gristanalytics.com/Account/Login')
    soup = BeautifulSoup(read.content, 'html.parser')
    token = soup.select_one('[name="__RequestVerificationToken"]').get('value')
    payload = {
        'Input.Email': 'MyEmail@email.com',
        'Input.Password': 'MyPassword',
        '__RequestVerificationToken': token,
        'Input.RememberMe': 'false',
    }
    # The form has no action attribute, so POST back to the login URL itself
    post = session.post('https://app.gristanalytics.com/Account/Login', data=payload)
    r = session.get('https://app.gristanalytics.com/Data/Brewhouse')
    tastySoup = BeautifulSoup(r.content, 'html.parser')
    print(tastySoup.prettify())
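This works because the site appears to use ASP.NET Core-style anti-forgery validation (the `__RequestVerificationToken` field and `Input.*` names suggest so): the hidden token from the form has to arrive together with the cookie set by the same GET, which is why both requests must go through one `Session`. It also explains why the `inputEmail`/`Input_Email` variants could not work: a browser uses only each control's `name` attribute as the key. Rather than hard-coding each hidden field, you can copy every named control from the form the way a browser would and override just the credentials. A sketch using the standard library's `html.parser` (BeautifulSoup would work equally well); `FormFieldCollector`, `build_login_payload` and `login` are hypothetical helper names, not part of requests or Grist.

```python
from html.parser import HTMLParser

# Control types a browser never includes when serializing a form like this one
_SKIPPED_TYPES = {"submit", "button", "image", "reset", "file"}


class FormFieldCollector(HTMLParser):
    """Collect (name, value) pairs of the <input> elements in the first <form>."""

    def __init__(self):
        super().__init__()
        self.fields = []  # list of (name, value); a name may appear more than once
        self._in_form = False
        self._form_done = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attributes without a value (e.g. checked) map to None
        if tag == "form" and not self._form_done:
            self._in_form = True
        elif tag == "input" and self._in_form and a.get("name"):
            kind = (a.get("type") or "text").lower()
            if kind in _SKIPPED_TYPES or "disabled" in a:
                return
            value = a.get("value")
            if kind in ("checkbox", "radio"):
                if "checked" not in a:
                    return  # unchecked boxes are not submitted
                if value is None:
                    value = "on"  # browser default for a checked box without value
            self.fields.append((a["name"], value if value is not None else ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._form_done = True  # ignore any later forms on the page


def build_login_payload(html, email, password):
    """Copy every named control (including hidden ones such as
    __RequestVerificationToken) and fill in the credentials."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    overrides = {"Input.Email": email, "Input.Password": password}
    present = {name for name, _ in collector.fields}
    payload = [(name, overrides.get(name, value)) for name, value in collector.fields]
    payload += [(k, v) for k, v in overrides.items() if k not in present]
    return payload  # requests accepts a list of tuples for data=


def login(session, login_url, email, password):
    """GET the form, then POST it back to the same URL (the form has no action)."""
    page = session.get(login_url)  # the server presumably sets its anti-forgery cookie here
    page.raise_for_status()
    payload = build_login_payload(page.text, email, password)
    return session.post(login_url, data=payload)
```

With these helpers, the body of the answer above reduces to `post = login(session, 'https://app.gristanalytics.com/Account/Login', 'MyEmail@email.com', 'MyPassword')` followed by the GET of the data page. Using a list of tuples keeps repeated names such as `Input.RememberMe` in form order.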

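I'm now running into another problem: some of the content I want to scrape seems to be loaded via Ajax/JavaScript, and I don't know how to get at it. If you have a similar problem, check my later questions; if I find something that solves it, I'll also post a comment here on Stack Exchange or wherever the solution turns up.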