How to use Python Requests to submit a form with invisible reCAPTCHA?

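I want to send anonymous email using Python. I use an online anonymous email tool called emkei.cz, and I want to use the same tool programmatically.

How do I fill in the form on that site (emkei.cz) and submit it with python-requests to send an anonymous email?

I don't want to use something like Selenium or mechanize: they are slow (even when I run Selenium headless) and unnecessary for a basic HTML form that I should be able to emulate with requests.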

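What I tried

I filled in the form and, in the Network tab of the Microsoft Edge developer tools, inspected the request made when the form was submitted. I tried to emulate that request using the requests library in Python.

Submitting the form in the browser sent the email successfully. I also noted down the headers and the payload (data). I then wrote a simple Python script with the same form values to try to send the email. However, it did not work.

To debug, I checked r.text. It returned the same input form I had just filled in, instead of the 'Email sent successfully' success message.

Here is the code I used: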

import requests


def send_email(to, subject, body, debug):
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9",
        "Cache-Control": "max-age=0",
        "Connection": "keep-alive",
        "Content-Length": "3072",
        "Content-Type": "multipart/form-data; boundary=----WebKitFormBoundaryIGzbcUtI3oNRwVLD",
        "Cookie": "__gads=ID=a33e3b44296022c7-22066d337bd100ce:T=1648910614:RT=1648910614:S=ALNI_MZLGzNvZhCPKcpiV2aS8Nkg4um4SQ",
        "Host": "emkei.cz",
        "Origin": "null",
        "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="99", "Microsoft Edge";v="99"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"Windows"',
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "same-origin",
        "Sec-Fetch-User": "?1",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36 Edg/99.0.1150.55"
    }

    payload = {
        "fromname": "LifeAsAnRPG Team",
        "from": "team@lifeasanrpg.com",
        "rcpt": to,
        "subject": subject,
        "attachment": "(binary)",
        "reply": "",
        "errors": "",
        "cc": "",
        "bcc": "",
        "importance": "normal",
        "xmailer": "0",
        "customxm": "",
        "confirmd": "",
        "confirmr": "",
        "addh": "",
        "smtp": "",
        "smtpp": "",
        "current": "on",
        "charset": "utf-8",
        "mycharset": "",
        "encrypt": "no",
        "ctype": "plain",
        "rte": "0",
        "text": body,
        "g-recaptcha-response": "03AGdBq24yOL4Cas-N8rzxpVSHZnJR0Ec7V_8tylGd_6IpLZotF1hqQZo2Ukyt9qw3CWAqDV7onb2TeJ25cTx9fWPf9icUaK8QCE3HGoxFMO9wYvXB5RNDSkQGbpuU_7mRZl_RDs3RVx6Savi0-PENoz1fvfUBmcKhPbPDXnRWyfayDjS1DrTU0hTivr2Xkp4W3KxBpPBg0lp7W_hgujMxqa5fjXz46Do9ZUq3G2DCRciuwBLYXS3v9nSEW1wqhFtdWfRbby50iougT0DGAWzN5vbs6o0X7YzTit6uyNO2zF0-ZECTH6YNpTMgdlC4t4QquS0-BhXPBOdDCICccYafyGoQgioaPcQt--NfaPFSYvLnVhCjFJ2y2Kl7sFFviGn-lgnvK65NpSKlNjYrSHB29LsLcF1zghmwjPZtWJ7q7rljAhz7rH9Iyxs",
        "ok": "Send"
    }

    request = requests.request(
        method="POST",
        url="https://emkei.cz/",
        headers=headers,
        data=payload
    )

    if debug:
        print(request.text)
        print(request.status_code)

    if request.status_code != 200:
        return -1

    return 0


send_email(
    to="test-test@mailinator.com",
    subject="Test subject.",
    body="""
        Test line one.
        Test line two.
        """,
    debug=True
)

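My guess is that it has something to do with the captcha and the g-recaptcha-response field in the payload. However, I was not asked to solve any captcha while filling in the form.

Please try visiting the site (Emkei's Anonymous Mailer, also linked above) and tell me how to send an email through it programmatically.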

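There is an invisible reCAPTCHA on the page, so the page has to be rendered to obtain g-recaptcha-response.
https://developers.google.com/recaptcha/docs/versions#recaptcha_v2_invisible_recaptcha_badge

You can use requests-html, which downloads Chromium automatically the first time it renders a page.
https://pypi.org/project/requests-html/

  1. Render the page and set the captcha token inside the send_email function, before the POST request: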

    from requests_html import HTMLSession
    session = HTMLSession()
    response = session.get("https://emkei.cz/")
    # response.html.render()
    for _ in range(10):
        if response.html.search('name="g-recaptcha-response" value="{}"') is None:
            response.html.render()
    payload['g-recaptcha-response'] = response.html.search('name="g-recaptcha-response" value="{}"')[0]
    
  2. Comment out the "Content-Type": "multipart/form-data; boundary=..." header:

    # "Content-Type": "multipart/form-data; boundary=...
    

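    You should not specify your own boundary: requests builds the request body and sets a matching Content-Type header itself, so a boundary copied from the browser will not match what is actually sent. https://github.com/psf/requests/issues/1997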

  3. Add the following failure check next to the request.status_code != 200 check:

    # if "The invisible reCAPTCHA test wasn't successful. Please, try again." in request.text:
    #     return -1
    if "E-mail sent successfully" not in request.text:
        return -1
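
The check in step 3 can be factored into a small helper, so the caller can tell a reCAPTCHA rejection apart from other failures. This is only a sketch: the name classify_response and the three returned labels are made up for illustration, and the two marker strings are the messages quoted above, which the site may change at any time.

```python
# Marker strings taken from the messages quoted in this answer.
SUCCESS_MARKER = "E-mail sent successfully"
CAPTCHA_MARKER = "The invisible reCAPTCHA test wasn't successful"


def classify_response(status_code: int, text: str) -> str:
    """Return "sent", "captcha_failed", or "failed" for a form response.

    Hypothetical helper: status_code and text correspond to
    request.status_code and request.text in the question's code.
    """
    if status_code != 200:
        return "failed"
    if CAPTCHA_MARKER in text:
        return "captcha_failed"
    if SUCCESS_MARKER in text:
        return "sent"
    # A 200 response that just shows the form again counts as a failure.
    return "failed"


print(classify_response(200, "<p>E-mail sent successfully</p>"))
```

Inside send_email, the two if statements could then be replaced by a single call such as classify_response(request.status_code, request.text), returning -1 for anything other than "sent".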
    

On Windows, you may run into problems when requests-html tries to download Chromium.
https://github.com/psf/requests-html/issues/325

The invisible reCAPTCHA may still block requests

Initially, a single response.html.render() worked fine.

After xx runs within 1 hour, I needed the for _ in range(10) loop and occasionally got a TimeoutError during render():

pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.

After xxx runs within 1 hour, https://emkei.cz/ mostly returns "The invisible reCAPTCHA test wasn't successful. Please, try again.".
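
The re-render loop and the occasional TimeoutError described above can be handled in one bounded helper that fails loudly instead of indexing into None. This is a hedged sketch, not part of the tested answer: get_token_with_retries and render_and_search are hypothetical names, and render_and_search stands in for calling response.html.render() and then response.html.search(...). It does not make the reCAPTCHA any more likely to pass; once the site starts rejecting tokens, retrying will not help.

```python
from typing import Callable, Optional


def get_token_with_retries(
    render_and_search: Callable[[], Optional[str]],
    attempts: int = 10,
) -> str:
    """Call render_and_search up to `attempts` times; return the first token.

    render_and_search is a hypothetical stand-in for rendering the page and
    searching it for the g-recaptcha-response value. It should return None
    while the token is not present yet, and it may raise TimeoutError.
    """
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            token = render_and_search()
        except TimeoutError as exc:
            # The builtin TimeoutError keeps this sketch dependency-free.
            # With requests-html you would likely need to catch
            # pyppeteer.errors.TimeoutError here instead (assumption).
            last_error = exc
            continue
        if token:
            return token
    raise RuntimeError(f"no reCAPTCHA token after {attempts} attempts") from last_error


# Offline demonstration with a fake that finds the token on the third try.
fake_results = iter([None, None, "demo-token"])
print(get_token_with_retries(lambda: next(fake_results)))
```

Passing the rendering step in as a callable keeps the retry logic testable without a browser, and the explicit RuntimeError gives the caller one place to stop instead of posting the form with a missing token.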