How do I log into Instagram using BeautifulSoup4 and Requests, and how could I work out the login process on my own?
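So far I have looked at these two posts on Stack Overflow: "I can't login to Instagram with Requests" and "Instagram python requests log in without API". Neither solution worked for me.
What should I do now, and how do other people find out where to send the request? To put it more clearly: if I want to send a POST request to log in, how do I know what to send and where to send it?
I don't want to use Instagram's API or Selenium, because I want to try Requests and (maybe) bs4.
Here is some code, if you need it: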
import requests
main_url = 'https://www.instagram.com/'
login_url = main_url+'accounts/login/ajax'
user_agent = 'User-Agent: Mozilla/5.0 (iPad; CPU OS 6_0_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A523 Safari/8536.25'
session = requests.session()
session.headers = {"user-agent": user_agent}
session.headers.update({'Referer': main_url})
req = session.get(main_url)
session.headers.update({'set-cookie': req.cookies['csrftoken']})
print(req.status_code)
login_data = {"csrfmiddlewaretoken": req.cookies['csrftoken'], "username": "myusername", "password": "mypassword"}
login = session.post(login_url, data=login_data, allow_redirects=True)
print(login.status_code)
session.headers.update({'set-cookie': login.cookies['csrftoken']})
cookies = login.cookies
print(login.headers)
print(login.status_code)
This gives me a 405 error.
You can use this code to log into Instagram:
import re
import requests
from bs4 import BeautifulSoup
from datetime import datetime
link = 'https://www.instagram.com/accounts/login/'
login_url = 'https://www.instagram.com/accounts/login/ajax/'
# Current Unix timestamp, embedded in the enc_password string below
time = int(datetime.now().timestamp())
payload = {
    'username': 'login',
    'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{time}:your_password',
    'queryParams': {},
    'optIntoOneTap': 'false'
}
with requests.Session() as s:
    # Load the login page first so the session gets its cookies and the CSRF token
    r = s.get(link)
    csrf = re.findall(r"csrf_token\":\"(.*?)\"", r.text)[0]
    # Post the credentials to the AJAX endpoint, passing the token in a header
    r = s.post(login_url, data=payload, headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "https://www.instagram.com/accounts/login/",
        "x-csrftoken": csrf
    })
    print(r.status_code)
Hint: I needed to change the line
r = s.get(link)
to
r = s.get(link, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
to get a proper response. Without it, I got "Page not found" when running the code in Jupyter Notebook.
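The two parts of the answer's code that are easiest to get wrong are the CSRF token extraction and the enc_password string. Below is a minimal sketch that isolates them into small functions so they can be checked without touching the network. The function names extract_csrf_token and build_enc_password are hypothetical, the sample HTML is made up, and the sketch assumes the login page still embeds the token as "csrf_token":"..." as in the answer above.

```python
import re
from datetime import datetime

# Same pattern as in the answer: the token appears in the page source as
# "csrf_token":"<value>". That the page uses this layout is an assumption.
CSRF_PATTERN = re.compile(r'csrf_token":"(.*?)"')


def extract_csrf_token(html):
    """Return the first CSRF token found in the page source, or None."""
    match = CSRF_PATTERN.search(html)
    return match.group(1) if match else None


def build_enc_password(password, timestamp=None):
    """Build the enc_password string in the format used in the answer."""
    if timestamp is None:
        timestamp = int(datetime.now().timestamp())
    return f'#PWD_INSTAGRAM_BROWSER:0:{timestamp}:{password}'


# Made-up page fragment, for illustration only.
sample_html = '<script>{"config":{"csrf_token":"abc123","viewer":null}}</script>'
token = extract_csrf_token(sample_html)
enc = build_enc_password('your_password', timestamp=1600000000)
print(token)
print(enc)
```

Returning None instead of indexing with [0] makes the "Page not found" case from the hint easier to diagnose: with re.findall(...)[0], a page that lacks the token raises an IndexError, whereas here the caller can check for None and print r.text to see what actually came back.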