Can't scrape Instagram profile
from bs4 import BeautifulSoup
import requests
page = requests.get("https://www.instagram.com/marcelo.codes/")
soup = BeautifulSoup(page.content, "html.parser")
profileName = soup.find('h2', class_="_7UhW9 fKFbl yUEEX KV-D4 fDxYl ")
followers = soup.find('span', class_="g47SY")
bio = soup.find('div', class_="-vDIg")
postsAmount = soup.find('span', class_="g47SY lOXF2")
print(f"""
Name: {profileName}
followers: {followers}
bio: {bio}
posts: {postsAmount}
""")
This is my code. Every time I run it, the result is:
python3 er.py
Name: None
followers: None
bio: None
posts: None
What should I change to get the result I want?
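The page stores its data in a JavaScript variable inside the page, and requests does not run JavaScript, so the elements your selectors target never appear in the downloaded HTML. You can use this script to extract the data from that variable: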
import re
import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
}
url = "https://www.instagram.com/marcelo.codes/"

# Capture the JSON object assigned to window._sharedData and parse it
data = json.loads(
    re.search(
        r"<script type=\"text/javascript\">window\._sharedData = (.*});",
        requests.get(url, headers=headers).text,
    ).group(1)
)
# Uncomment this to print all of the data:
# print(json.dumps(data, indent=4))
user = data["entry_data"]["ProfilePage"][0]["graphql"]["user"]
print("Bio:")
print(user["biography"])
# edge_followed_by counts the accounts that follow this profile
print("\nFollowers:")
print(user["edge_followed_by"]["count"])
# edge_follow counts the accounts this profile follows
print("\nFollowing:")
print(user["edge_follow"]["count"])
Prints:
Bio:
✏ Trying my best and showing my journey into coding.
Brazilian.
Learning Python right now!
Taking doubts, and showing my progress.
Links.
Followers:
102
Following:
10
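If the page does not contain the variable (for example, if you get a login page back), re.search returns None and the script above fails with an AttributeError. Here is a minimal sketch of the same extraction with that case handled. It assumes the page still embeds window._sharedData the way it does in the answer. The names extract_shared_data, profile_summary and SAMPLE_HTML are made up for illustration, and the sample page is invented so the code runs without a network request. I am reading edge_followed_by as the follower count and edge_follow as the following count.

```python
import json
import re

# Same pattern as in the answer: the inline script that assigns window._sharedData.
SHARED_DATA_RE = re.compile(
    r'<script type="text/javascript">window\._sharedData = (.*});'
)


def extract_shared_data(html):
    """Return the parsed window._sharedData object, or None if it is not in the page."""
    match = SHARED_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def profile_summary(shared_data):
    """Collect a few profile fields, using the key path from the answer."""
    user = shared_data["entry_data"]["ProfilePage"][0]["graphql"]["user"]
    return {
        "bio": user["biography"],
        "followers": user["edge_followed_by"]["count"],
        "following": user["edge_follow"]["count"],
    }


# Invented sample page (not real Instagram output), used only for the demo below.
SAMPLE_HTML = (
    "<html><body>\n"
    '<script type="text/javascript">window._sharedData = '
    '{"entry_data": {"ProfilePage": [{"graphql": {"user": {'
    '"biography": "Learning Python", '
    '"edge_followed_by": {"count": 102}, '
    '"edge_follow": {"count": 10}}}}]}};</script>\n'
    "</body></html>"
)

if __name__ == "__main__":
    data = extract_shared_data(SAMPLE_HTML)
    if data is None:
        print("window._sharedData not found in the page")
    else:
        print(profile_summary(data))
```

To use it on the real page, pass requests.get(url, headers=headers).text to extract_shared_data instead of SAMPLE_HTML and check for None before reading any keys.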