How do I get a post description on Instagram?
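I'm trying to get the post description for each image on Instagram, but I only get a small part of the description. Can someone help me get the full description of each image post?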
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

# ---------------- getting hrefs in posts ------------------ #
# Step 1: open the profile page and collect every link on it
driver = webdriver.Chrome('/Users/jjcauton/Documents/python/chromedriver')
driver.get('https://www.instagram.com/addict_for_sneakers/')
hrefs = driver.find_elements_by_tag_name('a')
print(hrefs)

# Keep only links that point to individual posts (their URL contains '/p/');
# skip <a> elements that have no href attribute at all
hrefs_elem = [elem.get_attribute('href') for elem in hrefs]
hrefs_elem = [href for href in hrefs_elem if href and '/p/' in href]
print(hrefs_elem)

# Visit each post and print the page title, which holds the (truncated) caption
for href in hrefs_elem:
    driver.get(href)
    page = requests.get(href)
    soup = BeautifulSoup(page.content, 'lxml')
    page_contents = soup.title
    contents = page_contents.get_text()
    print(contents)
The result looks like this:
Boricua Adicto A Tenis on Instagram: “ Giveaway Win a FREE pair of Adidas Yeezy 350 v2 "Yeshaya" (Winner Picks His or Her Size) by following the simple steps below. Here’s…”
Boricua Adicto A Tenis on Instagram: “1,2,3,4,5,6,7,8,9 or 10?
#Tecatodetenis”
Boricua Adicto A Tenis on Instagram: “The the future of sneakers trading is here Make money by buying shares, then selling them for more than what you paid Start with only…”
Boricua Adicto A Tenis on Instagram: “What’s your favorite AJ11?”
Boricua Adicto A Tenis on Instagram: “ Giveaway Win a FREE pair of Retro 1 Fearless by following the simple steps below. Here’s how you can win: 1️⃣ Follow:…”
Boricua Adicto A Tenis on Instagram: “1,2,3,4,5,6,7,8,9 or 10?
#Tecatodetenis”
Boricua Adicto A Tenis on Instagram: “Choose One!”
Boricua Adicto A Tenis on Instagram: “FREEGIVEAWAY Win the ️red 1️⃣1️⃣ for FREE by following these steps: Step 1️⃣. Follow them: @_jsole_ @wallkicksofficial @pr_sneaks23…”
Boricua Adicto A Tenis on Instagram: “What’s your favorite retro 4?”
Boricua Adicto A Tenis on Instagram: “ Giveaway Win a FREE pair of Retro 1 Turbo Green by following the simple steps below. Here’s how you can win: 1️⃣ Follow:…”
Boricua Adicto A Tenis on Instagram: “1,2,3,4,5,6,7,8,9 or 10?
#Tecatodetenis”
Boricua Adicto A Tenis on Instagram: “✨LAST CHANCE✨ ☁️CHOOSE YOUR FAVORITE SHOE☁️ ⠀ To Enter Simply: 1️⃣: Like This Picture 2️⃣: Follow @Luisanglcordova @Hypedseason…”
As you can see, it only gives a small part of the image post's description. I need the full description. Thanks!
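You are looking at the wrong tag. Instagram only has the full text of a post inside a <script> tag, so returning all the <a> tags will not help you. You need to find the <script> tag that contains 'edge_media_to_caption'. The script tag is very long, but it contains the following (taken from the Instagram account /katyperry/):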
"edge_media_to_caption": {
    "edges": [{
        "node": {
            "text": "Many people wonder how the pyramids were actually built... but me, I am in constant awe and wonder of how such a loving/kind/compassionate/supportive/talented/deeply spiritual/did I mention incredibly good looking/James Bond of a human being can actually exist in the flesh!\n\nThere\u2019s a reason why all animals and children run straight into his arms... It\u2019s his heart, so pure. I love you Orlando Jonathan Blanchard Copeland Bloom. Happiest 43rd year. \u2665\ufe0f\ud83c\udf82\u2660\ufe0f"
        }
    }]
},
With that, you can extract the data by slicing the string, string[index1:index2], where the indices can be found with string.find("some value").
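A rough sketch of that idea, in case it helps. It assumes the page HTML is already in a string and that the caption still sits in a <script> tag in the form shown above; Instagram may change its markup at any time, and the page may differ when you are not logged in. The names ScriptCollector, extract_caption and sample_html are made up for illustration, and the sample page is invented rather than real Instagram output. It uses only the standard library: html.parser to collect the script contents, str.find plus slicing to locate the value, and json.loads to decode the \n and \uXXXX escapes.

```python
import json
from html.parser import HTMLParser

KEY = '"edge_media_to_caption"'


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text can arrive in several chunks, so accumulate it.
        if self._in_script:
            self.scripts[-1] += data


def extract_caption(html):
    """Return the full caption, or None if no script contains the key."""
    collector = ScriptCollector()
    collector.feed(html)
    for script in collector.scripts:
        key_pos = script.find(KEY)
        if key_pos == -1:
            continue
        # Assumption: the first "text" field after the key is the caption.
        # A post with an empty caption would need extra checks here.
        text_pos = script.find('"text"', key_pos)
        colon = script.find(":", text_pos + len('"text"')) if text_pos != -1 else -1
        start = script.find('"', colon) if colon != -1 else -1
        if start == -1:
            continue
        # Walk to the closing quote, stepping over backslash escapes.
        end = start + 1
        while end < len(script) and script[end] != '"':
            end += 2 if script[end] == "\\" else 1
        if end >= len(script):
            continue
        # string[index1:index2] is a quoted JSON string; decode its escapes.
        return json.loads(script[start:end + 1])
    return None


# Invented sample page, only to exercise the function.
sample_html = (
    "<html><head><title>Someone on Instagram</title>"
    '<script>var data = {"edge_media_to_caption": {"edges": [{"node": '
    '{"text": "Hello world\\nIt\\u2019s a \\"full\\" caption"}}]}};</script>'
    "</head></html>"
)
print(extract_caption(sample_html))
```

In the loop from the question you would pass page.text (or driver.page_source) to extract_caption instead of reading soup.title. If the script tag turns out to hold a complete JSON object, parsing the whole object with json.loads and walking down to edge_media_to_caption would be sturdier than searching for quotes.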