How can I get post URLs from Instagram using Selenium when the links change dynamically every time I scroll down?
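I'm trying to scrape the posts on an Instagram account, but whenever I tell the browser to scroll down, the previous links disappear and new ones appear, so they are never all on the page at the same time. As it stands, it only ever captures 29 of the 1100 posts.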
while(count<10):
    for i in range(1,2):
        #.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        self.browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
        print('.', end="",flush=True)
        time.sleep(2)
    elements = self.browser.find_elements_by_xpath("//div[@class='v1Nh3 kIKUG _bz0w']")
    hrefElements = self.browser.find_elements_by_xpath("//div[@class='v1Nh3 kIKUG _bz0w']/a")
    elements_link = [x.get_attribute("href") for x in hrefElements]
    i = 1
    unique = 1
    text_file = open("Passed.txt", "r")
    lines = text_file.readlines()
    text_file.close()
    for elements in elements_link:
        print(str(i)+'.',end ="",flush=True)
        found = self.found(elements,lines)
        if found==True:
            pass
        else:
            with open('Passed.txt','a') as f:
                f.write(elements+'\n')
                unique+=1
        i+=1
    count+=1
print('-----------------------------------------------')
print('No. of unique Posts Captured : '+ str(unique))
print('-----------------------------------------------')
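This is my code. It loads the posts, captures the links from them, and saves them to a separate file so that I don't have to rerun it every time.
The found function: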
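` def found(self,key,lines):
    # Return True if the link already appears as a line in Passed.txt.
    for i in lines:
        if i == key + '\n':
            return True
    # Only report "not found" after every line has been checked.
    return False
`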
I'm trying to capture 1100 posts.
Each time I scroll down, the posts shown on the page are replaced by a new set.
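You should collect the links that are currently on the page first, then scroll down and collect the links that the scroll reveals, repeating until you have enough. That way you also keep the links that disappear from the page as it scrolls. For example: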
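import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(self.browser, 10)
links = []
number_of_posts = 1100
while True:
    # Read the links that are rendered right now, before scrolling away from them.
    hrefElements = wait.until(ec.visibility_of_all_elements_located((By.XPATH, "//div[@class='v1Nh3 kIKUG _bz0w']/a")))
    elements_link = [x.get_attribute("href") for x in hrefElements]
    for link in elements_link:
        if link not in links:
            links.append(link)
    self.browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
    # implicitly_wait() only sets the element lookup timeout and does not pause,
    # so sleep to give the next batch of posts time to load.
    time.sleep(2)
    if len(links) >= number_of_posts:
        break
links = links[:number_of_posts]
with open('Passed.txt','a') as f:
    for link in links:
        f.write(link+'\n')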
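One thing to watch for with a `while True` loop like this is that it never exits if the page ends up showing fewer links than `number_of_posts`. Below is a minimal sketch of a variant that also stops after a few scrolls in a row reveal nothing new. The names `collect_links`, `max_idle_rounds` and `pause` are made up for this illustration and are not part of Selenium. The locator is passed as the plain string `"xpath"`, which is the value of Selenium's `By.XPATH` constant, so the block runs without importing Selenium. The class names in the XPath come from the question and may no longer match Instagram's markup.

```python
import time


def collect_links(browser, xpath, target, max_idle_rounds=3, pause=2.0):
    """Scroll the page and accumulate unique hrefs.

    Stops when `target` links have been collected, or when
    `max_idle_rounds` consecutive scrolls reveal no new link.
    """
    seen = set()   # fast "have I seen this link?" check
    links = []     # keeps links in the order they were first seen
    idle_rounds = 0

    while len(links) < target and idle_rounds < max_idle_rounds:
        new_this_round = 0
        # "xpath" is the string value of selenium's By.XPATH
        for element in browser.find_elements("xpath", xpath):
            href = element.get_attribute("href")
            if href and href not in seen:
                seen.add(href)
                links.append(href)
                new_this_round += 1

        # Reset the idle counter whenever this pass found something new.
        idle_rounds = 0 if new_this_round else idle_rounds + 1

        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the next batch of posts time to load

    return links[:target]
```

With the setup from the question it could be called as `links = collect_links(self.browser, "//div[@class='v1Nh3 kIKUG _bz0w']/a", 1100)`, and the result written to `Passed.txt` as before. Using a set next to the list keeps the duplicate check cheap even with more than a thousand links, while the list preserves the order in which the posts appeared.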