How to crawl more Google People Also Ask questions and answers than Google shows by default, with Selenium and Python?
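I found a good solution, but it only handles the number of questions and answers that Google shows by default, and I need more.
I am new to Python development.
How can I get more questions and answers?
Do I have to implement the clicks first, so that the required number of questions is displayed, and only then parse?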
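The following code parses the questions that appear on screen and then asks whether you want to parse more. If you enter y, it clicks the last question so that more questions are loaded into the page. The questions are stored in the list questions and the answers in the list answers.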
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

your_path = '...'  # path to your chromedriver executable
driver = webdriver.Chrome(service=Service(your_path))
driver.get('https://www.google.com/search?q=How%20to%20make%20bakery%3F&source=hp&ei=j0aZYYjRAvja2roPrcWcyAU&iflsig=ALs-wAMAAAAAYZlUn4NMUPjfIpQmrXSmjIDnaWjJXWIJ&ved=0ahUKEwjI1JDn0Kf0AhV4rVYBHa0iB1kQ4dUDCAc&uact=5&oq=How%20to%20make%20bakery%3F&gs_lcp=Cgdnd3Mtd2l6EAMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBNQAFgAYJMDaABwAHgAgAF-iAF-kgEDMC4xmAEAoAECoAEB&sclient=gws-wiz')

questions, answers = [], []
while True:
    found = driver.find_elements(By.CSS_SELECTOR, "div[id*='RELATED_QUESTION']")
    for idx, question in enumerate(found):
        if idx >= len(questions):  # skip already parsed questions
            questions.append(question.text)
            txt = ''
            for answer in question.find_elements(By.CSS_SELECTOR, "div[id*='WEB_ANSWERS_RESULT']"):
                txt += answer.get_attribute('innerText')
            answers.append(txt)
    inp = input(f'{len(questions)} questions parsed, continue? (y/n) ')
    if inp == 'y' and found:
        found[-1].click()  # clicking the last question loads more questions
        time.sleep(2)  # give the page time to add them
    else:
        break
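If you already know how many questions you need, the prompt can be replaced by a target count. The following is only a sketch of that idea, not a tested scraper: the function name collect_paa and its parameters target, timeout and poll are made up for illustration, the two CSS selectors are the ones from the code above (Google's markup may have changed since), and instead of a fixed two-second sleep it polls until the number of question elements grows.

```python
import time

# Selectors taken from the code above; Google changes its markup often,
# so treat them as assumptions that may need updating.
QUESTION_CSS = "div[id*='RELATED_QUESTION']"
ANSWER_CSS = "div[id*='WEB_ANSWERS_RESULT']"
CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def collect_paa(driver, target, timeout=5.0, poll=0.25):
    """Hypothetical helper: return up to `target` (question, answer) pairs."""
    pairs = []
    while len(pairs) < target:
        elements = driver.find_elements(CSS, QUESTION_CSS)
        # Parse only the elements that have not been parsed yet.
        for element in elements[len(pairs):target]:
            answer = "".join(
                a.get_attribute("innerText") or ""
                for a in element.find_elements(CSS, ANSWER_CSS)
            )
            pairs.append((element.text, answer))
        if len(pairs) >= target or not elements:
            break
        # Clicking the last question makes the page load further questions.
        elements[-1].click()
        # Poll until the element count grows, up to `timeout` seconds.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if len(driver.find_elements(CSS, QUESTION_CSS)) > len(elements):
                break
            time.sleep(poll)
        else:
            break  # nothing new appeared before the timeout, so stop
    return pairs
```

With the driver from the code above already on the results page, a call such as collect_paa(driver, target=20) would return a list of (question, answer) tuples. If the page stops adding questions, the function returns what it has collected so far instead of looping forever.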