Scraping Google web results not working
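Why doesn't the following work for scraping Google search results? Trying to open the response fails and throws an HTTPError. I have looked at other questions, and as far as I can tell I have done the encoding etc. correctly.
I know I haven't included error handling etc.; this is just a cut-down version.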
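import urllib.parse
import urllib.request

def scrape_google(query):
    url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&"
    headers = {'User-Agent': 'Mozilla/5.0'}
    # Join the query terms into one search string and URL-encode it
    search = urllib.parse.urlencode({'q': " ".join(term for term in query)})
    b_search = search.encode("utf-8")
    # Passing a data argument makes urllib send this as a POST request
    response = urllib.request.Request(url, b_search, headers)
    page = urllib.request.urlopen(response)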
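It doesn't work because the URL returns JSON. If you take the URL and add a search term like this:
http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=bingo
you get the results back as JSON, which is not a format beautifulsoup is designed to handle. (But it's much nicer than scraping.)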
{"responseData":
{"results":
[{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://www.pogo.com/games/bingo-luau","url":"http://www.pogo.com/games/bingo-
//etc
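Since the endpoint returns JSON rather than HTML, the response can be handled with the standard json module instead of beautifulsoup. The following is only a minimal sketch, assuming the payload has the responseData/results shape shown above. The function name extract_urls and the sample string are illustrative and not part of the original code.

```python
import json


def extract_urls(payload):
    """Return the unescapedUrl of every result in a JSON payload (str or bytes)."""
    data = json.loads(payload)
    # Assumption: responseData may be null or missing on failure, so fall back to {}.
    response_data = data.get("responseData") or {}
    results = response_data.get("results") or []
    return [r["unescapedUrl"] for r in results if "unescapedUrl" in r]


# Hypothetical sample, trimmed to fields visible in the output above.
sample = (
    '{"responseData": {"results": [{"GsearchResultClass": "GwebSearch", '
    '"unescapedUrl": "http://www.pogo.com/games/bingo-luau"}]}}'
)
print(extract_urls(sample))
```

json.loads accepts both str and bytes, so the same helper should work on a decoded string or on raw response bytes.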
Edited to add:
Using requests:
import requests

url = ('http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=bingo')
resp = requests.get(url)
print(resp.content)
Produces:
b'{"responseData": {"results":[{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://www.pogo.com/games/b...
//etc
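As people said in the comments, the requests library (combined with beautifulsoup, if needed) is a better choice. I answered a question about scraping Google search results here.
Alternatively, you can use the third-party Google Organic Results API from SerpApi. It's a paid API with a free trial.
Check out the playground to test.
Code to integrate (assuming you want to scrape the title, summary, and link):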
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "best lasagna recipe ever",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
    print(f"Title: {result['title']}\nSummary: {result['snippet']}\nLink: {result['link']}")
Output:
Title: The BEST Lasagna Recipe Ever! | The Recipe Critic
Summary: How to Make The BEST Classic Lasagna Ever. Sauté meat then simmer with bases and seasonings: In a large skillet over medium high heat add the olive oil and onion. Cook lasagna noodles: In a large pot, bring the water to a boil. Mix cheeses together: In medium sized bowl add the ricotta cheese, parmesan, and egg.
Link: https://therecipecritic.com/lasagna-recipe/
Title: The Most Amazing Lasagna Recipe - The Stay At Home Chef
Summary: The Most Amazing Lasagna Recipe is the best recipe for homemade Italian-style lasagna. The balance ... This recipe is so good—it makes the kind of lasagna people write home about! ... Hands down absolutely the best lasagna recipe ever!
Link: https://thestayathomechef.com/amazing-lasagna-recipe/
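The loop above indexes result['snippet'] directly, which would raise a KeyError if an entry lacked that field. As a rough sketch, assuming some entries might omit a field (the original does not confirm this), a small formatter using dict.get could be used instead. The name format_result, the placeholder strings, and the example dict are all hypothetical.

```python
def format_result(result):
    """Format one result dict, tolerating missing keys."""
    # dict.get returns the fallback instead of raising KeyError.
    title = result.get("title", "(no title)")
    snippet = result.get("snippet", "(no snippet)")
    link = result.get("link", "(no link)")
    return f"Title: {title}\nSummary: {snippet}\nLink: {link}"


# Hypothetical entry with no "snippet" key.
example = {"title": "Example title", "link": "https://example.com/"}
print(format_result(example))
```

In the loop, print(format_result(result)) would replace the inline f-string.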
Disclaimer: I work for SerpApi.