Using URLFetch in Python GAE to fetch a complete document

Question

I am using urlfetch.fetch in App Engine with Python 2.7.

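I tried fetching 2 URLs that belong to 2 different domains. For the first one, the result of urlfetch.fetch includes the content produced by the XHR queries that the page makes to load recommended products. For the other page, which belongs to a different domain, the XHR queries are not resolved and for the most part I get only the plain HTML. The XHR queries on this second page serve the same kind of purpose, such as showing recommended products.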

Here is how I use urlfetch: fetch_result = urlfetch.fetch(url, deadline=5, validate_certificate=True)

Can someone tell me what I might be missing that would explain the inconsistency?

Answer 1

The server is serving different output depending on the User-Agent string supplied in the request headers.

By default, urlfetch.fetch sends requests with the User-Agent header set to AppEngine-Google; (+http://code.google.com/appengine; appid: myapp.appspot.com).

A browser sends a User-Agent header like this: Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
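To make the mechanism concrete, here is a minimal sketch of how a server might branch on the User-Agent. It is purely illustrative: render_page and the markup it returns are hypothetical, and the real site's logic is unknown.

```python
# Hypothetical server-side logic, for illustration only.
# The real site's rules for choosing markup are not known.
APPENGINE_UA = ("AppEngine-Google; (+http://code.google.com/appengine; "
                "appid: myapp.appspot.com)")
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64; rv:50.0) "
              "Gecko/20100101 Firefox/50.0")


def render_page(user_agent):
    """Return fuller markup to clients that identify as a browser."""
    if user_agent.startswith("Mozilla/"):
        # Browser-like client: include the recommended-products block.
        return "<html><body><div id='recommended'>...</div></body></html>"
    # Any other client: plain page without the extra block.
    return "<html><body></body></html>"
```

With a server like this, the two User-Agent strings above would receive different HTML for the same URL.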

If you override the default headers for urlfetch.fetch:

from google.appengine.api import urlfetch
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}
urlfetch.fetch(url, headers=headers)

you will find that the HTML you receive is almost identical to what is served to the browser.
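If you fetch several pages this way, the override can be wrapped in a small helper. The following is only a sketch: browser_headers and fetch_as_browser are hypothetical names, and the optional fetch parameter is an assumption added so that the helper can be exercised outside App Engine. The deadline and validate_certificate values are taken from the question.

```python
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:50.0) "
    "Gecko/20100101 Firefox/50.0"
)


def browser_headers(extra=None):
    """Return request headers that present a browser User-Agent."""
    headers = {"User-Agent": BROWSER_USER_AGENT}
    if extra:
        # Caller-supplied headers are merged in (and may override).
        headers.update(extra)
    return headers


def fetch_as_browser(url, fetch=None, deadline=5):
    """Fetch url while identifying as a browser.

    fetch defaults to urlfetch.fetch; any callable accepting the same
    keyword arguments can be passed instead (hypothetical test hook).
    """
    if fetch is None:
        # Imported lazily so this module also loads outside App Engine.
        from google.appengine.api import urlfetch
        fetch = urlfetch.fetch
    return fetch(url, headers=browser_headers(), deadline=deadline,
                 validate_certificate=True)
```

On App Engine, fetch_result = fetch_as_browser(url) would then replace the direct urlfetch.fetch call from the question.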
