Python3 BigQuery or Google Cloud Python through HTTP Proxy

How can I route BigQuery client calls through an HTTP proxy?

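Before posting, I tried the following, but the traffic is still not routed through the HTTP proxy. The Google Cloud service credentials are set through the shell environment variable GOOGLE_APPLICATION_CREDENTIALS.
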
import httplib2
import socks
import google.auth
from google.cloud import bigquery

credentials, _ = google.auth.default()
http_client = httplib2.Http(proxy_info=httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, 'someproxy', 80))

bigquery_client = bigquery.Client(credentials=credentials, _http=http_client)

Outgoing traffic (172.217.x.x belongs to googleapis.com) is not routed through the HTTP proxy:

$ netstat -nputw
Local Address           Foreign Address
x.x.x.x                 172.217.6.234:443       SYN_SENT

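The only way I found to supply these credentials is to set them directly in my OS environment.

Assuming you already have a JSON credentials file, maybe this will work for you: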

import httplib2
import socks
import os
import google.auth
from google.cloud import bigquery

# Point google.auth at the service account key file before calling default().
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/your/credentials_file.json'

credentials, _ = google.auth.default()
http_client = httplib2.Http(proxy_info=httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, 'someproxy', 80))
bigquery_client = bigquery.Client(_http=http_client, credentials=credentials)
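Before relying on the snippet above, it may help to confirm that GOOGLE_APPLICATION_CREDENTIALS really points to a readable JSON file. The following is only a minimal sketch using the standard library; the helper name load_credentials_info is hypothetical and is not part of any Google library.

```python
import json
import os


def load_credentials_info(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the parsed JSON file named by env_var, or raise a clear error.

    Hypothetical helper for illustration; it only checks that the file
    exists and parses as JSON, not that the key inside is valid.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling load_credentials_info() before google.auth.default() should turn a wrong path into an explicit error rather than a later authentication failure.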

I am answering my own question now that I have found the reason and a solution.

Reason:

The google-cloud-python library uses httplib2, and at the time of writing httplib2 has two codebases, one for Python 2 and one for Python 3. The Python 3 version of httplib2 does not implement socks/proxy support. See httplib2's repo#init_py.

Solution:

There is a discussion about moving google-cloud-python from httplib2 to urllib3, but in the meantime one can use httplib2shim:

import os

import google.auth
import google.auth.credentials
import google_auth_httplib2
import httplib2shim
from google.cloud import bigquery

# A more declarative way exists, but environment variables are used for simplicity.
os.environ["HTTP_PROXY"] = "someproxy:80"
os.environ["HTTPS_PROXY"] = "someproxy:80"
http_client = httplib2shim.Http()
credentials, _ = google.auth.default()

# IMO, the following two lines should be done inside google-cloud-python.
# This exposes client-specific logic, and it already does that.
credentials = google.auth.credentials.with_scopes_if_required(
    credentials, bigquery.Client.SCOPE)
authed_http = google_auth_httplib2.AuthorizedHttp(credentials, http_client)

bigquery_client = bigquery.Client(credentials=credentials, _http=authed_http)

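Simply setting the following environment variables is enough. Note, however, that I am using Python 2.7, with a shadowsocks proxy client installed and listening on port 1087.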

os.environ["http_proxy"] = "http://127.0.0.1:1087"
os.environ["https_proxy"] = "http://127.0.0.1:1087"
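To check that such variables are visible to the process at all, one can ask the standard library which proxies it detects. This is only a sketch under Python 3: the helper name env_proxy_for is hypothetical, and it shows what urllib reads from the environment, not whether a particular HTTP client (httplib2, requests, and so on) actually honors it.

```python
import os
import urllib.request


def env_proxy_for(scheme):
    """Return the proxy URL urllib detects for the given scheme, or None.

    Hypothetical helper: urllib.request.getproxies() reads variables such
    as http_proxy / https_proxy from the environment when they are set.
    """
    return urllib.request.getproxies().get(scheme)


if __name__ == "__main__":
    # Same values as in the answer above; adjust to your own proxy.
    os.environ["http_proxy"] = "http://127.0.0.1:1087"
    os.environ["https_proxy"] = "http://127.0.0.1:1087"
    print(env_proxy_for("http"), env_proxy_for("https"))
```

If this prints None for a scheme, the variable is not set in the environment of the Python process, which is worth ruling out before debugging the client library.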

Update using the latest API (google-auth 1.5.0):

import os

import google.auth
import google.auth.credentials
import google.auth.transport.requests
from google.cloud import bigquery

os.environ["HTTP_PROXY"] = "someproxy:80"
os.environ["HTTPS_PROXY"] = "someproxy:80"

credentials, _ = google.auth.default()
credentials = google.auth.credentials.with_scopes_if_required(
    credentials, bigquery.Client.SCOPE)
# AuthorizedSession is a requests-based session that attaches the credentials.
authed_http = google.auth.transport.requests.AuthorizedSession(credentials)

bigquery_client = bigquery.Client(credentials=credentials, _http=authed_http)