ArrowIOError when reading from BigQuery with the google-cloud-bigquery client library (Python)

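I have a function that retrieves a pandas DataFrame from a BigQuery query, and it has worked fine for the past few months. Today, without any changes on my side, it stopped working in Google Colab notebooks and throws this exception: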

An exception of type ArrowIOError occurred reading from BigQuery. Arguments: ('Cannot read a negative number of bytes from BufferReader.',)

My code:

def read_from_bigquery_client(bq_client, project_id, sql, curr_func):
  try:
    df = bq_client.query(sql, project=project_id).to_dataframe()
    return df
  except Exception as ex:
    template = "An exception of type {0} occurred reading from BigQuery. Arguments:\n{1!r}\nFunction: {2}"
    message = template.format(type(ex).__name__, ex.args, curr_func)
    print(message)
    return None

Client authentication:

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(local_cred_filename)
bq_client = bigquery.Client(credentials=credentials,
                            project=credentials.project_id)

The queries I tried work fine when run directly in BigQuery, and, as noted above, they also worked through this function before.

Thanks for your help.

A new version (1.26.0) of the google-cloud-bigquery Python library was released on July 22. It may contain an issue that has not been detected yet. A similar issue with that version has already been reported on GitHub, where you can follow updates. Please also report the error you encountered there.

For now, the workaround for the ArrowIOError is to downgrade the google-cloud-bigquery library.

I downgraded google-cloud-bigquery to 1.24.0 and still get the error. My other package versions are:

pyarrow==0.11.1
pandas==0.23.4
pandas-gbq==0.7.0
google-cloud-bigquery==1.24.0
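
When comparing notes on an error like this, it helps to list the versions actually installed in the running environment, since a notebook runtime may not match what was pinned. Here is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`). The helper name `installed_versions` is made up for illustration and is not part of any library:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not present in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the version list in this thread.
    names = ["pyarrow", "pandas", "pandas-gbq", "google-cloud-bigquery"]
    for name, version in installed_versions(names).items():
        print(f"{name}=={version}" if version else f"{name} is not installed")
```

Running this in the same notebook that raises the exception shows what the client library is really working with.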

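I faced the same problem until I upgraded my pandas package. Apparently, according to the documentation, pandas versions higher than 0.29.0 work with google-cloud-bigquery.

The best way to update pandas is: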

pip3 install --upgrade pandas
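
To confirm that an upgrade took effect, the installed version string can be compared against a minimum. The sketch below is a rough illustration only: `version_tuple` and `meets_minimum` are hypothetical helpers, the parsing ignores pre-release suffixes such as `rc1`, and the threshold is simply the number quoted in this answer. For real use, `packaging.version.Version` is a more robust choice.

```python
import re


def version_tuple(version):
    """Parse the leading integers of each dotted part: '0.23.4' -> (0, 23, 4)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # 0.23.4 is the pandas version from the question; 0.29.0 is the
    # threshold quoted in the answer above (an assumption, not verified).
    print(meets_minimum("0.23.4", "0.29.0"))
```

In a notebook, the first argument would typically be `pandas.__version__`, read after restarting the runtime so that the upgraded package is the one imported.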