How to read large data from big query table using cloud run python api and what should be system config?

I created a Flask API in Python and deployed it as a container image on GCP Cloud Run, triggered by Cloud Scheduler. In my code I read a large amount of data from BigQuery (15 million rows and 20 columns), and I have set the instance configuration to 8 GB RAM and 4 CPUs.

Issue 1: Reading the data takes far too long (about 2,200 seconds).

import numpy as np
import pandas as pd
from pandas.io import gbq
query = """ SELECT * FROM TABLE_SALES"""
# note: pandas.io.gbq is deprecated; the standalone pandas_gbq package also provides read_gbq
df = gbq.read_gbq(query, project_id="project_name")

Is there an efficient way to read data from BigQuery?

Issue 2: My code stops working after reading the data. When I checked the logs, I got this:

error - 503
textPayload: "The request failed because either the HTTP response was malformed or connection to the instance had an error.
While handling this request, the container instance was found to be using too much memory and was terminated. This is likely to cause a new container instance to be used for the next request to this revision. If you see this message frequently, you may have a memory leak in your code or may need more memory. Consider creating a new revision with more memory."

One workaround is to increase the instance resources; if that is the solution, please let me know the associated cost.
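Before paying for a bigger instance, it may be worth shrinking the DataFrame itself: pandas defaults every numeric column to 8-byte `float64`/`int64`, and downcasting can cut memory substantially. A minimal sketch with illustrative column names (not the actual sales schema):

```python
import numpy as np
import pandas as pd

# Illustrative frame standing in for the BigQuery result;
# pandas defaults these columns to float64 and int64 (8 bytes each).
df = pd.DataFrame({
    "sales": np.random.rand(1000) * 100,
    "qty": np.random.randint(0, 100, 1000),
})
before = df.memory_usage(deep=True).sum()

# Downcast to the smallest dtype that holds the values:
# float64 -> float32, int64 -> uint8 here.
df["sales"] = pd.to_numeric(df["sales"], downcast="float")
df["qty"] = pd.to_numeric(df["qty"], downcast="unsigned")
after = df.memory_usage(deep=True).sum()
```

Selecting only the columns you actually need in the SQL (instead of `SELECT *`) reduces both the read time and the memory footprint in the same way.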

You can try a GCP Dataflow batch job to read large volumes of data from BigQuery.
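A rough sketch of what such a batch pipeline could look like with Apache Beam; it cannot run without GCP credentials, and the project, region, bucket and table names below are placeholders, not values from the original post:

```python
# Sketch of an Apache Beam batch pipeline reading from BigQuery on Dataflow.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",               # "DirectRunner" for local testing
    project="project_name",                # placeholder
    region="us-central1",                  # placeholder
    temp_location="gs://your-bucket/tmp",  # placeholder staging bucket
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
            query="SELECT * FROM `project_name.dataset.TABLE_SALES`",
            use_standard_sql=True,
        )
        # Each element is a dict keyed by column name; transform as needed
        # instead of materializing all 15M rows in one process's memory.
        | "KeepRow" >> beam.Map(lambda row: row)
    )
```

The point of this approach is that Dataflow distributes the read and processing across workers, so no single container has to hold the whole table in RAM.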

Depending on the complexity of your BigQuery query, you may want to consider the high-performance BigQuery Storage API: https://cloud.google.com/bigquery/docs/reference/storage/libraries
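A minimal sketch of using the Storage API through the `google-cloud-bigquery` client (requires `pip install 'google-cloud-bigquery[bqstorage,pandas]'` and credentials; `project_name` and the table path are placeholders):

```python
# Download a query result via the BigQuery Storage API instead of the
# slower REST tabledata API used under the hood by pandas' read_gbq.
from google.cloud import bigquery

client = bigquery.Client(project="project_name")  # placeholder project
query = "SELECT * FROM `project_name.dataset.TABLE_SALES`"

# create_bqstorage_client=True streams result pages over gRPC/Arrow,
# which is typically much faster for multi-million-row results.
df = client.query(query).result().to_dataframe(create_bqstorage_client=True)
```

This usually cuts the download time significantly, but note it does not reduce the memory needed to hold the final DataFrame.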