Problem with list_rows with max_results value set and to_dataframe in Kaggle's "Intro to SQL" course

Question

I need some help. In Part 1, "Getting Started with SQL and BigQuery", I ran into the following problem. I've gotten as far as In [7]:

# Preview the first five lines of the "full" table
client.list_rows(table, max_results=5).to_dataframe()

I get this error:

getting_started_with_bigquery.py:41: UserWarning: Cannot use bqstorage_client if max_results is set, reverting to fetching data with the tabledata.list endpoint.
  client.list_rows(table, max_results=5).to_dataframe()

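I write my code in Notepad++ and then run it by calling it from the command prompt on Windows. So far I've gotten everything else to work, but I can't find a way to fix this problem. A Google search led me to the source code for google.cloud.bigquery.table, which made it look like this error would appear if pandas wasn't installed. So I installed pandas and added import pandas to my code, but I still get the same error.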
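One way to double-check the pandas step is to ask the interpreter that actually runs the script, from the same command prompt, whether it can find pandas. This is a minimal standard-library sketch; the `python -m pip install pandas` hint assumes pip is available for that interpreter. Note that the warning text itself is about `max_results` and `bqstorage_client`, not about pandas.

```python
import importlib.util
import sys

# Which interpreter is running this script? With several Pythons installed on
# Windows, "pip install pandas" may have installed into a different one.
print("Python executable:", sys.executable)

# find_spec returns None when this interpreter cannot import the module.
spec = importlib.util.find_spec("pandas")
if spec is None:
    print("pandas is NOT importable here; try: python -m pip install pandas")
else:
    import pandas
    print("pandas", pandas.__version__, "found at", spec.origin)
```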

Here is my full code:

from google.cloud import bigquery
import os 
import pandas

#need to set credential path
credential_path = (r"C:\Users\crlas\learningPython\google_application_credentials.json")
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

#create a "Client" object
client = bigquery.Client()

#construct a reference to the "hacker_news" dataset
dataset_ref = client.dataset("hacker_news", project="bigquery-public-data")
#API request - fetch the dataset 
dataset = client.get_dataset(dataset_ref)

#list all tables in the dataset
tables = list(client.list_tables(dataset))
#print all table names
for table in tables:
    print(table.table_id)
print()

#construct a reference to the "full" table
table_ref = dataset_ref.table("full")
#API request - fetch the table
table = client.get_table(table_ref)
#print info on all the columns in the "full" table
print(table.schema)
# print("table schema should have printed above")
print()
#preview first 5 lines of the table
client.list_rows(table, max_results=5).to_dataframe()

Answer 1

As the warning message says: UserWarning: Cannot use bqstorage_client if max_results is set, reverting to fetching data with the tabledata.list endpoint.

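So this is only a warning: the library handles it and still retrieves the data through the tabledata API. The reason you see nothing is that, unlike a Kaggle notebook cell, a script run from the command prompt does not display the value of its last expression. You just need to assign the output to a DataFrame object and print it, like this: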

df = client.list_rows(table, max_results=5).to_dataframe()
print(df)

python

google-bigquery

kaggle