Read Vertex AI datasets in a Jupyter notebook
I am trying to build a Python utility that fetches a dataset from Vertex AI Datasets and generates statistics for it. However, I am unable to inspect the dataset from a Jupyter notebook. Is there any way to do this?
If I understand correctly, you want to use a Vertex AI dataset inside a Jupyter notebook. I don't believe that is directly possible at the moment. What you can do is export the Vertex AI dataset to Google Cloud Storage in JSONL format:

Your dataset will be exported as a list of text items in JSONL format. Each row contains a Cloud Storage path, any label(s) assigned to that item, and a flag that indicates whether that item is in the training, validation, or test set.

From there, you can query the data with BigQuery in the notebook using the %%bigquery magic, as described in "Visualizing BigQuery data in a Jupyter notebook", or read a CSV file from the local machine or from GCS with pandas' read_csv(), as shown in the "How to read csv file in Google Cloud Platform jupyter notebook" thread.

Alternatively, you can file a Feature Request in the Google Issue Tracker to add the ability to use Vertex AI datasets directly in a Jupyter notebook; it will be evaluated by the Google Vertex AI Team.
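As a rough sketch of the export-then-read route: the exported JSONL can be parsed line by line in the notebook. The field names and bucket paths below are hypothetical placeholders, not the guaranteed schema of your export, so check an actual exported file first:

```python
import json

# Hypothetical example rows in the shape described above (a GCS path,
# labels, and an ML-use flag); real field names may differ per data type.
jsonl_export = """\
{"textGcsUri": "gs://my-bucket/item1.txt", "labels": ["spam"], "mlUse": "training"}
{"textGcsUri": "gs://my-bucket/item2.txt", "labels": ["ham"], "mlUse": "validation"}
{"textGcsUri": "gs://my-bucket/item3.txt", "labels": ["ham"], "mlUse": "test"}
"""

# Each non-empty line of a JSONL file is one standalone JSON object.
rows = [json.loads(line) for line in jsonl_export.splitlines() if line.strip()]

# Group items by the training/validation/test flag described in the export.
by_use = {}
for row in rows:
    by_use.setdefault(row["mlUse"], []).append(row["textGcsUri"])

print(by_use)
```

In a real notebook you would first download the exported file from GCS (for example with gsutil or the google-cloud-storage client) and read it instead of the inline string.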
Correct me if I'm wrong: are you trying to access a Vertex AI dataset from your GCP project in a Jupyter notebook? If so, try the following code and see whether you can access the datasets.
def list_datasets(project_id, compute_region, filter=None):
    """List all datasets in the given project and region."""
    result = []
    # [START automl_tables_list_datasets]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # filter = 'filter expression here'
    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)
    print('client:', client)

    # List all the datasets available in the region by applying the filter.
    response = client.list_datasets(filter=filter)

    print("List of datasets:")
    for dataset in response:
        # Display the dataset information.
        print("Dataset name: {}".format(dataset.name))
        print("Dataset id: {}".format(dataset.name.split("/")[-1]))
        print("Dataset display name: {}".format(dataset.display_name))
        metadata = dataset.tables_dataset_metadata
        print(
            "Dataset primary table spec id: {}".format(
                metadata.primary_table_spec_id
            )
        )
        print(
            "Dataset target column spec id: {}".format(
                metadata.target_column_spec_id
            )
        )
        print(
            "Dataset weight column spec id: {}".format(
                metadata.weight_column_spec_id
            )
        )
        print(
            "Dataset ml use column spec id: {}".format(
                metadata.ml_use_column_spec_id
            )
        )
        print("Dataset example count: {}".format(dataset.example_count))
        print("Dataset create time: {}".format(dataset.create_time))
        print("\n")
        result.append(dataset)
    # [END automl_tables_list_datasets]
    return result
You need to pass project_id and compute_region when calling this function.
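Since the original goal was to generate statistics for a dataset, the objects returned by list_datasets (which expose fields like display_name and example_count) can be aggregated afterwards. Below is a minimal, self-contained sketch using plain stand-in records instead of live API objects, so the field values are assumptions; with real credentials you would iterate over the return value of list_datasets instead:

```python
from statistics import mean

# Stand-in records mimicking the fields printed by list_datasets above
# (hypothetical names and counts; real code would use the API's dataset objects).
datasets = [
    {"display_name": "reviews", "example_count": 1200},
    {"display_name": "tickets", "example_count": 300},
    {"display_name": "emails", "example_count": 900},
]

counts = [d["example_count"] for d in datasets]
summary = {
    "datasets": len(datasets),
    "total_examples": sum(counts),
    "mean_examples": mean(counts),
    "largest": max(datasets, key=lambda d: d["example_count"])["display_name"],
}
print(summary)
# → {'datasets': 3, 'total_examples': 2400, 'mean_examples': 800, 'largest': 'reviews'}
```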