"The timestamp column must have valid timestamp entries." error when using `timestamp_split_column_name` arg in `AutoMLTabularTrainingJob.run`

The docs say:

The values of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z)

The dataset I'm pointing to is a CSV in Cloud Storage, and the data is in the format the docs suggest:

$ gsutil cat gs://my-data.csv | head | xsv select TS_SPLIT_COL
TS_SPLIT_COL
2021-01-18T00:00:00.00Z
2021-01-18T00:00:00.00Z
2021-01-04T00:00:00.00Z
2021-03-06T00:00:00.00Z
2021-01-15T00:00:00.00Z
2021-02-11T00:00:00.00Z
2021-02-05T00:00:00.00Z
2021-05-20T00:00:00.00Z
2021-01-05T00:00:00.00Z

But when I try to run the training job, I get this error: Training pipeline failed with error message: The timestamp column must have valid timestamp entries.

Edit: this will hopefully make it more reproducible.

Data: https://pastebin.com/qEDqvzX6

Code I'm running:

from google.cloud import aiplatform

PROJECT = "my-project"
DATASET_ID = "dataset-id"  # points to CSV 

aiplatform.init(project=PROJECT)

dataset = aiplatform.TabularDataset(DATASET_ID)

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="so-58454722",
    optimization_prediction_type="classification",
    optimization_objective="maximize-au-roc",
)

model = job.run(
    dataset=dataset,
    model_display_name="so-58454722",
    target_column="Y",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    timestamp_split_column_name="TS_SPLIT_COL",
)

Try this timestamp format:

2022-03-18T01:23:45.123456+00:00

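It uses +00:00 instead of Z to specify the time zone.

This change should get rid of the "The timestamp column must have valid timestamp entries." error.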
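
If the CSV has already been written with the Z suffix, the column can be rewritten before re-uploading. Here is a minimal sketch using only the Python standard library. It assumes, as suggested above, that an explicit +00:00 offset is what the training job accepts, and that every value carries fractional seconds like the sample data. The helper names to_offset_format and rewrite_csv are made up for illustration and are not part of the aiplatform SDK.

```python
import csv
import io
from datetime import datetime


def to_offset_format(ts: str) -> str:
    """Convert e.g. 2021-01-18T00:00:00.00Z to 2021-01-18T00:00:00.000000+00:00."""
    # %z accepts both "Z" and "+00:00" (Python 3.7+); %f accepts 1 to 6 digits.
    # Assumption: fractional seconds are always present, as in the sample data.
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.isoformat(timespec="microseconds")


def rewrite_csv(src, dst, column="TS_SPLIT_COL"):
    """Copy CSV rows from src to dst, reformatting the timestamp column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = to_offset_format(row[column])
        writer.writerow(row)


if __name__ == "__main__":
    # Tiny in-memory example; swap in real file handles for an actual CSV.
    sample = io.StringIO("Y,TS_SPLIT_COL\n1,2021-01-18T00:00:00.00Z\n")
    out = io.StringIO()
    rewrite_csv(sample, out)
    print(out.getvalue())
```

After rewriting, the file would need to be uploaded back to Cloud Storage and the dataset pointed at the new copy before re-running the job.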