"The timestamp column must have valid timestamp entries." error when using `timestamp_split_column_name` arg in `AutoMLTabularTrainingJob.run`
The docs say:
The value of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = "Z" (e.g. 1985-04-12T23:20:50.52Z)
The dataset I'm pointing to is a CSV in Cloud Storage, and the data is in the format the docs suggest:
$ gsutil cat gs://my-data.csv | head | xsv select TS_SPLIT_COL
TS_SPLIT_COL
2021-01-18T00:00:00.00Z
2021-01-18T00:00:00.00Z
2021-01-04T00:00:00.00Z
2021-03-06T00:00:00.00Z
2021-01-15T00:00:00.00Z
2021-02-11T00:00:00.00Z
2021-02-05T00:00:00.00Z
2021-05-20T00:00:00.00Z
2021-01-05T00:00:00.00Z
But when I try to run a training job, I get a `Training pipeline failed with error message: The timestamp column must have valid timestamp entries.` error.
EDIT: This should hopefully make it more reproducible.
Data: https://pastebin.com/qEDqvzX6
The code I'm running:
from google.cloud import aiplatform

PROJECT = "my-project"
DATASET_ID = "dataset-id"  # points to CSV

aiplatform.init(project=PROJECT)
dataset = aiplatform.TabularDataset(DATASET_ID)

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="so-58454722",
    optimization_prediction_type="classification",
    optimization_objective="maximize-au-roc",
)

model = job.run(
    dataset=dataset,
    model_display_name="so-58454722",
    target_column="Y",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    timestamp_split_column_name="TS_SPLIT_COL",
)
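Try this timestamp format:
2022-03-18T01:23:45.123456+00:00
It uses +00:00 instead of Z to specify the timezone.
This change eliminates the "The timestamp column must have valid timestamp entries." error.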
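If the CSV is regenerated locally before upload, the conversion could be sketched roughly as follows. This is only an illustration using the Python standard library: the helper names `to_utc_offset` and `convert_csv_column` are hypothetical, and it assumes every value in the column looks like the sample above (a fractional-seconds part followed by a literal `Z`).

```python
import csv
from datetime import datetime, timezone


def to_utc_offset(ts: str) -> str:
    """Rewrite an RFC 3339 'Z' timestamp with an explicit +00:00 offset.

    Assumes the input has fractional seconds and ends in 'Z',
    e.g. '2021-01-18T00:00:00.00Z' (hypothetical helper).
    """
    # %f accepts one to six fractional digits, so '.00' parses fine.
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    # 'Z' means UTC, so attach the UTC timezone explicitly.
    parsed = parsed.replace(tzinfo=timezone.utc)
    # timespec="microseconds" always emits six fractional digits.
    return parsed.isoformat(timespec="microseconds")


def convert_csv_column(src_path: str, dst_path: str, column: str) -> None:
    """Copy a CSV, rewriting one timestamp column (hypothetical helper)."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row[column] = to_utc_offset(row[column])
            writer.writerow(row)


if __name__ == "__main__":
    print(to_utc_offset("2021-01-18T00:00:00.00Z"))
    # prints 2021-01-18T00:00:00.000000+00:00
```

The converted file would then be uploaded back to Cloud Storage and the dataset recreated from it, for example with `convert_csv_column("my-data.csv", "my-data-fixed.csv", "TS_SPLIT_COL")`; the local file names here are placeholders.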