Error InvalidInputDatatype: Input of type 'Unknown' is not supported in Azure (azureml.train.automl)
I have a pandas DataFrame created by:
import pandas as pd
# Load the weekly history from the database
TB_HISTORICO_MODELO = pd.read_sql("""select DAT_INICIO_SEMANA_PLAN
    ,COD_NEGOCIO
    ,VENDA
    ,LUCRO
    ,MODULADO
    ,RUPTURA
    ,QTD_ESTOQUE_MEDIO
    ,PECAS from TB""", cursor)
TB_HISTORICO_MODELO["DAT_INICIO_SEMANA_PLAN"] = pd.to_datetime(TB_HISTORICO_MODELO["DAT_INICIO_SEMANA_PLAN"])
# Keep only business unit A101 and drop the now-constant COD_NEGOCIO column
dataset = (TB_HISTORICO_MODELO[TB_HISTORICO_MODELO['COD_NEGOCIO'] == 'A101']
           .drop(columns=['COD_NEGOCIO'])
           .reset_index(drop=True))
Everything looks fine:
>>> dataset.dtypes
DAT_INICIO_SEMANA_PLAN datetime64[ns]
VENDA float64
LUCRO float64
MODULADO int64
RUPTURA int64
QTD_ESTOQUE_MEDIO int64
PECAS float64
dtype: object
But when I run:
#%% Create the AutoML Config file and run the experiment on Azure
import logging
from azureml.train.automl import AutoMLConfig
time_series_settings = {
    'time_column_name': 'DAT_INICIO_SEMANA_PLAN',
    'max_horizon': 14,
    'country_or_region': 'BR',
    'target_lags': 'auto'
}
automl_config = AutoMLConfig(task='forecasting',
                             primary_metric='normalized_root_mean_squared_error',
                             blocked_models=['ExtremeRandomTrees'],
                             experiment_timeout_minutes=30,
                             training_data=dataset,
                             label_column_name='VENDA',
                             compute_target=compute_cluster,
                             enable_early_stopping=True,
                             n_cross_validations=3,
                             # max_concurrent_iterations=4,
                             # max_cores_per_iteration=-1,
                             verbosity=logging.INFO,
                             **time_series_settings)
# compute_cluster (compute target) and Experimento (Experiment) are assumed to be defined earlier
remote_run = Experimento.submit(automl_config, show_output=True)
I get the message:
>>> remote_run = Experimento.submit(automl_config, show_output=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/core/experiment.py", line 219, in submit
run = submit_func(config, self.workspace, self.name, **kwargs)
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/train/automl/automlconfig.py", line 92, in _automl_static_submit
automl_config_object._validate_config_settings(workspace)
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/train/automl/automlconfig.py", line 1775, in _validate_config_settings
supported_types=", ".join(SupportedInputDatatypes.REMOTE_RUN_SCENARIO)
azureml.train.automl.exceptions.ConfigException: ConfigException:
Message: Input of type 'Unknown' is not supported. Supported types: [azureml.data.tabular_dataset.TabularDataset, azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset]
InnerException: None
ErrorResponse
{
"error": {
"code": "UserError",
"message": "Input of type 'Unknown' is not supported. Supported types: [azureml.data.tabular_dataset.TabularDataset, azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset]",
"details_uri": "https://aka.ms/AutoMLConfig",
"target": "training_data",
"inner_error": {
"code": "BadArgument",
"inner_error": {
"code": "ArgumentInvalid",
"inner_error": {
"code": "InvalidInputDatatype"
}
}
}
}
}
What is wrong?
Documentation:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train
https://docs.microsoft.com/pt-br/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig
For remote experiments, training data must be accessible from the remote compute. AutoML only accepts Azure Machine Learning TabularDatasets when working on a remote compute.
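The rule quoted above can be checked before AutoMLConfig is even built. This is a minimal sketch, assuming you only want a fast local guard: check_remote_training_data, qualified_type_name and REMOTE_SUPPORTED_TYPES are hypothetical names, not part of the Azure ML SDK, and the accepted class names are copied from the "Supported types" list in the error message. It compares fully qualified class names, so it runs without the SDK installed.

```python
# Class names that AutoML accepts as training_data on remote compute, copied from
# the "Supported types" list in the ConfigException message (hypothetical constant).
REMOTE_SUPPORTED_TYPES = frozenset({
    "azureml.data.tabular_dataset.TabularDataset",
    "azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset",
})


def qualified_type_name(obj):
    """Return the fully qualified class name of obj, e.g. 'builtins.int'."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def check_remote_training_data(obj):
    """Hypothetical pre-submit guard (not an SDK function).

    Raises TypeError if obj's class is not in REMOTE_SUPPORTED_TYPES;
    otherwise returns the qualified class name. It compares names only,
    so subclasses of the supported types would also be rejected.
    """
    name = qualified_type_name(obj)
    if name not in REMOTE_SUPPORTED_TYPES:
        raise TypeError(
            f"training_data is {name}; remote AutoML runs expect one of: "
            + ", ".join(sorted(REMOTE_SUPPORTED_TYPES))
        )
    return name


# Usage: call before building AutoMLConfig, e.g.
# check_remote_training_data(dataset)  # raises TypeError for a pandas DataFrame
```

For the DataFrame in the question this should raise a TypeError naming pandas.core.frame.DataFrame, which is the mismatch the SDK reports as 'Unknown'.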
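It looks like your dataset object is a pandas DataFrame, when it should actually be an Azure ML Dataset (a TabularDataset, as the error message lists). Check out this doc on creating datasets.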
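A minimal sketch of that conversion for SDK v1 (azureml-core), assuming ws is your Workspace (for example from Workspace.from_config()) and that the workspace's default blob datastore is acceptable. The helper name dataframe_to_tabular_dataset, the target folder and the file name are hypothetical. It writes the DataFrame to CSV, uploads it with upload_files, and wraps the uploaded file with Dataset.Tabular.from_delimited_files, whose result is a TabularDataset the remote cluster can read.

```python
import os
import tempfile


def dataframe_to_tabular_dataset(df, workspace, target_path="automl_input",
                                 file_name="training_data.csv"):
    """Hypothetical helper: upload a pandas DataFrame to the workspace's default
    datastore and return a TabularDataset pointing at the uploaded CSV."""
    # Imported here so this sketch can be loaded without azureml-core installed.
    from azureml.core import Dataset

    datastore = workspace.get_default_datastore()

    # 1. Write the DataFrame to a temporary local CSV (without the pandas index)
    #    and 2. upload it to the datastore before the temp folder is deleted.
    with tempfile.TemporaryDirectory() as local_dir:
        local_file = os.path.join(local_dir, file_name)
        df.to_csv(local_file, index=False)
        datastore.upload_files(files=[local_file], target_path=target_path,
                               overwrite=True)

    # 3. Build a TabularDataset that references the uploaded file.
    return Dataset.Tabular.from_delimited_files(
        path=[(datastore, f"{target_path}/{file_name}")])


# Usage (assumes ws is an azureml.core.Workspace):
# training_ds = dataframe_to_tabular_dataset(dataset, ws)
# automl_config = AutoMLConfig(..., training_data=training_ds, ...)
```

from_delimited_files infers column types from the CSV, so it is worth confirming that DAT_INICIO_SEMANA_PLAN comes back as a datetime, e.g. with training_ds.take(5).to_pandas_dataframe().dtypes. Newer azureml-core releases also provide Dataset.Tabular.register_pandas_dataframe, which uploads and registers a DataFrame in one call; check which version you have installed.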