How to output data to Azure ML Batch Endpoint correctly using Python?

Question

When invoking an Azure ML Batch Endpoint (creating jobs for inferencing), the run() method should return a pandas DataFrame or an array, as explained here

However, the example shown there does not produce CSV output with headers, which is often what is needed.

The first thing I tried was returning the data as a pandas DataFrame; the result was just a simple CSV with a single column and no headers.

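When I try to pass values with several columns and their corresponding headers, to be saved as CSV later, the output contains awkward square brackets (representing the lists in Python) and apostrophes (representing strings).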
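The symptom can be reproduced outside Azure ML with a small sketch. It assumes that each element returned by run() is written to the output file using its string representation; that is an assumption about the service's behavior, not something this post confirms. The row values are made up for illustration.

```python
# Hypothetical rows, as run() might return them: a header list plus a value list.
rows = [["id", "prediction"], ["a1", 0.5]]

# Assumption: each returned element ends up in the output file as str(element),
# which keeps Python's list brackets and string quotes.
as_written = [str(row) for row in rows]

# Joining the values explicitly yields plain comma-separated lines instead.
as_joined = [",".join(map(str, row)) for row in rows]

print(as_written)
print(as_joined)
```

Under that assumption, returning one pre-formatted string per row is what keeps the brackets and apostrophes out of the file.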

I haven't been able to find documentation elsewhere that fixes this.

Answer 1

This is the way I found to create clean output in CSV format, using Python, from a batch endpoint invocation in AzureML:

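import logging

import pandas as pd

# Assumption: a module-level logger; `model` is a global loaded earlier in init().
logger = logging.getLogger(__name__)

def run(mini_batch):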
    batch = []
    for file_path in mini_batch:
        df = pd.read_csv(file_path)
        
        # Do any data quality verification here:
        if 'id' not in df.columns:
            logger.error("ERROR: CSV file uploaded without id column")
            return None
        else:
            df['id'] = df['id'].astype(str)

        # Create the predictions with the model previously loaded in init():
        df['prediction'] = model.predict(df)
        # or alternatively: df[MULTILABEL_LIST] = model.predict(df)

        batch.append(df)

    batch_df = pd.concat(batch)

    # After joining all data, build the column headers as one comma-separated
    # string, so no square brackets or apostrophes end up in the output:
    result = [','.join(map(str, batch_df.columns))]

    # Convert every row to a comma-separated string of its values.
    # Joining explicitly (rather than editing str() of the row array) keeps
    # values that contain spaces intact. Values containing commas are not quoted.
    for row in batch_df.itertuples(index=False, name=None):
        result.append(','.join(map(str, row)))

    logger.info("Finished Run")
    return result
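As an alternative to building each line by hand, pandas can format the rows itself. The following is a minimal sketch, not part of the original answer: `dataframe_to_csv_lines` is a hypothetical helper name and the example data is made up. It relies on `DataFrame.to_csv` returning the CSV text when no path is given.

```python
import pandas as pd


def dataframe_to_csv_lines(df: pd.DataFrame) -> list:
    """Return the header line followed by one CSV-formatted line per row."""
    # Without a path argument, to_csv() returns the CSV text as a string.
    # It also quotes values that themselves contain commas.
    csv_text = df.to_csv(index=False)
    # Caveat: a value containing a line break would be split across entries.
    return csv_text.splitlines()


# Hypothetical data standing in for the concatenated batch DataFrame.
example = pd.DataFrame({"id": ["a1", "b2"], "prediction": [0.5, 1.0]})
lines = dataframe_to_csv_lines(example)
print(lines)
```

Inside run(), the returned list could be used in place of the manually built `result`, assuming the endpoint writes each list element as one output line.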

azure

batch-processing

azure-machine-learning-service

azureml

azureml-python-sdk