Getting an error while removing duplicates in a Python AzureML classification problem
I get this error when calling the drop_duplicates function:
Traceback (most recent call last):
  File "train.py", line 159, in <module>
    orders_dfx = preprocess_orders(orders_df)
  File "train.py", line 20, in preprocess_orders
    ao = ao.drop_duplicates(subset=['order_id'], keep='last')
AttributeError: 'TabularDataset' object has no attribute 'drop_duplicates'
Here is part of train.py:
def preprocess_orders(ao):
    ao = ao.drop_duplicates(subset=['order_id'], keep='last')
    ao['order_id'] = ao['order_id'].astype('str')
    ao['class'] = ao['class'].astype('int')
    ao['age'] = ao['age'].astype('float').fillna(ao['age'].mean()).round(2)
    return ao

orders_df = Dataset.get_by_name(ws, name='class_cancelled_orders')
orders_df.to_pandas_dataframe()
# Doing processing
orders_dfx = preprocess_orders(orders_df)
I am fetching the data from a dataset in Azure ML studio. The job.py file is used to run the experiment:
# submit job
run = Experiment(ws, experiment_name).submit(src)
run.wait_for_completion(show_output=True)
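The to_pandas_dataframe() method returns a pandas DataFrame rather than converting the dataset in place, so orders_df is still a TabularDataset when it reaches preprocess_orders. You need to assign the result back to your variable: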
orders_df = orders_df.to_pandas_dataframe()
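To see the effect without an Azure ML workspace, here is a minimal sketch of the same pattern. FakeTabularDataset and the sample rows are hypothetical stand-ins made up for illustration (they are not part of azureml); the only thing they mimic is that to_pandas_dataframe() returns a new DataFrame and leaves the dataset object unchanged. The .copy() after drop_duplicates is an optional addition, not from the original code, meant to avoid pandas chained-assignment warnings.

```python
import pandas as pd


class FakeTabularDataset:
    """Hypothetical stand-in for a TabularDataset, for illustration only."""

    def __init__(self, rows):
        self._rows = rows

    def to_pandas_dataframe(self):
        # Returns a new DataFrame; the dataset object itself is not modified.
        return pd.DataFrame(self._rows)


def preprocess_orders(ao):
    # Keep the last row for each order_id, then work on an independent copy.
    ao = ao.drop_duplicates(subset=['order_id'], keep='last').copy()
    ao['order_id'] = ao['order_id'].astype('str')
    ao['class'] = ao['class'].astype('int')
    # Fill missing ages with the mean of the remaining rows.
    ao['age'] = ao['age'].astype('float').fillna(ao['age'].mean()).round(2)
    return ao


# Made-up sample data: order 1 appears twice, order 2 has a missing age.
orders_ds = FakeTabularDataset([
    {'order_id': 1, 'class': 0, 'age': 30.0},
    {'order_id': 1, 'class': 1, 'age': 31.0},
    {'order_id': 2, 'class': 0, 'age': float('nan')},
])

# Assign the returned DataFrame; calling the method alone discards the result.
orders_df = orders_ds.to_pandas_dataframe()
orders_dfx = preprocess_orders(orders_df)
print(orders_dfx)
```

In the real train.py the only change needed is the assignment shown above; the stand-in class exists only so the sketch runs locally.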