Hi. I am very new to MLflow and want to set up an MLflow project for my own ML model. However, I am getting "Could not find main among entry points".
The full error message is:
ERROR mlflow.cli: === Could not find main among entry points [] or interpret main as a runnable script. Supported script file extensions: ['.py', '.sh'] ===
I also tried the solution suggested in https://github.com/mlflow/mlflow/issues/1094, but the result is the same.
Below are all the files needed to run the MLflow project.
The conda.yaml file:
name: lightgbm-example
channels:
  - conda-forge
dependencies:
  - python=3.6
  - pip
  - pip:
    - mlflow>=1.6.0
    - lightgbm
    - pandas
    - numpy
The MLProject file:
name: lightgbm-example
conda_env: ~/Desktop/MLflow/conda.yaml
entry-points:
  main:
    parameters:
      learning_rate: {type: float, default: 0.1}
      colsample_bytree: {type: float, default: 1.0}
      subsample: {type: float, default: 1.0}
    command: |
      python3 ~/Desktop/MLflow/Test.py \
        --learning-rate={learning_rate} \
        --colsample-bytree={colsample_bytree} \
        --subsample={subsample}
My Test.py file:
import argparse

import lightgbm as lgb
import mlflow
import mlflow.lightgbm
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix


def parse_args():
    parser = argparse.ArgumentParser(description="LightGBM example")
    parser.add_argument(
        "--learning-rate",
        type=float,
        default=0.1,
        help="learning rate to update step size at each boosting step (default: 0.1)",
    )
    parser.add_argument(
        "--colsample-bytree",
        type=float,
        default=1.0,
        help="subsample ratio of columns when constructing each tree (default: 1.0)",
    )
    parser.add_argument(
        "--subsample",
        type=float,
        default=1.0,
        help="subsample ratio of the training instances (default: 1.0)",
    )
    return parser.parse_args()


def find_specificity(c_matrix):
    # confusion_matrix puts true labels on rows and predicted labels on columns,
    # so specificity = TN / (TN + FP) = c_matrix[0][0] / (c_matrix[0][0] + c_matrix[0][1]).
    specificity = c_matrix[0][0] / (c_matrix[0][0] + c_matrix[0][1])
    return specificity


def main():
    args = parse_args()

    # Load the data and make an 80/20 train/test split.
    df = pd.read_csv('~/Desktop/MLflow/Churn_demo.csv')
    train_df = df.sample(frac=0.8, random_state=25)
    test_df = df.drop(train_df.index)
    train_df.drop(['subscriberid'], axis=1, inplace=True)
    test_df.drop(['subscriberid'], axis=1, inplace=True)

    # The last column is the label; everything before it is a feature.
    TrainX = train_df.iloc[:, :-1]
    TrainY = train_df.iloc[:, -1]
    TestX = test_df.iloc[:, :-1]
    TestY = test_df.iloc[:, -1]

    # Log LightGBM parameters, metrics and the model automatically.
    mlflow.lightgbm.autolog()

    dtrain = lgb.Dataset(TrainX, label=TrainY)
    dtest = lgb.Dataset(TestX, label=TestY)

    with mlflow.start_run():
        parameters = {
            'objective': 'binary',
            'device': 'cpu',
            'num_threads': 6,
            'num_leaves': 127,
            'metric': 'binary',
            'lambda_l2': 5,
            'max_bin': 63,
            'bin_construct_sample_cnt': 2 * 1000 * 1000,
            'learning_rate': args.learning_rate,
            'colsample_bytree': args.colsample_bytree,
            'subsample': args.subsample,
            'verbose': 1,
        }
        model = lgb.train(parameters,
                          dtrain,
                          valid_sets=dtest,
                          num_boost_round=10000,
                          early_stopping_rounds=10)

        # Evaluate on the held-out set with a 0.25 probability threshold.
        y_proba = model.predict(TestX)
        pred = np.where(y_proba > 0.25, 1, 0)
        conf_matrix = confusion_matrix(TestY, pred)
        specificity = find_specificity(conf_matrix)
        acc = accuracy_score(TestY, pred)

        # log_metrics takes a dict; log_metric takes a single key and value.
        mlflow.log_metrics({"specificity": specificity, "accuracy": acc})


if __name__ == "__main__":
    main()
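A side note for anyone reading find_specificity: scikit-learn's confusion_matrix puts true labels on rows and predicted labels on columns, so for binary 0/1 labels the layout is [[TN, FP], [FN, TP]]. The sketch below only illustrates that layout; the function name and the counts are made up for the example and are not taken from the churn data.

```python
def specificity_from_confusion(matrix):
    """Specificity = TN / (TN + FP) for a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    tn, fp = matrix[0][0], matrix[0][1]
    return tn / (tn + fp)


def precision_from_confusion(matrix):
    """Precision = TP / (TP + FP), shown for contrast with specificity."""
    fp, tp = matrix[0][1], matrix[1][1]
    return tp / (tp + fp)


# Made-up counts: 50 true negatives, 10 false positives,
# 5 false negatives, 35 true positives.
example = [[50, 10], [5, 35]]
print("specificity:", specificity_from_confusion(example))
print("precision:", precision_from_confusion(example))
```

Specificity uses only the first row of the matrix, while precision uses only the second column, so the two are easy to mix up.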
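Fortunately, my problem is solved. Here are some fixes for this error, in case you run into the same problem in the future.
- File names. The file names should match those suggested in the MLflow documentation (https://mlflow.org/). For example, use conda.yaml, not conda.yamp; a misnamed file caused the same error in https://github.com/mlflow/mlflow/issues/3856
- The conda.yaml file does not support tabs; use spaces instead.
- Before MLflow 1.4, the 'P' in the MLProject file name had to be uppercase. In later versions it does not matter, as explained in https://github.com/mlflow/mlflow/issues/1094
- (In my case) The MLProject file is whitespace-sensitive. Let the GitHub examples at https://github.com/mlflow/mlflow/tree/master/examples guide you.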
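
The checks in this list can be scripted. Below is a minimal sketch using only the standard library; check_mlflow_project is a hypothetical helper written for this post, not part of MLflow. It looks for a project file, flags conda.* files with an unexpected extension, reports tabs in indentation, and checks for a top-level entry_points key. That last check is an addition to the list above: the MLflow documentation spells the key with an underscore, while the file in the question uses entry-points, which would be consistent with the empty list in "among entry points []". The key detection is a plain line scan, not a real YAML parse.

```python
from pathlib import Path


def check_mlflow_project(project_dir):
    """Return a list of likely problems in an MLflow project directory."""
    root = Path(project_dir)
    files = {p.name: p for p in root.iterdir() if p.is_file()}
    problems = []

    # File names: find the project file and flag misspelled conda files.
    project_name = next((n for n in files if n.lower() == "mlproject"), None)
    if project_name is None:
        problems.append("no MLproject file found")
    for name in sorted(files):
        if name.startswith("conda.") and not name.endswith((".yaml", ".yml")):
            problems.append(f"{name}: expected a .yaml extension")

    # Tabs: YAML indentation must use spaces.
    yaml_names = [n for n in sorted(files)
                  if n == project_name or n.endswith((".yaml", ".yml"))]
    for name in yaml_names:
        lines = files[name].read_text().splitlines()
        for number, line in enumerate(lines, start=1):
            indent = line[: len(line) - len(line.lstrip())]
            if "\t" in indent:
                problems.append(f"{name}:{number}: tab in indentation")

    # Top-level keys: treat every unindented "key:" line as a top-level key.
    if project_name is not None:
        lines = files[project_name].read_text().splitlines()
        top_keys = [line.split(":", 1)[0].strip() for line in lines
                    if line[:1] not in ("", " ", "\t", "#") and ":" in line]
        if "entry_points" not in top_keys:
            problems.append(f"{project_name}: no top-level entry_points key")
    return problems


if __name__ == "__main__":
    # Check the current directory and print one problem per line.
    for problem in check_mlflow_project("."):
        print(problem)
```

Run it from the project directory before calling mlflow run; an empty output means none of these particular checks fired, not that the project is guaranteed to run.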