Looping Logistic Regression over DataFrame in Python

Question

I'm stuck on what is going wrong in this loop, which is meant to run a logistic regression on a DataFrame with 25 features.

I get this error when I reshape it: "ValueError: Expected 2D array, got 1D array instead: array=[-12.36677125 -12.91946925 -12.89317629 -13.16951215 -12.20588875 -12.44694704 -12.71370778 -12.69351738 -12.89451587 -12.0776727 -12.63723271 -13.39461116 -12.52027792]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."

peptides = ['AYSLFSYNTQGR','IVLGQEQDSYGGK','EQLTPLIK','SPELQAEAK','SPELQAEAK','ALVQQMEQLR','SGVQQLIQYYQDQK','VVVHPDYR','GFVVAGPSR','CLCACPFK','VVEESELAR','FCDMPVFENSR','GYSIFSYATK',
'EPGCGCCSVCAR',
'LIQGAPTIR',
'YYLQGAK',
'ALGHLDLSGNR',
'DLLLPQPDLR',
'GPLQLER',
'IISIMDEK',
'LQDAEIAR',
'QINDYVEK',
'SVLGQLGITK',
'ADLSGITGAR',
'EQLSLLDR']

This is the list of peptides I want to iterate over. They should be the column headers of X_train.

LR_scores = []
logit_roc_auc =[]
y_pred = []
acc_score = []

for peptide in peptides:
    model=LogisticRegression()
    model.fit(X_train[peptide], y_train)
    score = model.score(X_test[peptide], y_test)
    y_pred=model.predict(X_test[peptide])
    acc_score = accuracy_score(y_test, y_pred)
    LR_scores.append(peptide,acc_score)
    
    #Classification Report
    print (classification_report(y_test,y_pred))
    
    #Confusion Matrix
    cnf_matrix = confusion_matrix(y_test,y_pred)
    print(cnf_matrix)
    
    #ROC_AUC Curves
    y_predict_proba = model.predict_proba(X_test[peptide])
    probabilities = np.array(y_predict_proba)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, probabilities, pos_label=1)
    roc_auc = auc(fpr, tpr)
    logit_roc_auc = roc_auc_score(y_test, model.predict(X_test[peptide]))

Any help is appreciated.

Screenshot of Jupyter Notebook

This loop works with different input lists

Answer 1

When fitting the model, X should be a 2D array and y a 1D array.
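The shape requirement can be reproduced without the original data. The sketch below is only an illustration: the array values and labels are made up, and it assumes scikit-learn and NumPy are installed. It shows that fitting on a 1D feature array raises the ValueError from the question, while the same values reshaped into a single column are accepted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up values standing in for one peptide column and its labels.
x_1d = np.array([-12.4, -12.9, -12.8, -13.2, -12.2, -12.4])
y = np.array([0, 1, 1, 1, 0, 0])

model = LogisticRegression()

# A 1D feature array is rejected with a ValueError.
try:
    model.fit(x_1d, y)
    raised = False
except ValueError as err:
    raised = True
    print(type(err).__name__)

# Reshaping to (n_samples, 1) gives the 2D input the estimator expects.
x_2d = x_1d.reshape(-1, 1)
model.fit(x_2d, y)
print(x_2d.shape, model.coef_.shape)
```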

X_train[peptide] returns a Series, which is one-dimensional. You can check this -

X_train[peptide].shape
#Output  = (nrows,)

You can do this instead -

X_train[[peptide]].shape
#Output = (nrows,1)

or

X_train[peptide].to_numpy().reshape(-1,1)
#Output = (nrows,1)
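To see the three selections side by side, here is a small sketch on a toy DataFrame. The frame and its values are hypothetical stand-ins for X_train; only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy frame standing in for X_train; the values are made up.
X_train = pd.DataFrame({
    "EQLTPLIK": [-12.4, -12.9, -12.8, -13.2],
    "SPELQAEAK": [-11.0, -11.5, -10.9, -11.2],
})
peptide = "EQLTPLIK"

as_series = X_train[peptide]                            # single brackets: Series, 1D
as_frame = X_train[[peptide]]                           # double brackets: DataFrame, 2D
as_array = X_train[peptide].to_numpy().reshape(-1, 1)   # NumPy column vector, 2D

print(as_series.shape)
print(as_frame.shape)
print(as_array.shape)
```

The double-bracket form keeps the column name, which can be convenient when inspecting results later; the reshape form returns a plain NumPy array.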

If there is still another error, the code has more than one problem. Please post the error in the comments.

This should work -

for peptide in peptides:
    model = LogisticRegression()
    # Double brackets select a one-column DataFrame (2D) instead of a Series (1D)
    model.fit(X_train[[peptide]], y_train)
    score = model.score(X_test[[peptide]], y_test)
    y_pred = model.predict(X_test[[peptide]])
    acc_score = accuracy_score(y_test, y_pred)
    # list.append takes a single argument, so store the pair as a tuple
    LR_scores.append((peptide, acc_score))
    
    #Classification Report
    print(classification_report(y_test, y_pred))
    
    #Confusion Matrix
    cnf_matrix = confusion_matrix(y_test, y_pred)
    print(cnf_matrix)
    
    #ROC_AUC Curves
    y_predict_proba = model.predict_proba(X_test[[peptide]])
    probabilities = np.array(y_predict_proba)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, probabilities, pos_label=1)
    roc_auc = auc(fpr, tpr)
    logit_roc_auc = roc_auc_score(y_test, model.predict(X_test[[peptide]]))
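For anyone who wants to run the loop end to end without the original data, here is a minimal, self-contained sketch. Everything about the data is an assumption: the features and labels are synthetic, only three of the peptide names are used, and the train/test split parameters are arbitrary. It also makes two choices that differ from the code above: per-peptide results are collected in a dict rather than in lists, and the ROC AUC is computed from predicted probabilities rather than from hard class predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
peptides = ["EQLTPLIK", "SPELQAEAK", "VVVHPDYR"]  # subset of the question's list
n = 200

# Synthetic binary labels and features (made up for illustration only).
y = pd.Series(rng.integers(0, 2, size=n), name="label")
X = pd.DataFrame({p: rng.normal(-12.5, 0.5, size=n) for p in peptides})
X["EQLTPLIK"] += 1.5 * y  # make one feature informative

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

results = {}
for peptide in peptides:
    model = LogisticRegression()
    # One-column DataFrame keeps the input 2D.
    model.fit(X_train[[peptide]], y_train)
    y_pred = model.predict(X_test[[peptide]])
    proba = model.predict_proba(X_test[[peptide]])[:, 1]
    results[peptide] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

# One row per peptide, one column per metric.
scores = pd.DataFrame.from_dict(results, orient="index")
print(scores)
```

Keying the results by peptide avoids overwriting variables such as acc_score on each iteration and makes it easy to sort or compare peptides afterwards.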

loops

numpy

machine-learning

pandas

logistic-regression