There is an error with sklearn (LogisticRegression model selection)

Question

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler


    Dt = pd.read_csv(r"D:\wisc_bc_data.csv")
    # print(Dt.shape)
    # print(Dt.head())

    def changer(x):
        # Encode the diagnosis label: benign ('B') -> 0, anything else -> 1
        if x == 'B':
            return 0
        else:
            return 1

    Dt['diagnosis'] = Dt['diagnosis'].map(changer)
    features = Dt[2:12]
    Diagnosis = Dt['diagnosis']
    # The next line raises the error
    train_features, test_features, train_labels, test_labels = train_test_split(features, Diagnosis)

This is my code. I used the dataset from here: https://gomguard.tistory.com/52

I want to split the data for logistic regression. However, I get the following error:

    ValueError                                Traceback (most recent call last)
    ----> 1 train_features, test_features, train_labels, test_labels = train_test_split(features, Diagnosis)

    D:\python\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
       2116         raise TypeError("Invalid parameters passed: %s" % str(options))
       2117
    -> 2118     arrays = indexable(*arrays)
       2119
       2120     n_samples = _num_samples(arrays[0])

    D:\python\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
        246     """
        247     result = [_make_indexable(X) for X in iterables]
    --> 248     check_consistent_length(*result)
        249     return result
        250

    D:\python\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
        210     if len(uniques) > 1:
        211         raise ValueError("Found input variables with inconsistent numbers of"
    --> 212                          " samples: %r" % [int(l) for l in lengths])
        213
        214

    ValueError: Found input variables with inconsistent numbers of samples: [10, 569]

How can I fix this?

Answer 1

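I think features = Dt[2:12] is causing your error. You meant to slice the features (columns), but pandas interprets Dt[2:12] as slicing records (rows). That gives features only 10 rows while Diagnosis still has 569, which is exactly the [10, 569] mismatch in the error message. So change the code to Dt.iloc[:, 2:12].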
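
To make the difference concrete, here is a minimal sketch. It does not read the original CSV: the DataFrame below is a hypothetical stand-in (random values, made-up column names `id` and `feature_0` ... `feature_9`) built only to have 569 rows and the same general layout. Whether columns 2 through 11 are the right features in the real file depends on how that file is laid out, so treat the slice bounds as an assumption carried over from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real CSV: 569 rows, an id column,
# a diagnosis column, and 10 numeric feature columns.
rng = np.random.default_rng(0)
n_rows = 569
Dt = pd.DataFrame({
    "id": np.arange(n_rows),
    "diagnosis": rng.choice(["B", "M"], size=n_rows),
})
for i in range(10):
    Dt[f"feature_{i}"] = rng.normal(size=n_rows)

# Encode the label the same way the question does: 'B' -> 0, otherwise 1.
Dt["diagnosis"] = Dt["diagnosis"].map(lambda x: 0 if x == "B" else 1)

row_slice = Dt[2:12]          # selects ROWS 2..11, keeps every column
col_slice = Dt.iloc[:, 2:12]  # selects every row, COLUMNS 2..11

print(row_slice.shape)  # (10, 12)
print(col_slice.shape)  # (569, 10)

# With the column slice, features and labels have the same number of rows,
# so train_test_split no longer raises the ValueError.
train_features, test_features, train_labels, test_labels = train_test_split(
    col_slice, Dt["diagnosis"], random_state=0
)
```

Passing `row_slice` instead of `col_slice` to `train_test_split` reproduces the inconsistent-length error, because it has 10 rows and the labels have 569.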
