Error with sklearn (LogisticRegression model selection)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Raw string so the backslash in the Windows path is not read as an escape
Dt = pd.read_csv(r"D:\wisc_bc_data.csv")
'''
print(Dt.shape)
print(Dt.head())
'''

# Encode the diagnosis label: 'B' -> 0, anything else -> 1
def changer(x):
    if x == 'B':
        return 0
    else:
        return 1

Dt['diagnosis'] = Dt['diagnosis'].map(lambda x: changer(x))
features = Dt[2:12]
Diagnosis = Dt['diagnosis']
train_features, test_features, train_labels, test_labels = train_test_split(features, Diagnosis)  # this line raises the error
'''
This is my code; I used the dataset from here: https://gomguard.tistory.com/52
'''
I want to split the data for logistic regression, but I get the following error:
ValueError                                Traceback (most recent call last)
----> 1 train_features, test_features, train_labels, test_labels = train_test_split(features, Diagnosis)

D:\python\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2116         raise TypeError("Invalid parameters passed: %s" % str(options))
   2117
-> 2118     arrays = indexable(*arrays)
   2119
   2120     n_samples = _num_samples(arrays[0])

D:\python\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    246     """
    247     result = [_make_indexable(X) for X in iterables]
--> 248     check_consistent_length(*result)
    249     return result
    250

D:\python\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    210     if len(uniques) > 1:
    211         raise ValueError("Found input variables with inconsistent numbers of"
--> 212                          " samples: %r" % [int(l) for l in lengths])
    213
    214

ValueError: Found input variables with inconsistent numbers of samples: [10, 569]
How can I fix this?
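I think features = Dt[2:12] is what causes your error.
You meant to slice the features (columns), but pandas interprets a plain slice on a DataFrame as a slice of records (rows). That is why features ends up with only 10 samples while Diagnosis has 569, which is the [10, 569] in the error message.
So change the code to features = Dt.iloc[:, 2:12].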
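To make the difference concrete, here is a minimal sketch that does not need the original CSV. It assumes a made-up DataFrame of the same length (569 rows) with an id column, a diagnosis column, and ten placeholder feature columns named feature_0 to feature_9; those names and the random values are hypothetical and only stand in for the real file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for wisc_bc_data.csv: 569 rows, an id column,
# a diagnosis column, and 10 made-up numeric feature columns.
rng = np.random.default_rng(0)
n_rows = 569
Dt = pd.DataFrame({
    "id": np.arange(n_rows),
    "diagnosis": rng.choice(["B", "M"], size=n_rows),
})
for i in range(10):
    Dt[f"feature_{i}"] = rng.normal(size=n_rows)

# Encode the label: 'B' -> 0, 'M' -> 1
Dt["diagnosis"] = Dt["diagnosis"].map({"B": 0, "M": 1})

rows = Dt[2:12]          # plain slice: rows at positions 2..11, all columns
cols = Dt.iloc[:, 2:12]  # all rows, columns at positions 2..11

print(rows.shape)  # (10, 12)
print(cols.shape)  # (569, 10)

# With the column slice, features and labels have the same number of samples,
# so train_test_split no longer raises the ValueError.
train_features, test_features, train_labels, test_labels = train_test_split(
    cols, Dt["diagnosis"], random_state=0
)
print(train_features.shape, test_features.shape)
```

Selecting the columns by name, for example Dt[["feature_0", "feature_1"]] in this toy frame, is an alternative that does not depend on column positions.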