Fitting an SVC model on the Khan Gene Data Set in Python
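I have four .csv files containing the training data (data points and their classes) and the test data, which I have loaded into the X_train, y_train, X_test, and y_test variables.
I need to train an SVC model and evaluate it on the test data. Since sklearn.svm.SVC takes numpy arrays as input, I tried converting the pandas data frames to numpy arrays as follows: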
X_train_gene = pd.read_csv("Khan_xtrain.csv").drop('Unnamed: 0', axis=1).values.ravel()
y_train_gene = pd.read_csv("Khan_ytrain.csv").drop('Unnamed: 0', axis=1).values.ravel()
X_test_gene = pd.read_csv("Khan_xtest.csv").drop('Unnamed: 0', axis=1).values.ravel()
y_test_gene = pd.read_csv("Khan_ytest.csv").drop('Unnamed: 0', axis=1).values.ravel()
Then I tried the following lines of code to train my model:
from sklearn.svm import SVC
svm_gene = SVC(C=10, kernel='linear')
svm_gene.fit(X_train_gene, y_train_gene)
But I get a ValueError:
ValueError: Expected 2D array, got 1D array instead:
array=[ 0.7733437 -2.438405 -0.4825622 ... -1.115962 -0.7837286 -1.339411 ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
The full error message is shown in the image below:
How can I solve this problem?
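The error message points at the shape of the feature array, so a small check of what ravel() does may help. The sketch below is only an illustration on synthetic arrays (the names X, y, X_flat and y_flat are made up here, not taken from the Khan files): calling ravel() on a 2D feature matrix merges samples and features into a single axis, while calling it on a one-column label array produces the 1D vector that fit() expects.

```python
import numpy as np

# Synthetic stand-in for a feature matrix: 4 samples, 3 features.
X = np.arange(12, dtype=float).reshape(4, 3)
# Synthetic stand-in for labels read from a one-column CSV: shape (4, 1).
y = np.array([[1], [2], [1], [2]])

X_flat = X.ravel()  # samples and features collapse into one axis
y_flat = y.ravel()  # the (4, 1) column becomes a 1D vector of length 4

print(X.shape, X_flat.shape)  # (4, 3) (12,)
print(y.shape, y_flat.shape)  # (4, 1) (4,)
```

If this reading is right, ravel() is appropriate for the y arrays but should be left off the X arrays.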
I solved the problem using the following lines of code:
X_train_gene = pd.read_csv("file_path/file.csv", header=None)
#deleting the first column which contains numbers
del X_train_gene[X_train_gene.columns[0]]
#deleting the first row which contains strings and predictors' names
X_train_gene = X_train_gene.iloc[1:]
y_train_gene = pd.read_csv("file_path/file.csv", header=None)
#deleting the first column which contains numbers
del y_train_gene[y_train_gene.columns[0]]
#deleting the first row which contains strings and predictors' names
y_train_gene = y_train_gene.iloc[1:]
X_test_gene = pd.read_csv("file_path/file.csv", header=None)
#deleting the first column which contains numbers
del X_test_gene[X_test_gene.columns[0]]
#deleting the first row which contains strings and predictors' names
X_test_gene = X_test_gene.iloc[1:]
y_test_gene = pd.read_csv("file_path/file.csv", header=None)
#deleting the first column which contains numbers
del y_test_gene[y_test_gene.columns[0]]
#deleting the first row which contains strings and predictors' names
y_test_gene = y_test_gene.iloc[1:]
#Converting the pandas data frames to numpy arrays, as SVC accepts numpy arrays as input.
X_train_gene = X_train_gene.values
y_train_gene = y_train_gene.values.ravel()
X_test_gene = X_test_gene.values
y_test_gene = y_test_gene.values.ravel()
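A more compact variant of the same idea is sketched below. It is not the code above, only an alternative under stated assumptions: each file is assumed to start with an unnamed index column followed by a header row, and the helpers load_features and load_labels, along with the tiny inline CSV strings, are hypothetical stand-ins so the example runs without the real Khan files. Passing index_col=0 to pd.read_csv lets pandas consume the index column and the header, so the features stay 2D and only the labels are flattened.

```python
import io

import pandas as pd
from sklearn.svm import SVC

# Tiny synthetic CSVs standing in for the real files (assumed layout:
# an unnamed index column first, then one column per feature or label).
X_CSV = ""","V1","V2"
"1",0.0,0.1
"2",0.2,0.0
"3",1.0,0.9
"4",0.9,1.1
"""
Y_CSV = ""","x"
"1",1
"2",1
"3",2
"4",2
"""


def load_features(source):
    # index_col=0 uses the leading column as the index; the result stays 2D.
    return pd.read_csv(source, index_col=0).to_numpy()


def load_labels(source):
    # Labels are a single column, so flattening to 1D is what fit() expects.
    return pd.read_csv(source, index_col=0).to_numpy().ravel()


X_train = load_features(io.StringIO(X_CSV))
y_train = load_labels(io.StringIO(Y_CSV))

model = SVC(C=10, kernel="linear")
model.fit(X_train, y_train)
predictions = model.predict(X_train)

print(X_train.shape, y_train.shape)
```

With real files, a path such as the ones used in the question would be passed in place of the io.StringIO objects, and the test arrays would be loaded the same way before calling model.score or model.predict.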