Error while fitting train and test sets, train_test_split method

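I am trying to evaluate my model using train_test_split. I defined the following function to create the output array (the top column) on the table, based on the input passed to the function: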

def top_sh(num):
    ###Get the top(num) in Shanghai data and arrange
    ####input and output variables accordingly
    #Add column to be output value, either zero or one

    #shanghai = shanghai_cp.copy()
    if 'top' in shanghai.columns:
        shanghai.drop(columns = shanghai.columns[-1],inplace = True) 

    shanghai['top'] = shanghai['world_rank'].apply(lambda x: 1 if x<= num else 0)
    # print() returns None, so just print the summary instead of storing its result
    print('*****************'+ '\n' + 'Output array: Top'+ str(num)+ '\n' + 'Disregarding in Analysis: World rank')
    #call = print(shanghai.head(15))

    return

Then I defined the train/test split procedure as follows:

def train_test(df,size, seed):
    ###Split the data into test and train sets and test

    #Get input output of df
    if df == 'shanghai':
        column1 = shanghai.columns[1:7]
        Y = shanghai.values[: , -1].astype(int)
        y = np.ravel(Y)
        X = shanghai.values[: , 1:7]
    elif df == 'times':
        column1 = times.columns[1:10]
        Y = times.values[: , -1].astype(int)
        y = np.ravel(Y)
        X = times.values[: , 1:10]
    else:
        return print('Available Datasets: "shanghai" , "times"')

    #Split into train and test
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(X,Y, test_size=size, random_state=seed)

    #Get the regression
    model= LogisticRegression(solver='liblinear')
    model.fit(X_Train,X_Test)

    #See how accurately it is with the split
    result=model.score(X_Test,Y_Test)

    print(f'Accuracy {result*100:5.3f}')

    return

I run the code below:

top_sh(50)
shanghai.head()
X.shape
Y
Y.shape
train_test('shanghai',0.3,7)

X.shape = (768, 8)
Y.shape = (768, )

I get the following error in the train_test function, specifically on the model.fit line:

> ValueError: bad input shape (150, 6)

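The problem is most likely caused by what you pass to fit. It expects the X values as the predictors and the Y values as the target to be predicted. Your call passes X_Test, a 2-D block of test features, where the training labels should go, which would explain the (150, 6) shape in the error message. So this line is incorrect: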

model.fit(X_Train,X_Test)
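
One way to confirm this is to print the shapes of the four arrays that train_test_split returns. The sketch below uses synthetic data as a stand-in for the real table; the 500 rows and 6 features are assumptions chosen only so that the test split matches the shape in the error message.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 500 rows, 6 features,
# and a binary label derived from the first feature.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
Y = (X[:, 0] > 0).astype(int)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=7
)

# train_test_split returns the arrays in this order:
# features (train), features (test), labels (train), labels (test).
shapes = {
    "X_Train": X_Train.shape,
    "X_Test": X_Test.shape,
    "Y_Train": Y_Train.shape,
    "Y_Test": Y_Test.shape,
}
for name, shape in shapes.items():
    print(name, shape)
```

Only Y_Train is one-dimensional and has the same number of rows as X_Train, so it is the only one of the four arrays that can serve as the target when fitting on X_Train.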

You should pass Y_Train instead:

model.fit(X_Train,Y_Train)
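
For context, here is a minimal end-to-end sketch of the split, fit, and score steps with the training labels passed to fit. It is not the original function: it takes the feature and label arrays directly instead of a dataset name, and the synthetic data is an assumption standing in for the Shanghai table.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_test(X, y, size, seed):
    """Split the data, fit a logistic regression, and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=size, random_state=seed
    )
    model = LogisticRegression(solver="liblinear")
    # Fit on the training features and the training labels.
    model.fit(X_train, y_train)
    # Score on the held-out features and labels.
    return model.score(X_test, y_test)


# Synthetic stand-in data (assumption; not the Shanghai ranking table).
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

accuracy = train_test(X, y, size=0.3, seed=7)
print(f"Accuracy {accuracy * 100:5.3f}")
```

Returning the score rather than only printing it makes it easier to compare results across different seeds or test sizes.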