Pandas: Linear Regression, apply StandardScaler to only some columns

I have the following dataset:

import pandas as pd
new_data = pd.read_csv('https://raw.githubusercontent.com/michalis0/DataMining_and_MachineLearning/master/data/regression_sales.csv')

Then I fit a simple linear regression:

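import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
y = np.array(new_data['sales_per_day'])
X = np.array(new_data[['number_orders', 'number_items', 'number_segments', 'year', 'month', 'day']])
X.shape, y.shape
# Split the features and target together (the original call passed an undefined `df`)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, train_size=0.8, random_state=77)
regression_model = LinearRegression(fit_intercept=True)
regression_model.fit(X_train, y_train)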

I now want to standardize 'number_orders', 'number_items', and 'number_segments'. This is what I tried:

from sklearn.preprocessing import StandardScaler
Std_Scaler = StandardScaler()
Std_data = Std_Scaler.fit_transform(X_train)
Std_data = pd.DataFrame(Std_Scaler.transform(X_test), columns=['number_items', 'number_orders', 'number_segments'])

But I get the following error: ValueError: Wrong number of items passed 6, placement implies 3

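The issue is that I want to standardize only those three columns and leave the other three (year, month, day) as they are, but I can't seem to do that.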

Do you know of a way to standardize only part of a dataset?

You can keep the data as a DataFrame and split it like this:

X = new_data[['number_orders', 'number_items', 'number_segments', 'year', 'month', 'day']]
y = new_data['sales_per_day']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, train_size=0.8, random_state=77)
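
As a minimal sketch of why it helps to keep X as a DataFrame: train_test_split returns DataFrames when it is given DataFrames, so the column names survive the split and a subset can be selected by name. The small frame below is a made-up stand-in for the sales CSV (the values and the names demo, X_demo, X_tr and so on are assumptions for illustration only).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for regression_sales.csv; the values are random, not real data.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'number_orders': rng.integers(1, 50, size=20),
    'number_items': rng.integers(1, 200, size=20),
    'number_segments': rng.integers(1, 5, size=20),
    'year': 2020,
    'month': rng.integers(1, 13, size=20),
    'day': rng.integers(1, 29, size=20),
    'sales_per_day': rng.normal(1000, 100, size=20),
})

X_demo = demo.drop(columns='sales_per_day')
y_demo = demo['sales_per_day']
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=77)

# The splits are still DataFrames, so the three columns can be picked by name.
subset = X_tr[['number_items', 'number_orders', 'number_segments']]
print(type(X_tr).__name__, X_tr.shape, X_te.shape, subset.shape)
```

In the question, X was converted with np.array, so X_test had six unnamed columns. The scaler therefore returned six columns while only three column names were supplied, which matches the ValueError.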

Define the columns to scale:

Cols = ['number_items', 'number_orders', 'number_segments']

Then you need to make a copy, because you are modifying the DataFrame, like this:

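from sklearn.preprocessing import StandardScaler
# Work on copies so the column assignments below do not touch a slice of X
X_train = X_train.copy()
X_test = X_test.copy()
Std_Scaler = StandardScaler()
# Fit the scaler on the training columns only, then reuse its statistics on the test set
X_train[Cols] = Std_Scaler.fit_transform(X_train[Cols])
X_test[Cols] = Std_Scaler.transform(X_test[Cols])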
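
An alternative that avoids the manual copy-and-assign step is scikit-learn's ColumnTransformer, which applies a transformer to named columns and can pass the rest through unchanged. The following is only a sketch under assumptions: the data is synthetic (the toy frame and its values are invented here, not taken from the CSV), and names such as scale_cols, preprocess and model are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data with the same column names as the question.
rng = np.random.default_rng(0)
n = 50
toy = pd.DataFrame({
    'number_orders': rng.integers(1, 50, size=n),
    'number_items': rng.integers(1, 200, size=n),
    'number_segments': rng.integers(1, 5, size=n),
    'year': rng.integers(2018, 2021, size=n),
    'month': rng.integers(1, 13, size=n),
    'day': rng.integers(1, 29, size=n),
})
toy['sales_per_day'] = 3.0 * toy['number_items'] + rng.normal(0, 1, size=n)

scale_cols = ['number_items', 'number_orders', 'number_segments']
X_toy = toy.drop(columns='sales_per_day')
y_toy = toy['sales_per_day']
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=77)

# Scale only scale_cols; year, month and day are passed through untouched.
preprocess = ColumnTransformer(
    [('scale', StandardScaler(), scale_cols)],
    remainder='passthrough',
)
model = make_pipeline(preprocess, LinearRegression())

# fit() learns the scaling statistics from the training split only;
# predict() applies those same statistics to the test split.
model.fit(X_tr, y_tr)
preds = model.predict(X_te)
print(preds.shape)
```

Wrapping the scaler and the regression in one pipeline means the test set is never used to fit the scaler, which is the same point as calling transform rather than fit_transform on X_test.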