How to find the number of rows and columns in a MinMaxScaler object?
I created a DataFrame from a CSV file, passed it to train_test_split, and then used MinMaxScaler to scale the whole X and y data. Now I want to find the number of rows and columns, but I can't.
import pandas as pd
df = pd.read_csv("cancer_classification.csv")
from sklearn.model_selection import train_test_split
X = df.drop("benign_0__mal_1",axis=1).values
y = df["benign_0__mal_1"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = scaler.fit(X_train)
X_test = scaler.fit(X_test)
X_train.shape
This throws the following error:
AttributeError Traceback (most recent call last)
in ()
----> 1 X_train.shape
AttributeError: 'MinMaxScaler' object has no attribute 'shape'
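I read the documentation and was able to find the number of rows using scale_, but I couldn't find the columns.
This is what the answer should look like, but I can't find an attribute that would help.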
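MinMaxScaler is an object that can fit itself to some data and also transform that data. It has three relevant methods:
- The fit method fits the scaler's parameters to the data. It then returns the MinMaxScaler object.
- The transform method transforms data based on the scaler's fitted parameters. It then returns the transformed data.
- The fit_transform method first fits the scaler to the data and then transforms it. It returns the transformed version of the data.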
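To make the difference in return values concrete, here is a minimal sketch. The toy array X is a made-up stand-in, not the data from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 4 rows (samples), 2 columns (features).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

scaler = MinMaxScaler()

fitted = scaler.fit(X)                        # returns the scaler itself, not data
X_scaled = scaler.transform(X)                # returns the scaled NumPy array
X_scaled_2 = MinMaxScaler().fit_transform(X)  # fit and transform in one call

print(fitted is scaler)         # True: fit() hands back the same object
print(type(X_scaled).__name__)  # ndarray
print(X_scaled.shape)           # (4, 2)
```

Only the arrays returned by transform and fit_transform have a shape attribute, which is why calling .shape on the result of fit fails.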
In your example, you are treating the MinMaxScaler object itself as the data! (See the first bullet point.)
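The same MinMaxScaler should not be fitted twice on different datasets, because its internal values will change. You should never fit a MinMaxScaler on the test dataset, because that leaks test data into your model. What you should do is fit_transform() the training data and transform() the test data.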
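A corrected version of the workflow might look like the sketch below. Because the CSV from the question is not available here, a random array with made-up dimensions (100 rows, 5 feature columns) stands in for it; everything else follows the question's code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the CSV data: 100 rows, 5 features, binary target.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)  # fit on training data only, then scale it
X_test = scaler.transform(X_test)        # reuse the training min/max for the test data

# The scaled arrays are NumPy arrays, so .shape works as expected.
print(X_train.shape)  # (rows in the training set, 5)
print(X_test.shape)   # (rows in the test set, 5)

# The fitted scaler also records what it saw during fit.
print(scaler.n_samples_seen_)  # number of rows it was fitted on
print(scaler.scale_.shape)     # one entry per column (feature)
```

Note that scale_ holds one value per feature, so its length is the number of columns rather than the number of rows; the row count seen during fitting is stored in n_samples_seen_.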
The answer here may also help explain:
When you call StandardScaler.fit(X_train), what it does is calculate the mean and variance from the values in X_train. Then calling .transform() will transform all of the features by subtracting the mean and dividing by the standard deviation (the square root of the variance). For convenience, these two function calls can be done in one step using fit_transform().
The reason you want to fit the scaler using only the training data is because you don't want to bias your model with information from the test data.
If you fit() to your test data, you'd compute a new mean and variance for each feature. In theory these values may be very similar if your test and train sets have the same distribution, but in practice this is typically not the case.
Instead, you want to only transform the test data by using the parameters computed on the training data.
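The point of the quoted answer can be checked on a tiny example. The numbers below are invented for illustration; they show that a scaler fitted on the training data keeps the training statistics when it transforms the test data, while refitting on the test data produces different parameters.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical one-feature data with clearly different train and test ranges.
X_train = np.array([[0.0], [2.0], [4.0]])
X_test = np.array([[10.0], [12.0]])

# Fit on the training data only.
scaler = StandardScaler().fit(X_train)
print(scaler.mean_)   # training mean of the feature
print(scaler.scale_)  # training standard deviation of the feature

# The test data is transformed with the training mean and standard deviation.
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled.ravel())

# Refitting on the test data (what the question's code effectively did)
# computes new statistics, so the same values would be scaled differently.
refit = StandardScaler().fit(X_test)
print(refit.mean_)
```

Here the training mean is 2 and the test mean is 11, so the two scalers disagree about what a "typical" value is, which is exactly the inconsistency the answer warns about.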