Bad input shape while applying fit_transform in LDA
I applied get_dummies() to my dataset. After splitting it into training and test sets, calling LDA's fit_transform() raises:
ValueError: bad input shape (26905, 8)
What am I doing wrong? I'm not sure whether the problem comes from get_dummies() or from something else I've missed.
# Sample Code
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

df = pd.read_csv('/Users/rushirajparmar/Downloads/Problem 16 (1)/Problem 16/Problem 16/train_file.csv')
df.drop(['UsageClass', 'CheckoutType', 'CheckoutYear', 'CheckoutMonth'], axis=1, inplace=True)

# One-hot encode the target (MaterialType) and the categorical features
Y = pd.get_dummies(df, columns=['MaterialType'])
X = pd.get_dummies(df, columns=['Title', 'Creator', 'Subjects', 'Publisher', 'PublicationYear'])
X.drop(['MaterialType'], axis=1, inplace=True)
Y.drop(['ID', 'Checkouts', 'Title', 'Creator', 'Subjects', 'Publisher', 'PublicationYear'], axis=1, inplace=True)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15)

# Standardize the features
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

lda = LDA(n_components=1)
X_train = lda.fit_transform(X_train, y_train)  # ValueError is raised here
X_test = lda.transform(X_test)
Dataset:
train_file.csv is provided here for reference.
You don't need to apply get_dummies to the target variable. You can pass the multi-class labels directly to LDA.
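As a minimal sketch of that point, the snippet below fits LDA on plain string labels with no encoding step. The toy data and the class names ("BOOK", "EBOOK", "SOUNDDISC") are made up for illustration and are not taken from train_file.csv.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 classes, 40 samples each, 4 numeric features.
class_names = np.array(["BOOK", "EBOOK", "SOUNDDISC"])
y = np.repeat(class_names, 40)                 # 1-D array of string labels, shape (120,)
centers = rng.normal(scale=5.0, size=(3, 4))   # one center per class
X = np.repeat(centers, 40, axis=0) + rng.normal(size=(120, 4))

lda = LDA(n_components=1)
X_reduced = lda.fit_transform(X, y)            # y is passed as-is, no get_dummies

print(X_reduced.shape)   # (120, 1)
print(lda.classes_)      # ['BOOK' 'EBOOK' 'SOUNDDISC']
```

The estimator keeps the distinct labels it saw in `lda.classes_`, so the string values remain available after fitting.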
fit_transform(X, y=None, **fit_params)
    Fit to data, then transform it.
    Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
    Parameters:
        X : numpy array of shape [n_samples, n_features]
            Training set.
        y : numpy array of shape [n_samples]
            Target values.
    Returns:
        X_new : numpy array of shape [n_samples, n_features_new]
            Transformed array.
So your y must be one-dimensional.
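To make the shape difference concrete, here is a small sketch on synthetic data (the feature matrix and label values are assumptions, not the real dataset). It compares the shape of a one-hot encoded target with the original label column and shows that LDA rejects the 2-D version with a ValueError. The exact error message depends on the scikit-learn version.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

rng = np.random.default_rng(0)

# Hypothetical data: 90 samples, 5 features, 3 classes shifted along feature 0.
codes = np.tile([0, 1, 2], 30)
X = rng.normal(size=(90, 5))
X[:, 0] += 3.0 * codes
labels = pd.Series(np.array(["BOOK", "EBOOK", "VIDEODISC"])[codes], name="MaterialType")

y_onehot = pd.get_dummies(labels)   # 2-D: one column per class
print(y_onehot.shape)               # (90, 3)
print(labels.shape)                 # (90,)

try:
    LDA(n_components=1).fit_transform(X, y_onehot)
    onehot_failed = False
except ValueError as exc:
    onehot_failed = True
    print("one-hot y rejected:", exc)

# The 1-D label column is accepted.
X_reduced = LDA(n_components=1).fit_transform(X, labels)
print(X_reduced.shape)              # (90, 1)
```

The second number in the reported shape is the number of one-hot columns, which is why the error in the question mentions 8: one column per MaterialType class.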
X_train, X_test, y_train, y_test = train_test_split(X, df['MaterialType'], test_size = 0.15)
lda = LDA(n_components = 1)
X_train = lda.fit_transform(X_train, y_train)
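For completeness, here is one way the whole pipeline from the question could look with this change applied. It is only a sketch: the small DataFrame is a hypothetical stand-in for train_file.csv, and `drop_first=True` and `dtype=float` are choices made for this example (to avoid perfectly collinear dummy columns and to keep the matrix numeric), not part of the original code. The scaling step and the transform of X_test are kept from the question.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 200

# Hypothetical stand-in for train_file.csv with one numeric and one categorical feature.
material = rng.choice(["BOOK", "EBOOK", "SOUNDDISC"], size=n)
offset = {"BOOK": 0, "EBOOK": 10, "SOUNDDISC": 20}
df = pd.DataFrame({
    "Checkouts": rng.integers(1, 10, size=n) + np.array([offset[m] for m in material]),
    "Publisher": rng.choice(["A", "B", "C"], size=n),
    "MaterialType": material,
})

y = df["MaterialType"]   # 1-D target, left unencoded
# Only the features are one-hot encoded.
X = pd.get_dummies(df.drop(columns=["MaterialType"]), columns=["Publisher"],
                   drop_first=True, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=0)

# Fit the scaler on the training split only, then reuse it on the test split.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

lda = LDA(n_components=1)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)

print(X_train.shape, X_test.shape)   # (170, 1) (30, 1)
```

Note that n_components cannot exceed min(n_classes - 1, n_features), so n_components = 1 is valid here with three classes.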