How to reduce dimensions of new data/input after applying a dimensionality reduction method like MCA

Question

I have a categorical training set like this:

col1   col2   col3   col4
 9      8      10     9
10      8       9     9
.....................

After reducing its dimensionality by applying MCA (Multiple Correspondence Analysis), I get a result like this:

dim1    dim2
0.857  -0.575
0.654   0.938
.............

Now my question is: how do I find (dim1, dim2) for new input data like this?

col1  col2   col3  col4
10     9       8     8

The output of MCA after fitting it on the training set is eigenvalues, inertia, etc.

My code, in Python:

import pandas as pd
import prince
from sklearn.cluster import KMeans
data = pd.read_csv("data/training set.csv")
X = data.loc[:, 'OS.1':'DSA.1']
size = len(X)
X = X.values.tolist()

#...
#data preprocessing
#...

df = pd.DataFrame(X)
mca = prince.MCA(
    n_components=2,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='auto',
    random_state=42
)

mca = mca.fit(df)
X = mca.transform(df)

km = KMeans(n_clusters=3)
km.fit(X)

I want to:
1. Take input from the user.
2. Preprocess it before reducing its dimensions using MCA.
3. Predict its cluster using K-means.

Answer 1

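You just need to keep the fitted MCA object, mca, alive so that you can use it to transform new input data. To do so, simply call its transform method on your new data, without fitting it again: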

import pandas as pd
import prince
from sklearn.cluster import KMeans
data = pd.read_csv("data/training set.csv")
X = data.loc[:, 'OS.1':'DSA.1']
size = len(X)
X = X.values.tolist()

#...
#data preprocessing
#...

df = pd.DataFrame(X)
mca = prince.MCA(
    n_components=2,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='auto',
    random_state=42
)

mca = mca.fit(df)
X = mca.transform(df)

km = KMeans(n_clusters=3)
km.fit(X)

# x_new: the new input rows (placeholder, not defined here)
# 1. Preprocess x_new exactly as you preprocessed X
# 2. Reuse the already-fitted mca on x_new (do not refit)
df_new = pd.DataFrame(x_new)
X_new = mca.transform(df_new)

# predictions
km.predict(X_new)
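
For intuition about what transform does with an unseen row, here is a minimal NumPy sketch of MCA on an indicator matrix. It is not prince's implementation, and prince may scale or correct the coordinates differently, so the numbers need not match. The function names (one_hot, fit_mca, project) and the small training table are made up for illustration. A new row is one-hot encoded with the training categories, turned into a row profile, and multiplied by the column standard coordinates learned during fitting.

```python
import numpy as np


def one_hot(rows, categories):
    """Indicator matrix with one column per (variable, category) seen in training."""
    columns = [(j, cat) for j, cats in enumerate(categories) for cat in cats]
    index = {key: i for i, key in enumerate(columns)}
    Z = np.zeros((len(rows), len(columns)))
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # A category never seen in training has no column: KeyError.
            Z[i, index[(j, value)]] = 1.0
    return Z


def fit_mca(rows, n_components=2):
    """Correspondence analysis of the indicator matrix (illustrative only)."""
    n_vars = len(rows[0])
    categories = [sorted({row[j] for row in rows}) for j in range(n_vars)]
    Z = one_hot(rows, categories)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    # Column standard coordinates: all that is needed to place new rows.
    col_std = Vt.T[:, :n_components] / np.sqrt(c)[:, None]
    # Row principal coordinates of the training rows.
    row_coords = U[:, :n_components] * sing[:n_components] / np.sqrt(r)[:, None]
    return categories, col_std, row_coords


def project(rows, categories, col_std):
    """Place rows in the fitted space without refitting."""
    Z = one_hot(rows, categories)
    profiles = Z / Z.sum(axis=1, keepdims=True)
    return profiles @ col_std


# Hypothetical training table in the shape of the question's data.
train = [
    [9, 8, 10, 9],
    [10, 8, 9, 9],
    [8, 9, 8, 8],
    [9, 9, 10, 8],
    [8, 8, 9, 10],
    [10, 10, 8, 9],
]
categories, col_std, row_coords = fit_mca(train)

# The new row from the question, projected with the fitted quantities.
new_coords = project([[10, 9, 8, 8]], categories, col_std)
print(new_coords)  # one row holding (dim1, dim2)
```

Projecting the training rows this way reproduces their fitted coordinates, which is why new rows need no refitting. A row containing a category absent from the training data has no column to map to, so this sketch raises a KeyError for it.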

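If the user's input arrives in a later session, "keeping the object alive" means saving the fitted objects to disk and loading them back. Below is a minimal sketch using the standard pickle module. The helper names save_fitted and load_fitted and the file name are hypothetical, and a placeholder dict stands in for the fitted mca and km objects. Whether a given prince version pickles cleanly is an assumption worth checking.

```python
import pickle
import tempfile
from pathlib import Path


def save_fitted(objects, path):
    """Write the fitted objects to disk in one file."""
    with open(path, "wb") as fh:
        pickle.dump(objects, fh)


def load_fitted(path):
    """Read the fitted objects back; only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Placeholder for the fitted objects; with the code above this would be
# {"mca": mca, "km": km} (assumption: both objects are picklable).
fitted = {"mca": "fitted MCA placeholder", "km": "fitted KMeans placeholder"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fitted.pkl"   # hypothetical file name
    save_fitted(fitted, path)
    restored = load_fitted(path)
```

After loading, restored["mca"].transform(...) and restored["km"].predict(...) would be called on the preprocessed new data, exactly as in the answer's code.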

machine-learning

dimensionality-reduction