How to apply the inverse transform after clustering

I want to recover my original data after running K-means clustering on a dataset scaled with MinMaxScaler. Here is a sample of my code:

copy_df=scaled_df.copy()
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(features)
copy_df['Cluster'] = kmeans.predict(features)

The scaler is saved; I tried something like: x = scaler.inverse_transform(x)

My copy_df should have one more column (the cluster number) than my scaled_df.

I guess that is why I get:

ValueError: operands could not be broadcast together with shapes (3,5) (4,) (3,5) 

How can I recover my data?

I need the real (unscaled) data for the clusters, or the mean of each feature.

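There is a mismatch between the shape MinMaxScaler() expects (the number of features it was fitted on) and the shape you pass it after clustering (which has one extra column holding the cluster membership). You can assign the cluster labels directly to the original data, or, if you really need to invert the scaling, you can first inverse_transform the scaled feature columns and then add the cluster labels to the result. Both yield the same dataframe.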

# Import the packages
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Load the data
data = pd.DataFrame(load_iris()['data'])

# Initialize a scaler
scaler = MinMaxScaler()

# Perform scaling
data_scaled = pd.DataFrame(scaler.fit_transform(data))

# Initialize KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=42)

# Obtain the clusters
clusters = kmeans.fit_predict(data_scaled)

# Add the cluster labels to the original data
data['clusters'] = clusters

# Inverse the scaling and add the cluster labels as a new column
data_invscaled = pd.DataFrame(scaler.inverse_transform(data_scaled.iloc[:, 0:4]))
data_invscaled['clusters'] = clusters

# Check that the two dfs are equal: assert_frame_equal raises an AssertionError
# on a mismatch and returns None otherwise, so printing None means they are equal
print(pd.testing.assert_frame_equal(data, data_invscaled, check_dtype=False))
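Since the question also asks for the mean of each feature per cluster, here is a minimal sketch of two ways to get those means in the original units. It reuses the iris setup from above. The names features, scaled_with_labels, means_original and means_restored are my own, and n_init=10 is only set explicitly to keep the run independent of the scikit-learn default. The (3,5) shape in the error message suggests a three-row frame that still included the label column, so the second option makes a point of passing only the four feature columns to inverse_transform.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Same setup as above: scale the four iris features and cluster them
features = pd.DataFrame(load_iris()['data'])
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(features))
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # n_init is an assumption
labels = kmeans.fit_predict(scaled)

# Option 1: group the original (unscaled) rows by their cluster label
means_original = features.groupby(labels).mean()

# Option 2: start from a scaled frame that already carries the label column.
# Grouping by 'Cluster' moves the label into the index, so only the four
# feature columns are passed to inverse_transform.
scaled_with_labels = scaled.copy()
scaled_with_labels['Cluster'] = labels
scaled_means = scaled_with_labels.groupby('Cluster').mean()
means_restored = pd.DataFrame(
    scaler.inverse_transform(scaled_means.to_numpy()),
    index=scaled_means.index,
    columns=features.columns,
)

print(means_original)
print(means_restored)
```

The two results should agree up to floating-point error, because min-max scaling is a per-column affine map, so the mean of the scaled values inverts to the mean of the original values.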