Cluster centres with wrong dimensions in skfuzzy C-means clustering
Hello, I have written the simple code below to explore fuzzy c-means clustering:
import pandas as pd
import numpy as np
from os import listdir
from sklearn.model_selection import train_test_split
from skfuzzy.cluster import cmeans, cmeans_predict
from sklearn.metrics import classification_report, confusion_matrix

def find_csv_filenames(path_to_dir, suffix=".csv"):
    filenames = listdir(path_to_dir)
    return [path_to_dir + filename for filename in filenames if filename.endswith(suffix)]

listFiles = find_csv_filenames('<Path to folder with csv files>')
for files in listFiles:
    df = pd.read_csv(files)
    df.loc[df['bug'] > 1, 'bug'] = 1
    df2 = df.iloc[:, 3:]
    # Above are some pre-processing steps
    # Below: splitting the data into train and test sets
    X_train, X_test = train_test_split(df2, test_size=0.30)
    # Dropping the bug column for unsupervised learning
    X_train2 = X_train.drop('bug', axis=1)
    X_test2 = X_test.drop('bug', axis=1)
    print(X_train2.shape)
    # Shape is (163, 20): 163 training samples with 20 features
    cntr, u, u0, d, jm, p, fpc = cmeans(X_train2, 2, 2, 0.25, 500, init=None, seed=None)
    print(cntr.shape)
    # The shape above comes out as (2, 163)
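The centres returned by the cmeans call above have shape (2, 163), but since my training data has only 20 features, cntr should be (2, 20). I can't work out where I went wrong.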
From the skfuzzy documentation:
data : 2d array, size (S, N)
Data to be clustered. N is the number of data sets; S is the number of features within each sample vector.
So you need to transpose your input. Untested, but:
cmeans(X_train2.T, ...)
should work.
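To make the shape convention concrete, here is a minimal sketch with random data standing in for X_train2 (the array names, the random data and the fixed seed are my own assumptions, not from the question). It transposes a (163, 20) array to the (features, samples) layout quoted from the documentation and, if scikit-fuzzy is installed, runs cmeans with the same arguments as in the question. With a real DataFrame, converting to a plain array first, e.g. X_train2.to_numpy().T, is probably the safer option.

```python
import numpy as np

# Stand-in for X_train2: 163 samples (rows) x 20 features (columns).
rng = np.random.default_rng(0)
samples_by_features = rng.random((163, 20))

# skfuzzy expects data of size (S, N) = (features, samples), so transpose.
features_by_samples = samples_by_features.T
print(features_by_samples.shape)

try:
    from skfuzzy.cluster import cmeans
except ImportError:
    cmeans = None  # scikit-fuzzy is not installed; only the transpose is shown

if cmeans is not None:
    # Same positional arguments as in the question: c=2, m=2, error=0.25, maxiter=500.
    cntr, u, u0, d, jm, p, fpc = cmeans(
        features_by_samples, 2, 2, 0.25, 500, init=None, seed=0
    )
    print(cntr.shape)  # one row per cluster, one column per feature
    print(u.shape)     # one row per cluster, one column per sample
```

Note that the membership matrix u keeps the (clusters, samples) layout, so u.argmax(axis=0) gives one hard cluster label per training sample.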