Is it possible to find similar orders with multidimensional scaling in scikit-learn?
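I have several files containing the 3D positions of 10 points (plotted in the corresponding images). I want to use multidimensional scaling to find similar orderings and print out the distinct ones. For example, here text files 1, 2 and 4 have exactly the same ordering, but it differs from that of file 3.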
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
from sklearn import manifold
from sklearn.metrics import euclidean_distances
from sklearn.decomposition import PCA
A1=[[0.000, 0.000, 0.5],
[0.250, 0.000, 0.5],
[0.125, 0.250, 0.5],
[0.375, 0.250, 0.5],
[0.250, 0.500, 0.5],
[0.500, 0.500, 0.5],
[0.125, 0.750, 0.5],
[0.375, 0.750, 0.5],
[0.000, 1.000, 0.5],
[0.250, 1.000, 0.5]]
A2=[[0.500, 0.000, 0.5],
[0.750, 0.000, 0.5],
[0.375, 0.250, 0.5],
[0.625, 0.250, 0.5],
[0.250, 0.500, 0.5],
[0.500, 0.500, 0.5],
[0.375, 0.750, 0.5],
[0.625, 0.750, 0.5],
[0.500, 1.000, 0.5],
[0.750, 1.000, 0.5]]
A3=[[0.500, 0.000, 0.5],
[0.750, 0.000, 0.5],
[0.625, 0.250, 0.5],
[0.875, 0.250, 0.5],
[0.250, 0.500, 0.5],
[0.500, 0.500, 0.5],
[0.375, 0.750, 0.5],
[0.625, 0.750, 0.5],
[0.500, 1.000, 0.5],
[0.750, 1.000, 0.5]]
A4=[[0.250, 0.000, 0.5],
[0.500, 0.000, 0.5],
[0.375, 0.250, 0.5],
[0.625, 0.250, 0.5],
[0.500, 0.500, 0.5],
[0.750, 0.500, 0.5],
[0.375, 0.750, 0.5],
[0.625, 0.750, 0.5],
[0.250, 1.000, 0.5],
[0.500, 1.000, 0.5]]
print(len(A1), len(A2), len(A3), len(A4))
a1=euclidean_distances(A1)
a2=euclidean_distances(A2)
a3=euclidean_distances(A3)
a4=euclidean_distances(A4)
print(a1)
Expected output:
Number of different orders: 2
A1
A3
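Multidimensional scaling works from the pairwise distances alone, so configurations with the same distance matrix produce the same embedding, up to rotation and reflection. The sketch below is classical (Torgerson) MDS written directly in NumPy; it is not scikit-learn's `manifold.MDS`, which minimises stress iteratively (SMACOF). The helper names `pairwise_distances` and `classical_mds` are hypothetical. It embeds A1 in 2D from its distance matrix and checks that the distances survive.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows of `points` (hypothetical helper)."""
    P = np.asarray(points, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: recover coordinates from a distance matrix only."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # clip round-off negatives
    return eigvecs[:, top] * scale

# A1 from the question (all points lie in the plane z = 0.5)
A1 = np.array([
    [0.000, 0.000, 0.5], [0.250, 0.000, 0.5],
    [0.125, 0.250, 0.5], [0.375, 0.250, 0.5],
    [0.250, 0.500, 0.5], [0.500, 0.500, 0.5],
    [0.125, 0.750, 0.5], [0.375, 0.750, 0.5],
    [0.000, 1.000, 0.5], [0.250, 1.000, 0.5],
])

D1 = pairwise_distances(A1)
X1 = classical_mds(D1)
print(X1.shape)                                  # (10, 2)
print(np.allclose(pairwise_distances(X1), D1))   # True
```

Because the points are planar, two components reproduce every distance; comparing embeddings of different files would additionally need an alignment step, since each embedding is only defined up to rotation and reflection.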
Set up the data and libraries:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
from sklearn import manifold
from sklearn.metrics import euclidean_distances
from sklearn.decomposition import PCA
A1=[[0.000, 0.000, 0.5],
[0.250, 0.000, 0.5],
[0.125, 0.250, 0.5],
[0.375, 0.250, 0.5],
[0.250, 0.500, 0.5],
[0.500, 0.500, 0.5],
[0.125, 0.750, 0.5],
[0.375, 0.750, 0.5],
[0.000, 1.000, 0.5],
[0.250, 1.000, 0.5]]
A2=[[0.500, 0.000, 0.5],
[0.750, 0.000, 0.5],
[0.375, 0.250, 0.5],
[0.625, 0.250, 0.5],
[0.250, 0.500, 0.5],
[0.500, 0.500, 0.5],
[0.375, 0.750, 0.5],
[0.625, 0.750, 0.5],
[0.500, 1.000, 0.5],
[0.750, 1.000, 0.5]]
A3=[[0.500, 0.000, 0.5],
[0.750, 0.000, 0.5],
[0.625, 0.250, 0.5],
[0.875, 0.250, 0.5],
[0.250, 0.500, 0.5],
[0.500, 0.500, 0.5],
[0.375, 0.750, 0.5],
[0.625, 0.750, 0.5],
[0.500, 1.000, 0.5],
[0.750, 1.000, 0.5]]
A4=[[0.250, 0.000, 0.5],
[0.500, 0.000, 0.5],
[0.375, 0.250, 0.5],
[0.625, 0.250, 0.5],
[0.500, 0.500, 0.5],
[0.750, 0.500, 0.5],
[0.375, 0.750, 0.5],
[0.625, 0.750, 0.5],
[0.250, 1.000, 0.5],
[0.500, 1.000, 0.5]]
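Let's put the data in a convenient structure and compute the distances.
# Configurations to compare, keyed by name
segments_dic = {'A1': A1,
                'A2': A2,
                'A3': A3,
                'A4': A4}
# To classify the configurations: sum of all pairwise distances,
# rounded to 3 decimals so that equal shapes compare equal
segments_distances = []
for i in segments_dic.keys():
    segments_distances.append(round(euclidean_distances(segments_dic[i]).sum(), 3))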
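Summing the distance matrix works as a signature because pairwise distances do not change when a configuration is translated, rotated, reflected or has its rows reordered; A4 is A1 shifted by 0.25 in x, and A2 is a mirror image of A1 with the points in each row swapped. A minimal NumPy check of that invariance, using a hypothetical `pairwise_distances` helper in place of `euclidean_distances` so it runs without scikit-learn:

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows of `points` (hypothetical helper)."""
    P = np.asarray(points, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

A1 = np.array([
    [0.000, 0.000, 0.5], [0.250, 0.000, 0.5],
    [0.125, 0.250, 0.5], [0.375, 0.250, 0.5],
    [0.250, 0.500, 0.5], [0.500, 0.500, 0.5],
    [0.125, 0.750, 0.5], [0.375, 0.750, 0.5],
    [0.000, 1.000, 0.5], [0.250, 1.000, 0.5],
])

copies = {
    "translated": A1 + np.array([0.25, 0.0, 0.0]),  # same coordinates as A4
    "reflected": A1 * np.array([-1.0, 1.0, 1.0]),   # mirror image in x
    "reordered": A1[::-1],                          # same points, rows reversed
}
base = pairwise_distances(A1).sum()
for name, P in copies.items():
    # Unrounded sums compared with a tolerance; only the summation order differs
    print(name, np.isclose(pairwise_distances(P).sum(), base))  # True for each
```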
Now let's check which groups of points have the same distance sum:
"""Number of different elements / orders.
I round the results to make them comparable."""
different_elements = np.unique(segments_distances)
print("number of different orders: ", len(different_elements))
print("different orders: ", different_elements)
for i in different_elements:
    print("For element distance ", i, " corresponding groups are: ")
    # Reuse the rounded sums computed above instead of recomputing them
    for j, d in zip(segments_dic.keys(), segments_distances):
        if d == i:
            print(j)
The output is as follows:
number of different orders: 2
different orders: [46.952 48.496]
For element distance 46.952 corresponding groups are:
A1
A2
A4
For element distance 48.496 corresponding groups are:
A3
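A single rounded sum can, in principle, coincide for two shapes that are not congruent. A stronger (though still not complete) signature is the sorted list of all pairwise distances; this is a swapped-in technique, not the method above, and the names `distance_signature`, `group_by_signature` and `rows` are hypothetical. `rows` rebuilds the question's A1 to A4 from their two x-values per y-level. On these four configurations it gives the same grouping:

```python
from collections import defaultdict

import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows of `points`."""
    P = np.asarray(points, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_signature(points, decimals=3):
    """Sorted pairwise distances (each unordered pair once), rounded."""
    D = pairwise_distances(points)
    upper = D[np.triu_indices(len(D), k=1)]
    return tuple(np.round(np.sort(upper), decimals).tolist())

def group_by_signature(configs, decimals=3):
    """Group configuration names whose distance signatures are identical."""
    groups = defaultdict(list)
    for name, points in configs.items():
        groups[distance_signature(points, decimals)].append(name)
    return list(groups.values())

def rows(xs):
    """Two x-values per y-level (0, 0.25, ..., 1), all at z = 0.5."""
    ys = [0.0, 0.25, 0.5, 0.75, 1.0]
    return [[x, y, 0.5] for (x0, x1), y in zip(xs, ys) for x in (x0, x1)]

configs = {
    "A1": rows([(0.000, 0.250), (0.125, 0.375), (0.250, 0.500), (0.125, 0.375), (0.000, 0.250)]),
    "A2": rows([(0.500, 0.750), (0.375, 0.625), (0.250, 0.500), (0.375, 0.625), (0.500, 0.750)]),
    "A3": rows([(0.500, 0.750), (0.625, 0.875), (0.250, 0.500), (0.375, 0.625), (0.500, 0.750)]),
    "A4": rows([(0.250, 0.500), (0.375, 0.625), (0.500, 0.750), (0.375, 0.625), (0.250, 0.500)]),
}

groups = group_by_signature(configs)
print("number of different orders:", len(groups))  # 2
for members in groups:
    print(members)  # ['A1', 'A2', 'A4'], then ['A3']
```

Keying a dictionary on the rounded tuple avoids comparing floats by hand, and insertion order keeps the groups in the order the files were read.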
Let's see whether the plots confirm this result.
2D plots:
"""Plots"""
for i in segments_dic.keys():
    # Centre the points and rotate them onto their two principal axes
    clf = PCA(n_components=2)
    X = clf.fit_transform(segments_dic[i])
    aux = pd.DataFrame(X)
    fig = plt.figure()
    plt.scatter(aux.iloc[:, 0], aux.iloc[:, 1])
    plt.title('{}'.format(i))
    fig.savefig('{}_representation.svg'.format(i))
    plt.close(fig)  # avoid keeping one open figure per configuration
The plots confirm the result.
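The plotting loop relies on `PCA(n_components=2)` to centre each configuration and rotate it onto its two principal axes, which removes the translation between A1 and A4; since every point lies in the plane z = 0.5, keeping two components loses nothing. A minimal NumPy sketch of that projection via the SVD (the names `pca_2d` and `pairwise_distances` are hypothetical, and the sign of each axis may differ from scikit-learn's, so a plot can appear mirrored):

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows of `points`."""
    P = np.asarray(points, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pca_2d(points):
    """Centre the points and project them onto their top two principal axes."""
    P = np.asarray(points, dtype=float)
    centred = P - P.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ Vt[:2].T

A1 = np.array([
    [0.000, 0.000, 0.5], [0.250, 0.000, 0.5],
    [0.125, 0.250, 0.5], [0.375, 0.250, 0.5],
    [0.250, 0.500, 0.5], [0.500, 0.500, 0.5],
    [0.125, 0.750, 0.5], [0.375, 0.750, 0.5],
    [0.000, 1.000, 0.5], [0.250, 1.000, 0.5],
])
A4 = A1 + np.array([0.25, 0.0, 0.0])  # A4 is A1 shifted by 0.25 in x

Y1, Y4 = pca_2d(A1), pca_2d(A4)
# Planar data: the 2D projection keeps every pairwise distance
print(np.allclose(pairwise_distances(Y1), pairwise_distances(A1)))  # True
# Translation is removed, so both project identically up to axis signs
print(np.allclose(np.abs(Y1), np.abs(Y4)))  # True
```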