Scikit-learn: Applying Mean Shift on a multi-dimensional dataset
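I have a dataset with 14 different features/columns and 4328 rows. I processed the values and converted them into a NumPy array of shape (4328, 14). I then applied Mean Shift to this NumPy array to train my model, and it split the data points into 29 different clusters.
Cluster centers: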
array([[ 0.00000000e+00, 2.88896062e+02, 2.78953471e+02,
2.08648004e+02, 2.12223611e+02, 5.38985939e+01,
3.71283150e-01, 5.70311771e+03, 4.54253094e-01,
1.30592925e+00, 6.64259488e+00, 3.82481843e+00,
6.43865296e+00, 6.43865296e+00],
[ 0.00000000e+00, 2.83183908e+02, 9.48864664e+01,
3.59258621e+03, 9.05744253e+01, 8.35206117e+00,
4.13793103e-01, 5.70172414e+03, 2.78249425e-01,
8.88868966e-01, 6.63727816e+00, 4.84751149e+00,
6.61705172e+00, 6.61705172e+00],
[ 0.00000000e+00, 3.15511628e+02, 7.55761355e+01,
6.52134884e+03, 7.04900000e+01, 6.69296631e+00,
3.72093023e-01, 5.69984767e+03, 3.52367442e-01,
9.50423256e-01, 6.81103721e+00, 2.70016977e+00,
3.48411628e+00, 3.48411628e+00],
[ 0.00000000e+00, 2.98297297e+02, 4.95190674e+01,
9.43194595e+03, 4.64532432e+01, 4.89748830e+00,
3.24324324e-01, 5.69470405e+03, 1.71972973e-01,
1.21458649e+00, 6.85496486e+00, 3.54600000e+00,
5.62750811e+00, 5.62750811e+00],
[ 0.00000000e+00, 3.60428571e+02, 3.22145995e+03,
9.85714286e+00, 3.24273036e+03, -6.35189676e-01,
4.64285714e-01, 5.65968214e+03, -2.39050000e-01,
7.49132143e-01, 6.57582857e+00, -2.07893214e+00,
-6.82446429e-01, -6.82446429e-01],
[ 0.00000000e+00, 2.48600000e+02, 4.35963021e+01,
1.18772000e+04, 4.21820000e+01, 3.25541197e+00,
4.00000000e-01, 5.69281500e+03, -4.94350000e-01,
-1.41250000e-01, 7.01363000e+00, -7.76800000e-02,
2.37982000e+00, 2.37982000e+00],
[ 0.00000000e+00, 2.56777778e+02, 3.86608797e+01,
1.48944444e+04, 3.43100000e+01, 1.36524043e+01,
2.22222222e-01, 5.70588333e+03, -4.92000000e-02,
8.88366667e-01, 6.78814444e+00, 5.58971111e+00,
6.56455556e+00, 6.56455556e+00],
[ 0.00000000e+00, 3.14111111e+02, 4.78123643e+01,
2.02325556e+04, 4.67500000e+01, 4.74006148e+00,
5.55555556e-01, 5.70420556e+03, -2.40100000e-01,
8.96300000e-01, 7.09418889e+00, 6.68292222e+00,
1.12132667e+01, 1.12132667e+01],
[ 0.00000000e+00, 3.47200000e+02, 3.63744453e+01,
5.02000000e+04, 3.45700000e+01, 4.97221480e+00,
8.00000000e-01, 5.67206000e+03, -9.79280000e-01,
-1.08820000e-01, 7.67404000e+00, 1.17406000e+00,
1.44780600e+01, 1.44780600e+01],
[ 0.00000000e+00, 5.46000000e+02, 1.04748000e+04,
5.66666667e+00, 1.02684667e+04, 2.01687216e+00,
3.33333333e-01, 5.72818333e+03, 5.43600000e-01,
1.35213333e+00, 5.60560000e+00, 3.07716667e+00,
2.22003333e+00, 2.22003333e+00],
[ 0.00000000e+00, 2.09000000e+02, 2.39866667e+02,
1.17000000e+02, 2.33150000e+02, 1.67530023e+00,
1.00000000e+00, 9.13930000e+03, -1.69290000e+00,
-7.47800000e-01, 2.30790000e+00, 7.06666667e-01,
1.86860000e+00, 1.86860000e+00],
[ 0.00000000e+00, 2.01666667e+02, 6.86686111e+01,
2.57380000e+04, 6.56333333e+01, 5.85024181e+00,
3.33333333e-01, 5.75526667e+03, 1.19680000e+00,
2.18410000e+00, 6.13906667e+00, 1.75683667e+01,
1.90339000e+01, 1.90339000e+01],
[ 0.00000000e+00, 5.08000000e+02, 4.60818500e+04,
4.00000000e+00, 4.42663500e+03, 9.41967667e+02,
5.00000000e-01, 5.73742500e+03, -2.17150000e-01,
1.11570000e+00, 6.81375000e+00, 2.84170000e+00,
1.07105000e+00, 1.07105000e+00],
[ 0.00000000e+00, 5.15000000e+02, 1.23800000e+03,
2.00000000e+00, 3.66200000e+01, 3.28066630e+03,
0.00000000e+00, 5.70330000e+03, 2.96260000e+00,
2.53060000e+00, 6.56880000e+00, 2.56620000e+00,
5.00280000e+00, 5.00280000e+00],
[ 0.00000000e+00, 1.53000000e+02, 2.67980246e+01,
2.50000000e+05, 2.46500000e+01, 8.71409574e+00,
1.00000000e+00, 5.70805000e+03, -9.63100000e-01,
4.70000000e-01, 6.79200000e+00, -5.11360000e+00,
8.20730000e+00, 8.20730000e+00],
[ 0.00000000e+00, 5.74000000e+02, 2.67405322e+01,
4.10020000e+04, 2.49200000e+01, 7.30550630e+00,
1.00000000e+00, 5.73125000e+03, 2.08130000e+00,
3.34910000e+00, 6.92330000e+00, 5.08680000e+00,
8.58970000e+00, 8.58970000e+00],
[ 0.00000000e+00, 5.22000000e+02, 1.00364364e+02,
3.75630000e+04, 4.90300000e+01, 1.04699906e+02,
1.00000000e+00, 5.71880000e+03, 7.04600000e-01,
2.16130000e+00, 5.72310000e+00, -3.00900000e-01,
1.32520000e+00, 1.32520000e+00],
[ 0.00000000e+00, 3.46000000e+02, 2.24756530e+02,
1.27403000e+05, 2.22800000e+02, 8.78155326e-01,
1.00000000e+00, 5.70805000e+03, -9.63100000e-01,
4.70000000e-01, 6.79200000e+00, 2.50200000e-01,
5.96300000e+00, 5.96300000e+00],
[ 0.00000000e+00, 3.09000000e+02, 4.50972829e+01,
3.50000000e+04, 4.33000000e+01, 4.15076872e+00,
0.00000000e+00, 5.67600000e+03, 9.75300000e-01,
6.17300000e-01, 6.62310000e+00, 4.01550000e+01,
4.19152000e+01, 4.19152000e+01],
[ 0.00000000e+00, 3.46000000e+02, 2.26916384e+02,
1.00000000e+05, 2.24950000e+02, 8.74142476e-01,
1.00000000e+00, 5.65215000e+03, -1.88000000e-01,
7.87500000e-01, 7.94750000e+00, -3.13200000e-01,
6.47550000e+00, 6.47550000e+00],
[ 0.00000000e+00, 3.46000000e+02, 2.20191000e+02,
2.75000000e+05, 2.31950000e+02, -5.06962715e+00,
1.00000000e+00, 5.70460000e+03, -8.96800000e-01,
-3.83300000e-01, 5.95260000e+00, 5.14140000e+00,
7.58010000e+00, 7.58010000e+00],
[ 0.00000000e+00, 2.18000000e+02, 1.69836215e+02,
6.00000000e+04, 1.73550000e+02, -2.13989340e+00,
1.00000000e+00, 5.74695000e+03, 2.21600000e-01,
-2.66200000e-01, 5.37060000e+00, 4.42260000e+00,
1.03538000e+01, 1.03538000e+01],
[ 0.00000000e+00, 9.10000000e+01, 5.03828125e+01,
3.20000000e+04, 4.85000000e+01, 3.88208763e+00,
0.00000000e+00, 5.71880000e+03, 7.04600000e-01,
2.16130000e+00, 5.72310000e+00, 7.97870000e+00,
1.43018000e+01, 1.43018000e+01],
[ 0.00000000e+00, 1.82000000e+02, 3.66395435e+01,
5.40000000e+04, 3.63500000e+01, 7.96543380e-01,
1.00000000e+00, 5.67605000e+03, -1.73390000e+00,
-2.81400000e-01, 8.15350000e+00, -2.00800000e+00,
1.52570000e+00, 1.52570000e+00],
[ 0.00000000e+00, 3.43000000e+02, 2.31617647e+01,
1.70000000e+04, 2.16500000e+01, 6.98274691e+00,
0.00000000e+00, 5.67600000e+03, 9.75300000e-01,
6.17300000e-01, 6.62310000e+00, 2.45333000e+01,
2.12987000e+01, 2.12987000e+01],
[ 0.00000000e+00, 2.18000000e+02, 1.63871636e+02,
1.19500000e+05, 1.61950000e+02, 1.18656127e+00,
1.00000000e+00, 5.64800000e+03, -2.77500000e-01,
-1.23880000e+00, 7.32370000e+00, -6.76500000e-01,
-7.47950000e+00, -7.47950000e+00],
[ 0.00000000e+00, 3.46000000e+02, 2.24871313e+02,
7.25970000e+04, 2.22800000e+02, 9.29673637e-01,
1.00000000e+00, 5.70805000e+03, -9.63100000e-01,
4.70000000e-01, 6.79200000e+00, 2.50200000e-01,
5.96300000e+00, 5.96300000e+00],
[ 0.00000000e+00, 5.70000000e+01, 1.02000000e+01,
2.35008000e+05, 1.05000000e+01, -2.85714286e+00,
1.00000000e+00, 5.70460000e+03, -8.96800000e-01,
-3.83300000e-01, 5.95260000e+00, -3.77360000e+00,
2.51260000e+00, 2.51260000e+00],
[ 0.00000000e+00, 2.10000000e+01, 1.19055525e+01,
4.15000000e+05, 1.14000000e+01, 4.43467132e+00,
1.00000000e+00, 5.67605000e+03, -1.73390000e+00,
-2.81400000e-01, 8.15350000e+00, -1.69065000e+01,
-2.84830000e+01, -2.84830000e+01]])
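Next, I tried plotting these clusters in a 2D plane and got the following plot:
I'm not sure why my clusters and the individual data points are all plotted on a single line, with an X-axis value of 0 for every point. Am I missing something here? If I want the data split into distinct clusters, should I preprocess my dataset differently?
Edit 1:
The code used to draw the plot above (clf is the name of my model object):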
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np

# X is the (4328, 14) array the MeanShift model was fit on
labels = clf.labels_
cluster_centers = clf.cluster_centers_
n_clusters_ = len(np.unique(labels))

colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    my_members = labels == k
    cluster_center = cluster_centers[k]
    # Only columns 0 and 1 of the 14 features are plotted here
    plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
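Since your data has 14 features, MeanShift tries to identify clusters ("blobs") in 14-dimensional space, and it found 29 centers among your 4328 data points. Your output clusters therefore describe 29 points in 14-dimensional space (hence the 29x14 shape), which is hard to visualize in a 2D plot.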
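The shapes described above can be checked directly. This is a minimal sketch on hypothetical synthetic data (three Gaussian blobs in 14 dimensions, with column 0 forced to zero to mimic the asker's data); the bandwidth=3.0 value is an assumption chosen for this toy data, not the asker's setting. The point is only that cluster_centers_ has one row per cluster and one column per feature, while labels_ has one entry per data point.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical stand-in for the asker's (4328, 14) array: three Gaussian
# blobs in 14 dimensions, with column 0 forced to zero as in the real data.
rng = np.random.default_rng(0)
centers = rng.uniform(-10, 10, size=(3, 14))
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 14)) for c in centers])
X[:, 0] = 0.0

# bandwidth is an assumption for this toy data; if it is omitted,
# MeanShift estimates one with sklearn.cluster.estimate_bandwidth.
clf = MeanShift(bandwidth=3.0).fit(X)

print(X.shape)                     # (600, 14)
print(clf.cluster_centers_.shape)  # (number of clusters found, 14)
print(clf.labels_.shape)           # (600,)
# Column 0 of every center is 0, so plotting column 0 on the x-axis
# puts every center (and every point) on the line x = 0.
print(clf.cluster_centers_[:, 0])
```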
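When you plot, you currently use only the first two dimensions (columns) of your data and cluster centers (plot(X[my_members, 0], X[my_members, 1], ...)), and since the first dimension appears to be all zeros, the plotted points end up on a single line.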
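A quick way to confirm which columns are constant (and therefore collapse every point onto a line when used as a plot axis) is to compare each column's maximum and minimum. This is a minimal NumPy sketch; constant_columns is a hypothetical helper name, and the sample rows are the first three cluster centers above, rounded and truncated to three columns.

```python
import numpy as np

def constant_columns(X):
    """Return the indices of columns whose values never vary."""
    # np.ptp gives (max - min) per column; 0 means every value is identical.
    return np.flatnonzero(np.ptp(X, axis=0) == 0)

# First three cluster centers from the question, rounded, first 3 columns only.
X = np.array([[0.0, 288.9, 279.0],
              [0.0, 283.2,  94.9],
              [0.0, 315.5,  75.6]])

const = constant_columns(X)
print(const)  # [0]

# Drop the constant column(s); the first two remaining columns can then be plotted.
X_reduced = np.delete(X, const, axis=1)
print(X_reduced.shape)  # (3, 2)
```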
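If you are only interested in the clustering result, you already have it in clf.labels_, which should be an array of 4328 labels (shape (4328,)), one cluster index per data point.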
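Once you have labels_, ordinary NumPy operations answer most questions about the clustering. In this minimal sketch the labels array is a hypothetical stand-in for clf.labels_; np.bincount counts the points per cluster and a boolean mask picks out the rows of one cluster.

```python
import numpy as np

# Hypothetical labels_ array as produced by MeanShift: one integer
# cluster index per row of the input, shape (n_samples,).
labels = np.array([0, 0, 1, 2, 1, 0])

sizes = np.bincount(labels)  # number of points in each cluster
print(sizes)  # [3 2 1]

members_of_1 = np.flatnonzero(labels == 1)  # row indices assigned to cluster 1
print(members_of_1)  # [2 4]
```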
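To visualize higher-dimensional points, you could try splitting the clustered data across several subplots (perhaps seven 2D plots, one per pair of features) or reducing the dimensionality in some way (you could start by dropping the first column, since all of its values are the same: zero).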
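The "seven 2D plots" idea can be sketched by pairing the 14 columns as (0, 1), (2, 3), ..., (12, 13) and drawing one subplot per pair. This is a minimal matplotlib sketch on hypothetical random data; plot_feature_pairs is an illustrative helper, not part of any library. With 29 clusters, the 20-colour tab20 colormap will reuse colours, so some clusters may look alike.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

def plot_feature_pairs(X, labels, centers):
    """Scatter columns (0, 1), (2, 3), ..., one subplot per pair."""
    n_pairs = X.shape[1] // 2
    fig, axes = plt.subplots(1, n_pairs, figsize=(3 * n_pairs, 3))
    axes = np.atleast_1d(axes)
    for p, ax in enumerate(axes):
        i, j = 2 * p, 2 * p + 1
        # Points coloured by cluster label
        ax.scatter(X[:, i], X[:, j], c=labels, s=4, cmap="tab20")
        # Cluster centers drawn as hollow black circles
        ax.scatter(centers[:, i], centers[:, j], marker="o", s=80,
                   facecolors="none", edgecolors="k")
        ax.set_xlabel(f"feature {i}")
        ax.set_ylabel(f"feature {j}")
    fig.tight_layout()
    return fig

# Hypothetical data with the asker's 14 columns and 3 made-up clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 14))
labels = rng.integers(0, 3, size=100)
centers = np.array([X[labels == k].mean(axis=0) for k in range(3)])

fig = plot_feature_pairs(X, labels, centers)
print(len(fig.axes))  # 7
```

With the real model, X, clf.labels_ and clf.cluster_centers_ would be passed in place of the random arrays.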
Another way to visualize higher-dimensional data in a 2D (or 3D) plot is t-SNE; perhaps you should check that out. It is also available in scikit-learn (sklearn.manifold.TSNE), and there is a quick introduction in this Google Talk.
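A minimal t-SNE sketch using sklearn.manifold.TSNE, on hypothetical random data standing in for the (4328, 14) array; the random labels stand in for clf.labels_. The perplexity and random_state values are assumptions, not tuned settings. TSNE only offers fit_transform (it cannot embed new points after fitting), and running it on all 4328 rows will take noticeably longer than this toy example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for the (4328, 14) data, kept small so this runs quickly.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 14))
labels = rng.integers(0, 3, size=300)  # stand-in for clf.labels_

# Embed the 14-dimensional rows into 2 dimensions.
embedding = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)
print(embedding.shape)  # (300, 2)

# Colour the 2D embedding by the MeanShift cluster label.
fig, ax = plt.subplots()
ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=4)
ax.set_title("t-SNE projection coloured by cluster label")
```

Distances between well-separated groups in a t-SNE plot are not reliable, so the plot is best read as a qualitative check of whether the MeanShift labels form coherent groups.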