ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() python dbscan 3 dimensions point

Question

I want to use the DBSCAN algorithm to cluster a dataset of 3-dimensional points. This is the dataset:

I am clustering with this code:

from math import sqrt, pow

def __init__(eps=0.1, min_points=2):
    eps = 10
    min_points = 2
    visited = []
    noise = []
    clusters = []
    dp = []

def cluster(data_points):
    visited = []
    dp = data_points
    c = 0

    for point in data_points:
        if point not in visited:
            visited.append(point)
            print point
            neighbours = region_query(point)
            #print neighbours
            if len(neighbours) < min_points:
                noise.append(point)

            else:
                c += 1
                expand_cluster(c, neighbours)

#cluster(data_points)

def expand_cluster(cluster_number, p_neighbours):
    cluster = ("Cluster: %d" % cluster_number, [])
    clusters.append(cluster)
    new_points = p_neighbours
    while new_points:
        new_points = pool(cluster, new_points)


def region_query(p):
    result = []
    for d in dp:
        distance = (((d[0] - p[0])**2 + (d[1] - p[1])**2 + (d[2] - p[2])**2)**0.5)
        print distance
        if distance <= eps:
            result.append(d)
    return result

#p_neighbours = region_query(p=pcsv)

def pool(cluster, p_neighbours):
    new_neighbours = []
    for n in p_neighbours:
        if n not in visited:
            visited.append(n)
            n_neighbours = region_query(n)
            if len(n_neighbours) >= min_points:
                new_neighbours = unexplored(p_neighbours, n_neighbours)
        for c in clusters:
            if n not in c[1] and n not in cluster[1]:
                cluster[1].append(n)
    return new_neighbours

@staticmethod
def unexplored(x, y):
    z = []
    for p in y:
        if p not in x:
            z.append(p)
    return z

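In this code, the variables point and n hold the same kind of data as data_points, which contains the dataset. From reading the manual, I think this code should actually work, but when I run the cluster() function I get an error.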

Traceback (most recent call last):

  File "<ipython-input-39-77eb6be20d82>", line 2, in <module>
    if n not in visited:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

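I don't know why this code still raises the error even after I changed the n or point variable to indexed data. Do you know what is wrong with this code? How can I make it work?

Thank you for your help.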

Answer 1

The error comes from these lines:

    if point not in visited:
        visited.append(point)

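The in operator calls list.__contains__, which iterates over the items in the visited list to see whether any of them equals point. However, an equality test between numpy arrays does not yield a single Boolean value. It yields an array of Booleans holding the element-wise comparison of the items in the arrays. For example, array([1, 2]) == array([1, 3]) evaluates to array([True, False]), not just False.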

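So far, so good. Comparisons in Python are allowed to return whatever kind of object they want. However, when in tests for equality, it needs a single Boolean result in the end, so bool is called on the result of the comparison. The exception you are getting comes from bool(array([...])), which, as the message says, is ambiguous. Should bool(array([True, False])) be True or False? The library refuses to guess for you.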
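The behaviour described above can be reproduced in a few lines. This is a minimal sketch with made-up values; visited and point only mirror the names used in the question, and the coordinates are assumptions for illustration.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([1, 3])

# Comparing two arrays is element-wise and yields a boolean array,
# not a single True/False.
elementwise = (a == b)

# `in` on a list compares `point` with each stored item and then calls
# bool() on the comparison result, which fails for multi-element arrays.
visited = [np.array([0.0, 0.0, 0.0])]
point = np.array([0.0, 1.0, 0.0])

try:
    point in visited
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)
```

Here raised ends up True and message contains the same "truth value of an array ... is ambiguous" text as the traceback in the question.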

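Unfortunately, I don't think there is a good way around this. Perhaps you could convert the points to tuples before storing them in visited? As a nice side effect, this lets you use a set rather than a list, since tuples are hashable.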
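A rough sketch of the tuple idea follows. The sample coordinates are invented (the question's dataset is not shown), and key and order are hypothetical names introduced only for this example.

```python
import numpy as np

# Hypothetical sample data: the third row repeats the first one.
data_points = np.array([[0.0, 0.0, 0.0],
                        [0.1, 0.0, 0.0],
                        [0.0, 0.0, 0.0]])

visited = set()   # a set works because tuples are hashable
order = []        # points in the order they were first seen

for point in data_points:
    # A tuple of plain Python floats compares to a single bool.
    key = tuple(point.tolist())
    if key not in visited:
        visited.add(key)
        order.append(key)
```

One consequence of this approach is that two rows with identical coordinates collapse into a single key. If duplicate points need to be tracked separately, storing row indices in visited instead of coordinates may be a better fit.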

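Another issue you may run into is that equality testing between floating point numbers is inherently error-prone. Two numbers that should be equal may not compare equal when they were computed in different ways. For example, 0.1 + 0.2 == 0.3 is False, because the two sides of the equals sign are rounded differently. So even if you have two points that should be equal, you may not be able to detect them in your data with an equality test alone. You would need to compute their difference and compare it against a small epsilon value that estimates the largest error your calculations could have produced.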
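The following sketch illustrates the point using tolerance-based comparisons from the standard library and numpy. The default tolerances of math.isclose and numpy.allclose are used here as an assumption; a real application would choose an epsilon based on its own data.

```python
import math
import numpy as np

# Exact comparison fails because of rounding.
exact = (0.1 + 0.2 == 0.3)

# Comparison with a tolerance treats the two values as equal.
close = math.isclose(0.1 + 0.2, 0.3)

# The same idea for whole points stored as arrays.
p = np.array([0.1 + 0.2, 1.0, 2.0])
q = np.array([0.3, 1.0, 2.0])
same_exact = bool((p == q).all())      # element-wise exact test
same_close = bool(np.allclose(p, q))   # element-wise test with tolerance
```

With these values, exact and same_exact are False while close and same_close are True.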

Answer 2

If you are using numpy, you should use masks rather than lists:

import numpy

def cluster(data_points, eps=0.1, min_points=3):
    # 0 = not yet assigned, -1 = noise, 1..n = cluster number
    cluster_numbers = numpy.zeros(len(data_points), dtype=int)
    c = 0
    for idx, point in enumerate(data_points):
        if cluster_numbers[idx] == 0:
            print(point)
            neighbours = region_query(data_points, point, eps)
            #print neighbours
            if sum(neighbours) < min_points:
                # noise
                cluster_numbers[idx] = -1
            else:
                c += 1
                expand_cluster(c, data_points, cluster_numbers, neighbours, eps)
    return cluster_numbers

def region_query(points, point, eps=0.1):
    distance = ((points-point)**2).sum(axis=1) ** 0.5
    return distance <= eps

def expand_cluster(cluster_number, points, cluster_numbers, new_points, eps=0.1):
    while True:
        indices = numpy.where(new_points & (cluster_numbers==0))[0]
        if not len(indices):
            break
        new_points = False
        for idx in indices:
            cluster_numbers[idx] = cluster_number
            new_points = new_points | region_query(points, points[idx], eps)

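What you get back is an array of integers, one per input point. Positions with the value -1 are noise points, and 1 .. n are the different clusters.

So you can get the points of each cluster like this: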

cluster_numbers = cluster(data_points)
noise_points = data_points[cluster_numbers == -1]
print("Total Clusters: %d" % cluster_numbers.max())
for idx in range(1, cluster_numbers.max() + 1):
    cluster_points = data_points[cluster_numbers == idx]
    print("Cluster %d has %d points" % (idx, len(cluster_points)))
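To see how a boolean mask behaves on its own, here is a small sketch that repeats the region_query function from this answer and applies it to three invented points. The coordinates and the eps value are assumptions chosen only to make the result easy to check by hand.

```python
import numpy as np

def region_query(points, point, eps=0.1):
    # Euclidean distance from `point` to every row of `points`,
    # returned as a boolean mask of rows within `eps`.
    distance = ((points - point) ** 2).sum(axis=1) ** 0.5
    return distance <= eps

# Hypothetical data: two points close together and one far away.
points = np.array([[0.0, 0.0, 0.0],
                   [0.05, 0.0, 0.0],
                   [5.0, 5.0, 5.0]])

mask = region_query(points, points[0], eps=0.1)
neighbours = points[mask]   # boolean indexing selects the matching rows
count = int(mask.sum())     # number of True entries in the mask
```

The mask has one entry per input row, so it can be combined with other masks using & and |, which is what expand_cluster relies on. Note that the query point itself is always included in its own neighbourhood.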

python

cluster-analysis

ambiguous

dbscan