Clustering floating-point data into suitable buckets in Python

Question

I have a CSV file containing thousands of floating-point values in ascending order. I want to bunch/cluster these values into suitable clusters.

For example:
0.001
0.002
0.013
0.1
0.101
0.12
0.123
0.112
0.113
0.2

So the clusters should look like:

0 - 0.1 with count 4
0.1 - 0.2 with count 6

How can I automate this clustering task in Python? Do I need to set some initial parameters up front? I have no idea where to start. Please help.

Answer 1

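You can use bisect.bisect_left to find where each value would land in a sorted list of keys spaced at the increments you want (each key is the upper bound of a bucket). Then use that index to fetch the key from the list and increment its count in a dictionary.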

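from bisect import bisect_left
with open("test.txt") as f:
    # keys are the upper bounds of the buckets and must be sorted ascending
    keys = [0.1, 0.2]
    d = dict.fromkeys(keys, 0)
    for line in f:
        # index of the first key >= value, so 0.1 itself lands in the 0.1 bucket
        ind = bisect_left(keys, float(line))
        # note: a value larger than keys[-1] raises IndexError here
        d[keys[ind]] += 1
print(d)  # {0.1: 4, 0.2: 6}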
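
If the bucket edges should not be typed by hand, they can be generated from a bucket width. The following is a minimal sketch of the same bisect_left idea on an in-memory list. The function name `count_by_upper_edge`, the `width` and `n_buckets` parameters, and the extra "overflow" key are assumptions made for illustration and are not part of the answer above.

```python
from bisect import bisect_left


def count_by_upper_edge(values, width=0.1, n_buckets=2):
    """Count values per bucket, keyed by each bucket's upper edge.

    A value lands in the first bucket whose upper edge is >= the value,
    so an edge value such as 0.1 is counted in the bucket ending at 0.1.
    """
    # Build the edges as i * width and round them so they compare equal
    # to literals such as 0.1 and 0.2 (assumes width has <= 10 decimals).
    edges = [round(i * width, 10) for i in range(1, n_buckets + 1)]
    counts = dict.fromkeys(edges, 0)
    for v in values:
        i = bisect_left(edges, v)
        # Values above the last edge go to a hypothetical "overflow" key
        # instead of raising IndexError.
        key = edges[i] if i < len(edges) else "overflow"
        counts[key] = counts.get(key, 0) + 1
    return counts


sample = [0.001, 0.002, 0.013, 0.1, 0.101, 0.12, 0.123, 0.112, 0.113, 0.2]
print(count_by_upper_edge(sample))  # {0.1: 4, 0.2: 6}
```

The only parameters that need choosing up front in this sketch are the bucket width and the number of buckets. Anything beyond the last edge is reported separately, so an undersized `n_buckets` is easy to spot.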

Another approach is to round:

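with open("test.txt") as f:
    # keys must cover every bucket up to the largest value in the file
    keys = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    d = dict.fromkeys(keys, 0)
    for flt in map(float, f):
        # adding half a bucket width before rounding pushes the value up to the next 0.1
        k = round(flt + .05, 1) if flt > .05 else .1
        if flt not in d:
            d[k] += 1
        else:
            # a value exactly on a bucket edge counts toward that edge's bucket
            d[flt] += 1
print(d)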
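
The rounding idea can also be written without a fixed list of keys by computing each value's upper bucket edge directly. This is a rough sketch under stated assumptions rather than the answer's own code: the helper name `upper_edge`, the `width` and `digits` parameters, and the choice to round the quotient before taking the ceiling are all illustrative.

```python
import math
from collections import Counter


def upper_edge(value, width=0.1, digits=9):
    """Return the upper edge of the width-sized bucket that contains value."""
    # Round the quotient first so float noise (0.3 / 0.1 is slightly
    # below 3.0, for instance) cannot move an exact edge to another bucket.
    steps = math.ceil(round(value / width, digits))
    # Assumption: values at or below zero are placed in the first bucket.
    steps = max(steps, 1)
    # Round again so the result compares equal to literals such as 0.3.
    return round(steps * width, digits)


sample = [0.001, 0.002, 0.013, 0.1, 0.101, 0.12, 0.123, 0.112, 0.113, 0.2]
counts = Counter(upper_edge(v) for v in sample)
print(sorted(counts.items()))  # [(0.1, 4), (0.2, 6)]
```

Because the buckets are created on demand, no value can fall outside the key list. The trade-off is that a value within about 1e-9 bucket widths of an edge is treated as sitting exactly on that edge.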

python

cluster-analysis