有没有一种有效的方法来计算细胞中的点数？

Question

我有像这样的点集图：-

每个图表上最多有 100 万个点。您可以看到这些点散布在单元格网格中，每个单元格的大小为 200 x 100 个单位。所以显示了 35 个单元格。

有没有一种有效的方法来计算每个单元格中有多少个点？蛮力方法似乎是将数据解析 35 次，整个负载组合小于或大于语句。

Answer 1

以下某些步骤可以进行优化，因为您可以在构建数据集时执行其中一些步骤。但是我假设你只是得到了一系列的点，你必须找到它们适合的单元格。如果您可以将自己的代码注入到构建图表的步骤中，那么您可以在构建图表的同时执行我在下面写的内容，而不是在事后执行。

在只给出数据的情况下，你会遇到蛮力，否则你无法知道，因为你必须至少访问每个点一次才能弄清楚它在哪个单元格中。因此我们被 O(n) 困住了。如果您有一些其他可以利用的知识，那将取决于您是否可以利用 - 但由于在 OP 中没有提到它，我假设我们陷入了蛮力。

高级策略如下：

// 1) Set rectangle bounds to have minX/Y at +inf, and maxX/Y to be -inf
// or initialize it with the first point

// 2) For each point:
//       Set the set the min with min(point.x, bounds.min.x)
//       Same for the max as well

// 3) Now you have your bounds, you divide it by how many cells fit onto each
// axis while taking into account that you might need to round up with division
// truncating the results, unless you cast to float and ceil()
int cols = ceil(float(bounds.max.x - bounds.min.x) / CELL_WIDTH);
int rows = ceil(float(bounds.max.y - bounds.min.y) / CELL_HEIGHT);

// 4) You have the # of cells for the width and height, so make a 2D array of
// some sort that is w * h cells (each cell contains 32-bit int at least) and
// initialize to zero if this is C or C++

// 5) Figure out the cell number by subtracting the bottom left corner of our
// bounds (which should be the min point on the x/y axis that we found from (1))
for (Point p in points):
    int col = (p.x - minX) / cellWidth;
    int row = (p.y - minY) / cellHeight;
    data[row][col]++;

优化：

我们可以通过一些方法来加快速度：

如果单元格 width/height 是 2 的幂，则可以进行一些位移。如果它是 10 的倍数，this might possibly speed things up if you aren't using C or C++，但我还没有对此进行概要分析，所以也许 Java 中的热点和类似的东西无论如何都会为你做这个（并且不知道 Python）。那么100万点应该还是挺快的。
我们不需要一开始就遍历整个范围，我们可以继续调整 table 的大小，如果发现更大的值，则添加新的行和列。这样我们只对所有点进行一次迭代而不是两次。
如果你不关心额外的 space 用法并且你的数字只是正数，你可以通过假设一切都已经是相对的来避免 "translate to origin" 减法步骤到原点，根本不减去。您可以通过修改代码的步骤 (1) 使 min 从 0 开始而不是 inf（或者如果您选择第一个点）开始来解决这个问题。但是，如果您的点在轴上真的很远并且您最终会创建大量空槽，这可能会很糟糕。你会知道你的数据以及这是否可能。

可能还有一些事情可以做，但这会让你走上正确的轨道，提高效率。您也可以返回到它所在的单元格。

编辑：这假设与网格大小相比你不会有一些非常小的单元格宽度（比如你的宽度是 100 个单位，但你的图表可能跨越 200 万单位）。如果是这样，那么您需要研究可能的稀疏矩阵。

有没有一种有效的方法来计算细胞中的点数？

Is there an efficient way to count dots in cells?

graph

counting