How to find clusters of geospatial data based on their timestamp rather than position?

Question

I have a set of geospatial data that also has a corresponding timestamp in a separate column.

Something like this:

Timestamp	Latitude	Longitude
1	1.56	104.57
2	1.57	105.42
4	1.65	103.32
12	1.76	101.15
14	1.78	100.45
16	1.80	99.65

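I want to be able to cluster the data based on the timestamp rather than the distance.

So for the example above, I should get 2 clusters: one from the first 3 data points and one from the remaining 3. I would also like to get the timestamp range of each cluster, if possible.

So far, from my research, I have only found geospatial distance clustering or time-series clustering, neither of which sounds like what I need. Is there a recommended algorithm for what I am trying to do?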

Answer 1

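Density-Based Spatial Clustering of Applications with Noise, or DBSCAN for short, will help in your case. DBSCAN is a density-based clustering algorithm that groups points according to how close they are to one another.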

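From what I understood in a quick bit of research, DBSCAN draws a circle around each core point. The radius of the circle is called epsilon. All points inside a single circle are counted in the same cluster. The larger the epsilon, the more points you get in a cluster, and vice versa.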
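To make the role of epsilon concrete, here is a minimal sketch in plain NumPy (not scikit-learn's implementation) that counts how many points fall within eps of each timestamp from the question. In one dimension the "circle" is simply an interval of width 2 * eps. The helper name neighbour_counts is made up for illustration.

```python
import numpy as np

def neighbour_counts(values, eps):
    """Count, for each value, how many values (itself included) lie within eps."""
    v = np.asarray(values, dtype=float)
    # Pairwise absolute differences; entry [i, j] is |v[i] - v[j]|.
    dist = np.abs(v[:, None] - v[None, :])
    return (dist <= eps).sum(axis=1)

timestamps = [1, 2, 4, 12, 14, 16]  # timestamps from the question

small = neighbour_counts(timestamps, eps=1.0)
large = neighbour_counts(timestamps, eps=3.0)
print(small)
print(large)
```

A point whose count reaches min_samples is treated as a core point, so a larger eps turns more points into core points and pulls more of their neighbours into the same cluster.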

You can find more about this algorithm at this & this link.

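Why DBSCAN is suitable for time-series clustering:

DBSCAN does not require k (the number of clusters) as input.

In your case there could be many clustered time periods. Trying to fit an elbow curve to find the optimal number of clusters would be time-consuming and inefficient.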
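As a small illustration of that point, the sketch below runs DBSCAN with the same eps and min_samples on two timestamp lists and lets it work out the number of clusters on its own. The helper name count_clusters is an assumption for illustration; -1 is the label scikit-learn gives to noise points, so it is left out of the count.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def count_clusters(timestamps, eps=3.0, min_samples=3):
    """Return how many clusters DBSCAN finds in a 1-D list of timestamps."""
    X = np.asarray(timestamps, dtype=float).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # Label -1 marks noise, so it does not count as a cluster.
    return len(set(labels.tolist()) - {-1})

two_periods = count_clusters([1, 2, 4, 12, 14, 16])
three_periods = count_clusters([1, 2, 4, 12, 14, 16, 25, 28, 29])
print(two_periods, three_periods)
```

No value of k is passed anywhere; the number of clusters follows from the gaps in the data and the chosen eps.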

Code:

The code snippet below will do the job:

import pandas as pd
from sklearn.cluster import DBSCAN
import numpy as np
import matplotlib.pyplot as plt

# Getting Data
df = pd.DataFrame({
    'Timestamp' : [1,2,4,12,14,16,25,28,29],
    'Latitude' : [1.56,1.57,1.65,1.76,1.78,1.80,1.83,1.845,1.855],
    'Longitude' : [104.57,105.42,103.32,101.15,100.45,99.65,100,100.3,101.2]})

# Initializing the object
db = DBSCAN(eps=3.0, min_samples=3)

# eps = Epsilon value. The larger the epsilon, the more distant the points you will catch in a single cluster.
#       Ex.  eps = 1.0 wasn't capturing the '4' value from the [1,2,4] cluster. Increasing the epsilon
#       helped in detecting it.

# min_samples = Minimum number of samples (the point itself included) needed within eps for a point to count as a core point.

# Fitting the algorithm onto Timestamp column
df['Cluster'] = db.fit_predict(np.array(df['Timestamp']).reshape(-1,1))

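# Label -1 marks noise points, so it is excluded from the cluster count
n_clusters = df.loc[df['Cluster'] != -1, 'Cluster'].nunique()
print(f"Found {n_clusters} clusters \n")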
print(df)

# Plotting the Graph
fig = plt.figure(figsize = (5,5))
plt.xlabel('Latitude')
plt.ylabel('Longitude')

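# Use a separate name for each group so the original df is not overwritten;
# matplotlib assigns one colour per scatter call, i.e. one colour per cluster
for index, group in df.groupby('Cluster'):
    plt.scatter(group['Latitude'], group['Longitude'], label=f"Cluster {index}")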
    
plt.legend()
plt.show()
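
The question also asks for the timestamp range of each cluster, which the snippet above does not print. One possible way, sketched here with a pandas groupby on the same sample timestamps, is to take the minimum and maximum timestamp per cluster label. The column names start and end are my own choice.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

df = pd.DataFrame({'Timestamp': [1, 2, 4, 12, 14, 16, 25, 28, 29]})
df['Cluster'] = DBSCAN(eps=3.0, min_samples=3).fit_predict(df[['Timestamp']])

# Drop noise points (label -1), then take the min/max timestamp per cluster
ranges = (
    df[df['Cluster'] != -1]
    .groupby('Cluster')['Timestamp']
    .agg(start='min', end='max')
)
print(ranges)
```

Each row of ranges then holds the first and last timestamp of one cluster.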
